Data Center Bridging (DCB) for Intel(R) Network Connections
===========================================================

September 10, 2008

Contents
========

- Background
- Requirements
- Functionality
- How To Build a DCB-Capable System
- Setup
- Operation
- Testing
- dcbtool Overview
- FAQ
- Known Issues
- License
- Support

Background
==========

In the 2.4.x kernel, qdiscs were introduced. The rationale behind this effort 
was to provide QoS in software, as hardware did not provide the necessary 
interfaces to support it. In 2.6.23, Intel pushed the notion of multiqueue 
support into the qdisc layer. This provides a mechanism to map the software 
queues in the qdisc structure into multiple hardware queues in underlying 
devices. In the case of Intel adapters, we leverage this combination to map
qdisc queues into the queues within our hardware controllers.

Within the Data Center, the perception is that traditional Ethernet: 
a) has high latency
b) is prone to losing frames, rendering it unacceptable for storage applications

To address these issues, Intel and a host of industry leaders have been working
within the IEEE 802.1 standards body, where a number of task forces are
developing enhancements. Listed below are the applicable standards:

	Enhanced Transmission Selection  
		Priority Groups: IEEE 802.1Qaz
	Lossless Traffic Class
		Priority Flow Control: IEEE 802.1Qbb
		Congestion Notification: IEEE 802.1Qau
	Improved bisectional bandwidth
		Shortest Path Bridging: IEEE 802.1aq / IETF TRILL
	Dynamic Configuration
		DCB Capability exchange protocol: Under consideration in
		IEEE 802.1Qtbd

The software solution being released represents Intel's implementation of
these efforts. Many of these standards have not yet been ratified, so this is
a pre-standards release, and users are advised to check SourceForge often for
updates. While we have worked with some of the major ecosystem vendors in
validating this release, many vendors still have solutions in development. As
these solutions become available and standards are ratified, we will work with
ecosystem partners and the standards body to ensure that the Intel solution
works as expected.

Requirements
============

- Linux kernel version 2.6.27 or later.
- Linux ixgbe driver version 1.3.31.2 or newer.
- Version 2.6.23 or newer of the "iproute2" package, downloaded and
  installed from http://developer.osdl.org/dev/iproute2/download, in
  order to obtain a multi-queue aware version of the 'tc' utility.
- Version 2.5.33 of Flex (to support the 2.6.23 version of iproute2).
  SLES10 is known to have an older version of Flex.  The latest
  Flex source can be obtained from http://flex.sourceforge.net/
- An up-to-date netlink library, which is needed to compile dcbd.
- 82598-based Intel adapter.
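
The kernel requirement above can be checked before starting a build. The
following is a minimal sketch, not part of the dcbd package; the function name
version_ge is illustrative, and the script assumes that 'uname -r' begins with
a dotted numeric version.

```shell
#!/bin/sh
# version_ge A B: succeed if dotted numeric version A >= version B.
# Fields are compared left to right; missing fields count as 0.
# Note: uses global variables, since POSIX sh has no 'local'.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        fa=${a%%.*}
        fb=${b%%.*}
        if [ "$a" = "$fa" ]; then a=""; else a=${a#*.}; fi
        if [ "$b" = "$fb" ]; then b=""; else b=${b#*.}; fi
        fa=${fa:-0}
        fb=${fb:-0}
        if [ "$fa" -gt "$fb" ]; then return 0; fi
        if [ "$fa" -lt "$fb" ]; then return 1; fi
    done
    return 0
}

# Strip any distribution suffix (for example "-generic") from the running
# kernel release before comparing it with the documented minimum.
running=$(uname -r | sed 's/[^0-9.].*$//; s/\.$//')
if version_ge "$running" 2.6.27; then
    echo "kernel $running meets the 2.6.27 minimum"
else
    echo "kernel $running is older than 2.6.27"
fi
```

The same function could be applied to the ixgbe and iproute2 versions listed
above, provided their version strings are obtained separately.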

Functionality
=============

dcbd 
    - Executes the DCB capabilities exchange protocol to exchange DCB
      configuration with the peer device using LLDP.
    - Supports DCBX versions 1 and 2
    - Retrieves and stores DCB configuration to a configuration file.
    - Controls the DCB settings of the network driver based on the
      operation of the DCB capabilities exchange protocol.
    - Detects and reacts to link up and link down events on network ports
    - Supports the Priority Group, Priority Flow Control and App TLV (FCoE,
      subtype 0) features.
    - Generates client interface events when the operational configuration
      or state of a feature changes.

dcbtool
    - Interacts with dcbd via the client interface.
    - Queries the state of the local, operational and peer configuration for
      the Priority Group, Priority Flow Control and App TLV features.
    - Interactive mode allows multiple commands to be entered interactively,
      as well as displaying asynchronous events.
    - Enables or disables DCB for an interface.

How To Build a DCB-Capable System
=================================

Linux kernel install:
---------------------

1. Requires a 2.6.27 or later kernel.
2. Untar and configure the kernel. Listed below are the required kernel
   options:

From make menuconfig:
a. In the Device Drivers, Network device support menu
     Select Netdevice multiple hardware queue support.
b. In the Networking, Network Options, QoS menu
     Select Hardware Multiqueue-aware Multi Band Queuing (MULTIQ)
     Select Multi Band Priority Queueing (PRIO)
     Select Elementary classification (BASIC)
     Select Universal 32bit comparisons w/ hashing (U32)
     Select Extended Matches and make sure U32 key is selected
     Select Actions -> SKB Editing
 
3. Build the kernel. 
4. Create a link from /usr/include/linux to
   /usr/src/kernels/linux-2.x.xx.x/include/linux.  Use the following command:

     ln -s /usr/src/kernels/linux-2.x.xx.x/include/linux /usr/include/linux
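
Before building, the kernel configuration file can be checked for the QoS
options selected in step 2. The sketch below is illustrative only. The CONFIG_
symbol names are an assumed mapping from the menu entries to mainline Kconfig
symbols and should be verified against the kernel tree in use. The "Netdevice
multiple hardware queue support" entry is omitted because its symbol name is
not given in this document.

```shell
#!/bin/sh
# Assumed mapping of the QoS menu entries to Kconfig symbols; verify
# these names against the Kconfig files of the kernel tree being built.
DCB_OPTIONS="NET_SCH_MULTIQ NET_SCH_PRIO NET_CLS_BASIC NET_CLS_U32
NET_EMATCH NET_EMATCH_U32 NET_ACT_SKBEDIT"

# missing_dcb_options FILE: print each option that is neither built in (=y)
# nor built as a module (=m) in the given kernel configuration file.
missing_dcb_options() {
    for opt in $DCB_OPTIONS; do
        grep -Eq "^CONFIG_${opt}=(y|m)\$" "$1" || echo "CONFIG_${opt}"
    done
}

# Example: check the .config in the current directory, if one exists.
if [ -f .config ]; then
    missing_dcb_options .config
fi
```

No output means every listed option is enabled in the file that was checked.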

ixgbe Base Driver Install
-------------------------

1. Download the ixgbe driver from the e1000 project on SourceForge.
   Please ensure the driver version meets the minimum listed under
   Requirements.
2. Build & install as outlined in the ixgbe readme.  
   (tar zxvf ixgbe-xxx; make; make install).

dcbd Application Install
------------------------

1. Download iproute2 from the web. Listed below is a link for iproute2:

   http://devresources.linux-foundation.org/dev/iproute2/download/

   Please ensure that you use the version that corresponds to the kernel
   version that you are using. Follow the build/installation instructions
   in the README with the tarball. Typically, the commands
   ./configure; make; make install should work.
	
2. Download the latest dcbd-x.y.z tarball from the e1000 project on
   SourceForge and untar it. Go into the dcbd-x.y.z directory and run
   the following commands:

     make clean; make; make install

   This will build and copy 'dcbd' and 'dcbtool' to /usr/sbin, make the
   '/etc/sysconfig/dcbd' directory (default location of the dcbd.conf file)
   and set up dcbd to run as a system service using the chkconfig program.
   Verify that the dcbd service is working as expected with the 'service dcbd
   status' command. If the service is not running, issue the command
   'service dcbd start'.

   dcbd will create the dcbd.conf file if it does not exist.

   For development purposes, 'dcbd' can be run directly from the build 
   directory.

Options
-------

dcbd has the following command line options:
    -h show usage information
    -f configfile    use the specified file as the config file instead of
                     the default file - /etc/sysconfig/dcbd/dcbd.conf
    -d run dcbd as a daemon
    -v show dcbd version

Setup
=====

1. Load the ixgbe module. 

2. Verify dcbd service is functional.
   If dcbd was installed, do "service dcbd status" to check, "service
   dcbd start" to start.
   Or, run "dcbd -d" from the command line to start.

3. Enable DCB on the selected ixgbe port:  dcbtool sc ethX dcb on
  
4. The dcbtool command can be used to query and change the DCB configuration
   (i.e., the bandwidth percentages assigned to different queues).  Use
   dcbtool -h to see a list of options.
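
Steps 1 through 3 can be collected into a small helper for review. This is a
sketch that only prints the commands rather than running them. The function
name setup_commands and the interface name eth2 are hypothetical, and
'modprobe' is assumed as the usual way to load the ixgbe module.

```shell
#!/bin/sh
# setup_commands IFNAME: print, without running them, the commands
# corresponding to setup steps 1 through 3 for the given interface.
setup_commands() {
    echo "modprobe ixgbe"
    echo "service dcbd status || service dcbd start"
    echo "dcbtool sc $1 dcb on"
}

# eth2 is a hypothetical interface name; substitute the real ixgbe port.
setup_commands eth2
```

After checking the printed commands, they can be run as root one at a time.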

Operation
=========

The 'tc' command is used to set up the qdisc and filters to cause network
traffic to be transmitted on different queues.

To configure the MULTIQ qdisc to have 8 bands:

# tc qdisc add dev ethX root handle 1: multiq

Using the skbedit action with a queue_mapping value as the target in a tc
filter allows you to classify a packet into a band. Here are some examples of
how to filter traffic into various bands using queue_mapping:

# tc filter add dev ethX protocol ip parent 1: u32 match ip dport 80 \
0xffff action skbedit queue_mapping 1

# tc filter add dev ethX protocol ip parent 1: u32 match ip dport 53 \
0xffff action skbedit queue_mapping 2

# tc filter add dev ethX protocol ip parent 1: u32 match ip dport 5001 \
0xffff action skbedit queue_mapping 3

# tc filter add dev ethX protocol ip parent 1: u32 match ip dport 22 \
0xffff action skbedit queue_mapping 4

# tc filter add dev ethX protocol ip parent 1: u32 match ip dport 137 \
0xffff action skbedit queue_mapping 5

# tc filter add dev ethX protocol ip parent 1: u32 match ip dport 138 \
0xffff action skbedit queue_mapping 5

# tc filter add dev ethX protocol ip parent 1: u32 match ip dport 139 \
0xffff action skbedit queue_mapping 5

# tc filter add dev ethX protocol ip parent 1: u32 match ip dport 25 \
0xffff action skbedit queue_mapping 6

# tc filter add dev ethX protocol ip parent 1: u32 match ip dport 21 \
0xffff action skbedit queue_mapping 7

# tc filter add dev ethX protocol ip parent 1: u32 match ip dport 20 \
0xffff action skbedit queue_mapping 8

Here is an example that sets up a filter based on EtherType.  In this example,
the EtherType is 0x8906.

# tc filter add dev ethX protocol 802_3 parent 1: handle 0xfc0e basic match \
'cmp(u16 at 12 layer 1 mask 0xffff eq 35078)' action skbedit queue_mapping 4
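
The 'eq' operand in the example above is the EtherType written in decimal
(0x8906 is 35078). A minimal sketch of that conversion follows, using the
shell's printf; the function name ethertype_match is illustrative and not part
of tc.

```shell
#!/bin/sh
# ethertype_match HEX: print the ematch expression used above, with the
# hexadecimal EtherType converted to the decimal form given to 'eq'.
ethertype_match() {
    printf 'cmp(u16 at 12 layer 1 mask 0xffff eq %d)\n' "$1"
}

ethertype_match 0x8906   # prints cmp(u16 at 12 layer 1 mask 0xffff eq 35078)
```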

Testing
=======

To test in a back-to-back setup, use the following tc commands to set up the
qdisc and filters for TCP ports 5000 through 5007.

# tc qdisc add dev ethX root handle 1: multiq

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip dport 5000 0xffff action skbedit queue_mapping 1

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip sport 5000 0xffff action skbedit queue_mapping 1

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip dport 5001 0xffff action skbedit queue_mapping 2

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip sport 5001 0xffff action skbedit queue_mapping 2

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip dport 5002 0xffff action skbedit queue_mapping 3

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip sport 5002 0xffff action skbedit queue_mapping 3

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip dport 5003 0xffff action skbedit queue_mapping 4

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip sport 5003 0xffff action skbedit queue_mapping 4

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip dport 5004 0xffff action skbedit queue_mapping 5

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip sport 5004 0xffff action skbedit queue_mapping 5

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip dport 5005 0xffff action skbedit queue_mapping 6

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip sport 5005 0xffff action skbedit queue_mapping 6

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip dport 5006 0xffff action skbedit queue_mapping 7

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip sport 5006 0xffff action skbedit queue_mapping 7

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip dport 5007 0xffff action skbedit queue_mapping 8

# tc filter add dev ethX protocol ip parent 1: \
u32 match ip sport 5007 0xffff action skbedit queue_mapping 8  
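
Because the commands above follow a regular pattern, they can also be
generated with a loop. The sketch below only prints the commands so they can
be reviewed first; the function name test_filters and the interface name eth2
are hypothetical.

```shell
#!/bin/sh
# test_filters IFNAME: print the tc commands that map TCP ports 5000-5007,
# as both destination and source port, to queue_mapping values 1-8.
test_filters() {
    echo "tc qdisc add dev $1 root handle 1: multiq"
    queue=1
    for port in 5000 5001 5002 5003 5004 5005 5006 5007; do
        for dir in dport sport; do
            echo "tc filter add dev $1 protocol ip parent 1:" \
                 "u32 match ip $dir $port 0xffff" \
                 "action skbedit queue_mapping $queue"
        done
        queue=$((queue + 1))
    done
}

# eth2 is a hypothetical interface name; substitute the port under test.
test_filters eth2
```

The printed commands can then be run as root on both systems in the
back-to-back setup.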

dcbtool Overview
================

dcbtool is a tool to query and set the DCB settings of an Ethernet device.

Options
-------

The basic usage of dcbtool is:  
     dcbtool [options] <cmd> [cmd arguments]

If no command is specified, then dcbtool will enter an interactive mode which 
allows commands to be entered interactively. When invoked in interactive 
mode, dcbtool will also register with dcbd as an event monitor.

-h     shows the dcbtool usage message
-v     shows dcbtool version information

Commands
--------

help		shows the dcbtool usage message

level <value>	sets the threshold dcbd uses to send event messages to this 
                interface.
			Levels are:
			0:	MSG_DUMP - for dcbd debugging
			1:	MSG_DEBUG - for dcbd debugging
			2:	MSG_INFO - for free form dcbd informational 
                                messages
			3:	MSG_WARNING - for free form dcbd warning 
                                messages
			4:	MSG_DCB - for formatted DCB event messages

ping		test command. dcbd responds with "PONG"

license		displays dcbtool license information

quit		exit from interactive mode

gc  <ifname>  <dcb | pg | pfc | app:<subtype>>
                Get configuration command. Returns the local configuration for
                the specified interface and DCB feature. Features are:
                dcb - DCB state of port
                pg - priority groups
                pfc - priority flow control
                app:<subtype> - application with subtypes:
	                 fcoe

go  <ifname>  <pg | pfc | app:<subtype>>
                Get operational status command. Returns the current DCB 
                operational status and configuration for the specified 
                interface and DCB feature.

gp  <ifname> <pg | pfc | app:<subtype>>
                Get peer status command. Returns the current DCB status 
                and configuration for the DCB peer of the specified interface
                and DCB feature.

sc  <ifname> dcb <on | off>
                Set the DCB state of the port.

sc  <ifname> <pg | pfc | app:<subtype>>  [a:<0|1>] [e:<0|1>] [w:<0|1>] 
                [feature attributes]
		Set configuration command. Allows setting, per DCB feature, 
                the following:
		a:<0|1> - enable/disable advertising the feature via DCB
		e:<0|1> - enable/disable the DCB feature
		w:<0|1> - enable/disable willing flag for the feature
                If one of the above parameters is not specified in the set
                config command, then the DCB feature attributes must be 
                specified.

                pg - priority group attributes
                Priority group attributes which are not supplied will remain
		unchanged, except for 'uppct'.  If 'uppct' is not
		supplied and the resulting percentages within each PGID do
		not add up to 100%, then dcbd will compute new user priority
		percentages for each PGID so that the sum for each PGID is
		100%.

                up2tc:<0-7><0-7><0-7><0-7><0-7><0-7><0-7><0-7>
	        This is the user priority to traffic class mapping parameter. 
                If not specified, the default value is "01234567".

                pgid:<0-7F><0-7F><0-7F><0-7F><0-7F><0-7F><0-7F><0-7F>
	        This is the priority group ID mapping. From left to right, 
                each number indicates the priority group ID to which the 
                corresponding user priority, starting with zero, belongs.
		A value of 'F' (0xF) indicates that the user priority
		belongs to the link strict group.

                pgpct:<0-100>,<0-100>,<0-100>,<0-100>,<0-100>,<0-100>,<0-100>,
                <0-100>		                                
	        This is the priority group bandwidth percentage of link 
                setting. From left to right, each number indicates the 
                percentage of total link bandwidth which is allocated to the 
                corresponding priority group ID, starting with zero.

                uppct:<0-100>,<0-100>,<0-100>,<0-100>,<0-100>,<0-100>,<0-100>,
                <0-100>					
 	        This is the user priority bandwidth percentage of bandwidth 
                group setting. From left to right, each number indicates the 
                percentage of total priority group bandwidth which is 
                allocated to the corresponding user priority, starting with 
                zero.

                strict:<0-2><0-2><0-2><0-2><0-2><0-2><0-2><0-2>	
	        This is the strict priority settings for each user priority.  

                From left to right, each number indicates the strict priority 
                setting for each user priority, starting with zero. A value 
                of 0 represents no strict priority setting, 1 represents group
                strict.

                pfc - priority flow control attributes

                pfcup:<0|1><0|1><0|1><0|1><0|1><0|1><0|1><0|1>	
                This is the priority flow control enable setting per user 
                priority. From left to right, each number indicates the enable
                setting of priority flow control for the corresponding user 
                priority, starting from zero. 0 means disabled, 1 means 
                enabled.

                app - application attributes

                appcfg:<xx>
	        This represents the application settings for the specified 
                application subtype. Assuming that the application attribute 
                setting is an arbitrary sequence of bytes of a given length, 
                the setting is supplied to dcbtool by representing each byte as
                a 2 character hexadecimal string. Thus, if the application
                attribute setting is the value 0x5a, the dcbtool attribute
                would be "appcfg:5a".
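
The pfcup and appcfg values are positional strings, which are easy to mistype.
The following sketch shows one way to build them from more readable input. The
helper names build_pfcup and build_appcfg are hypothetical and are not part of
dcbtool.

```shell
#!/bin/sh
# build_pfcup PRIO...: print the 8-character pfcup value with a 1 at each
# listed user priority (0-7, leftmost character is priority 0), 0 elsewhere.
build_pfcup() {
    value=""
    for up in 0 1 2 3 4 5 6 7; do
        bit=0
        for enabled in "$@"; do
            if [ "$enabled" -eq "$up" ]; then bit=1; fi
        done
        value="${value}${bit}"
    done
    echo "pfcup:${value}"
}

# build_appcfg BYTE...: print the appcfg value with each byte (given in
# decimal or 0x-prefixed hex) rendered as two hexadecimal characters.
build_appcfg() {
    value=""
    for byte in "$@"; do
        value="${value}$(printf '%02x' "$byte")"
    done
    echo "appcfg:${value}"
}

build_pfcup 3          # prints pfcup:00010000
build_appcfg 0x5a      # prints appcfg:5a
```

Following the command syntax described above, the results could be passed to
dcbtool as, for example, dcbtool sc eth2 pfc "$(build_pfcup 3)", where eth2 is
a hypothetical interface name.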

FAQ
===

- How did Intel verify their DCB solution?

  Answer - The Intel solution is continually evolving as the relevant 
  standards become solidified and more vendors introduce DCB capable systems.
  That said, we initially used test automation to verify the DCB state 
  machine. As the state machine became more robust and we had DCB capable 
  hardware, we began to test back to back with our adapters. Finally, we 
  introduced DCB capable switches in our test bed.

Known Issues
============

- Prior to kernel 2.6.26, TSO will be disabled when the driver is put into DCB
  mode.

- A TX unit hang may be observed when link strict priority is set and a large
  amount of traffic is transmitted on the link strict priority.

License
=======

  dcbd and dcbtool - DCB daemon and command line utility for DCB configuration
  Copyright(c) 2007-2008 Intel Corporation.

  Portions of dcbd and dcbtool (basically program framework) are
  based on:
    hostapd-0.5.7
    Copyright (c) 2004-2007, Jouni Malinen <j@w1.fi>
 
  This program is free software; you can redistribute it and/or modify it
  under the terms and conditions of the GNU General Public License,
  version 2, as published by the Free Software Foundation.

  This program is distributed in the hope it will be useful, but WITHOUT
  ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
  FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
  more details.

  You should have received a copy of the GNU General Public License along with
  this program; if not, write to the Free Software Foundation, Inc.,
  51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.

  The full GNU General Public License is included in this distribution in
  the file called "COPYING".

Support
=======

  Contact Information:
  e1000-eedc Mailing List <e1000-eedc@lists.sourceforge.net>
  Intel Corporation, 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497
