             Open Fabrics Enterprise Distribution (OFED)
                    SDP in OFED 1.5.2 Release Notes

                          August 2010



===============================================================================
Table of Contents
===============================================================================
1. Overview
2. Bug Fixes and Enhancements since OFED 1.5.1
3. Known Issues
4. Verification Applications/Flows/Tests

===============================================================================
1. Overview
===============================================================================
SDP in OFED is at GA level for OFED 1.5.2.
Main changes are:
- Fixed stability issues
- Latency is 4 msec
- BW without jitter
- Improved device removal
- Bug fixes

Missing features:
- AIO support
- inline send support
- ZCopy pipeline mode

===============================================================================
2. Bug Fixes and Enhancements since OFED 1.5.1
===============================================================================
* Cleanups
    - Removed unnecessary variables.
    - Removed printk warning.
    - Added support for 2.6.30 / 2.6.32.
    - Removed whitespaces.
    - Removed sdp_bzcopy_thresh module parameter.
    - Removed many irq/bh locks.

* Bug Fixes
    - Improved recovery from errors.
    - Fixed datapath hangings.
    - Fixed module reference count.
    - Fixed orphan count logic issues.
    - Fixed device removal.
    - Added support for IB devices that do not support FMR.
    - Improved support for PPC.
    - Fixed OOB support.

* Enhancements
    - Enabled FMR pool cache.
    - Limited FMR resources.
    - Added support for handling multiple iovs in ZCOPY.
    - Added usage of polling in rx.
    - Added usage of max number of SGE from HW capabilities.
    - Added module parameter to disable SDP over RoCE.
    - Made sdp_socket.h available to user applications.
    - Added CPU affinity for per-skb handling.
    - Added RX polling with usec granularity.


===============================================================================
3. Known Issues
===============================================================================
- BUG 1331 - TCP allows connecting to IP_ANY (0.0.0.0) as a destination
  address. SDP does not allow this and rejects the connection.
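
  An application that may run over SDP can guard against this case before
  calling connect(). The following is a minimal sketch, not part of SDP or
  OFED; the helper name is_valid_sdp_destination is hypothetical, and the
  check also rejects the IPv6 unspecified address, which this note does not
  discuss.

```python
import ipaddress


def is_valid_sdp_destination(address: str) -> bool:
    """Return False for the unspecified address (e.g. 0.0.0.0).

    Hypothetical helper: SDP rejects connections to 0.0.0.0, so a caller
    can test the destination first instead of relying on TCP behavior.
    """
    return not ipaddress.ip_address(address).is_unspecified
```

  A caller would typically report a configuration error when this returns
  False, since the connection attempt fails under SDP.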

- BUG 1444 - setsockopt(SO_RCVBUF) does not work on SDP sockets. To cap
  system-wide SDP memory usage for receive, use the module parameter
  top_mem_usage.

- SDP is at beta level on the Infinihost HCA family.

- Each SDP socket currently consumes up to 2 MBytes of memory. If this value
  is high for your installation, it is possible to trade off performance
  for lower memory utilization per socket by reducing the value of the
  "rcvbuf_scale" module parameter (default: 16).

  Note: The minimum legal value for the "rcvbuf_scale" module parameter is 1.
  At this value, each socket will consume approximately 128 KBytes.
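
  The two figures above (about 128 KBytes at a value of 1, up to 2 MBytes at
  the default of 16) are consistent with a linear relationship. The sketch
  below assumes that linear model, which is an inference from these two data
  points and not a documented formula; the function names are hypothetical.

```python
# Assumed per-unit cost, inferred from the figures in this note:
# rcvbuf_scale=1 -> ~128 KBytes, rcvbuf_scale=16 -> ~2 MBytes.
KBYTES_PER_SCALE_UNIT = 128


def estimated_socket_kbytes(rcvbuf_scale: int) -> int:
    """Approximate upper bound on memory per SDP socket, in KBytes."""
    if rcvbuf_scale < 1:
        raise ValueError("rcvbuf_scale must be at least 1")
    return rcvbuf_scale * KBYTES_PER_SCALE_UNIT


def estimated_total_mbytes(rcvbuf_scale: int, sockets: int) -> float:
    """Approximate upper bound for a number of sockets, in MBytes."""
    return estimated_socket_kbytes(rcvbuf_scale) * sockets / 1024
```

  Under this assumption, 1000 sockets at rcvbuf_scale=4 would need up to
  about 500 MBytes, compared with about 2000 MBytes at the default.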

- Small message performance is low when the client sends messages at a rate
  lower than the rate at which the server consumes them, and TCP_CORK is
  not set. This is observed, for example, with the iperf benchmark. As a
  workaround, set the TCP_CORK socket option to ensure data is sent in
  chunks of at least 32 KBytes.
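
  The workaround can be sketched as follows. This is an illustrative
  example, assuming a Linux host and an application whose AF_INET stream
  sockets are redirected to SDP by the pre-load library; the function names
  are hypothetical. Clearing the option to flush pending data follows Linux
  TCP semantics and is assumed, not confirmed, to behave the same over SDP.

```python
import socket


def make_corked_socket() -> socket.socket:
    """Create a stream socket with TCP_CORK enabled (Linux only)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the stack to hold back partial frames until more data is queued.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    return sock


def send_small_messages(sock: socket.socket, messages) -> None:
    """Send many small messages on a connected, corked socket."""
    for msg in messages:
        sock.sendall(msg)
    # Uncork at the end so that any remaining partial data is pushed out.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)
```

  The socket returned by make_corked_socket() still has to be connected
  before send_small_messages() is called.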

- Performance is low on 32-bit kernels, as SDP utilizes high memory
  to ease memory pressure. Moving to a 64-bit kernel solves this
  problem even if the application remains a 32-bit one.

- By default, SDP utilizes a 2 Kbyte MTU size. This may cause PCI-X cards
  using Mellanox Technologies "Infinihost" HCAs to experience low bandwidth.
  Workaround: set the MTU size to 1K in this situation, using either of
  the two methods below:

  1. Activate the "tavor quirk" workaround in opensm:
     a. Create an opensm options cache file (/var/cache/osm/opensm.opts):
          > opensm --cache-options -o
     b. Add the following line to /var/cache/osm/opensm.opts:
          enable_quirks TRUE
     c. Rerun opensm using your usual command line options to activate
        the opensm quirk option.

  2. Activate the "tavor quirk" workaround in cma:
       set the tavor_quirk module parameter of the rdma_cm module to value 1
       (default: 0).

- ZCopy is enabled by default for blocks larger than 64K. ZCopy can be disabled
  by setting the module parameter sdp_zcopy_thresh to zero. To change the
  threshold, set the parameter to another non-zero value.

- ZCOPY mode gives good performance for large blocks with very low CPU
  utilization. When in use, every message longer than 'sdp_zcopy_thresh' bytes
  causes the user space buffer to be pinned and the data to be sent directly
  from the original buffer. This results in less CPU usage and, on many
  systems, in higher bandwidth.
  ZCOPY is most efficient with multi-stream jobs, and it performs better as the
  message size increases.
  The default 64K value for 'sdp_zcopy_thresh' is too low for some systems.
  Experiment with your hardware to select the best value.
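
  The threshold rule described in these notes can be modeled as below. This
  is a simplified sketch of the documented behavior (messages longer than
  the threshold use ZCOPY, and a threshold of zero disables it), not the
  kernel implementation; the function name uses_zcopy is hypothetical.

```python
# Default threshold stated in these release notes (64K).
DEFAULT_ZCOPY_THRESH = 64 * 1024


def uses_zcopy(message_len: int,
               zcopy_thresh: int = DEFAULT_ZCOPY_THRESH) -> bool:
    """Model of the documented rule for choosing ZCOPY over BCOPY."""
    if zcopy_thresh == 0:
        # A threshold of zero disables ZCOPY entirely.
        return False
    # Only messages strictly longer than the threshold are sent via ZCOPY.
    return message_len > zcopy_thresh
```

  A model like this can help when planning benchmark runs, for example to
  pick message sizes on both sides of a candidate threshold.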

- ZCOPY vs BCOPY:
  ZCOPY is more efficient on systems with weak CPUs and with multiple
  streams, whereas BCOPY is more efficient with a single stream.

- To use SDP over RoCE, set the 'sdp_link_layer_ib_only' module parameter
  to 0.
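
Several of the items above depend on module parameter values. The sketch
below shows one way to read the current value of a parameter through sysfs.
It assumes the standard Linux layout /sys/module/<module>/parameters/<name>
and that the parameter is exported there, which these notes do not state.
The helper name is hypothetical, and the SDP module name "ib_sdp" used in
the usage note is also an assumption.

```python
from pathlib import Path
from typing import Optional


def read_module_param(module: str, param: str,
                      sysfs_root: str = "/sys/module") -> Optional[str]:
    """Return the current value of a module parameter, or None.

    None means the module is not loaded, the parameter is not exported
    through sysfs, or the file is not readable by the caller.
    """
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None
```

For example, read_module_param("rdma_cm", "tavor_quirk") would report the
setting used by the second workaround above, and
read_module_param("ib_sdp", "sdp_link_layer_ib_only") the RoCE setting,
if the assumed module name is correct.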

===============================================================================
4. Verification Applications/Flows/Tests
===============================================================================
- ssh/sshd
- wget/netscape/firefox/apache
- netpipe
- netperf
- LTP socket tests
- iperf-2.0.2
- ttcp
- Threaded and forking echo client server examples
- Various Java client server applications (SUN:jre, BEA:jrockit/WebLogic, GNU:gij/gcj)
- Many UNIX utilities to verify that pre-load did not harm the applications


