ExoGENI 40Gbps TCP throughput testing

Background – WAN approach

The initial 40G TCP throughput tests used two network endpoints, one at the StarLight facility in Chicago and the other at the Open Science Facility at NERSC in Oakland. Unfortunately, the results were not consistently repeatable. At times, a TCP stream reached 27.5Gbps using iperf3 for traffic generation, but not reliably. This is still a work in progress, and more troubleshooting is needed before it can be offered as a service.

LAN Implementation

To demonstrate that the ExoGENI nodes can support the required throughput, we reduced the testbed to two servers in the same rack, connected through a top-of-rack switch. The following diagram (Fig. 1) illustrates the 40GbE connectivity.

Figure 1. 40GbE LAN implementation

Server description:
IBM System x3650 M4
Dual Intel Xeon Processor E5-2650 8C 2.0GHz 20MB Cache 1600MHz 95W
(Quantity 8) 8GB (1x8GB, 2Rx4, 1.5V) PC3-12800 CL11 ECC DDR3 1600MHz LP RDIMM
Mellanox ConnectX-3 EN Dual-port QSFP+ 40GbE Adapter

Top of rack switch:
IBM Networking Operating System RackSwitch G8264

Each directly connected bare-metal server was provisioned using Flukes (Fig. 2) and given the following post-boot script, which tunes the CentOS 6.5 kernel for 40G and prepares the server for testing, including loading the correct driver for the 40G NIC. ESnet's Fasterdata Knowledge Base gives a thorough description of the tuning process: http://fasterdata.es.net/host-tuning/linux/ . One more important detail: the G8264 port facing the receiving server must have flow control enabled in the receive direction so that it honors the pause frames generated by the receiving server. This prevents the switch from sending frames faster than the server can receive them.

Figure 2. Flukes generated 40GbE post-boot configuration

Bare Metal postboot script:


#!/bin/bash
#driver: mlx4_en
#version: 2.1.11 (Oct 26 2015)
#firmware-version: 2.31.5050
#fetch and install the 40G driver
mount -o remount,size=100M tmpfs /tmp/
cd /opt
wget geni-images.renci.org/images/ckh/mlnx-en-2.1-1.0.0.tgz
tar xzvf mlnx-en-2.1-1.0.0.tgz
cd mlnx-en-2.1-1.0.0
echo y | ./install.sh
#remove existing driver
modprobe -r mlx4_en
#load new driver
modprobe mlx4_en
#set kernel parameters
sysctl -w net.core.rmem_max=536870912
sysctl -w net.core.wmem_max=536870912
sysctl -w net.ipv4.tcp_rmem="4096 87380 268435456"
sysctl -w net.ipv4.tcp_wmem="4096 65536 268435456"
sysctl -w net.ipv4.tcp_congestion_control=htcp
sysctl -w net.ipv4.tcp_mtu_probing=1
sysctl -w net.ipv4.tcp_timestamps=1
sysctl -w net.ipv4.tcp_sack=1
sysctl -w net.ipv4.tcp_low_latency=0
sysctl -w net.core.netdev_max_backlog=250000
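#lengthen the transmit queue on the 40G interface
/sbin/ifconfig p2p1 txqueuelen 10000
#set receive interrupt coalescing to 100 microseconds
/usr/sbin/ethtool -C p2p1 rx-usecs 100
#pin the NIC interrupts to the cores of NUMA node 1
/usr/sbin/set_irq_affinity_bynode.sh 1 p2p1
#enable jumbo frames
/sbin/ifconfig p2p1 mtu 9000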
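
After the post-boot script has run, it can be useful to read the kernel settings back and compare them with the intended values. The following is a minimal sketch and not part of the original Flukes script; the function name `check_sysctl` and the `SYSCTL_ROOT` variable are hypothetical names introduced here for illustration.

```shell
#!/bin/bash
# Root of the sysctl tree. Overridable (an assumption of this sketch) so the
# check can also be run against a saved copy of /proc/sys.
SYSCTL_ROOT="${SYSCTL_ROOT:-/proc/sys}"

# check_sysctl KEY EXPECTED
# Prints OK, MISMATCH or MISSING for one setting.
# Returns 0 on a match, 1 on a mismatch, 2 if the key cannot be read.
check_sysctl() {
    local key="$1" expected="$2" path actual
    # net.core.rmem_max -> $SYSCTL_ROOT/net/core/rmem_max
    path="$SYSCTL_ROOT/$(printf '%s' "$key" | tr . /)"
    if [ ! -r "$path" ]; then
        echo "MISSING  $key"
        return 2
    fi
    # /proc separates multi-value entries with tabs; collapse to single spaces.
    actual="$(tr -s '[:space:]' ' ' < "$path" | sed 's/^ //; s/ $//')"
    if [ "$actual" = "$expected" ]; then
        echo "OK       $key = $actual"
    else
        echo "MISMATCH $key = $actual (expected $expected)"
        return 1
    fi
}
```

For example, `check_sysctl net.ipv4.tcp_congestion_control htcp` and `check_sysctl net.ipv4.tcp_rmem "4096 87380 268435456"` read back two of the values set above.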

Results

[root@sl-w9 proc]# iperf3 -i1 -t15 -c10.0.0.2 -w200m -A1,1
Connecting to host 10.0.0.2, port 5201
[ 4] local 10.0.0.1 port 43260 connected to 10.0.0.2 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[ 4]   0.00-1.00   sec  4.09 GBytes  35.1 Gbits/sec    0   4.71 MBytes
[ 4]   1.00-2.00   sec  4.14 GBytes  35.6 Gbits/sec    0   4.71 MBytes
[ 4]   2.00-3.00   sec  4.15 GBytes  35.6 Gbits/sec    0   4.71 MBytes
[ 4]   3.00-4.00   sec  4.15 GBytes  35.6 Gbits/sec    0   4.71 MBytes
[ 4]   4.00-5.00   sec  4.14 GBytes  35.6 Gbits/sec    0   4.71 MBytes
[ 4]   5.00-6.00   sec  4.15 GBytes  35.6 Gbits/sec    0   4.71 MBytes
[ 4]   6.00-7.00   sec  4.14 GBytes  35.6 Gbits/sec    0   4.71 MBytes
[ 4]   7.00-8.00   sec  4.15 GBytes  35.6 Gbits/sec    0   4.71 MBytes
[ 4]   8.00-9.00   sec  4.15 GBytes  35.6 Gbits/sec    0   4.71 MBytes
[ 4]   9.00-10.00  sec  4.15 GBytes  35.6 Gbits/sec    0   4.71 MBytes
[ 4]  10.00-11.00  sec  4.15 GBytes  35.6 Gbits/sec    0   4.71 MBytes
[ 4]  11.00-12.00  sec  4.15 GBytes  35.6 Gbits/sec    0   4.71 MBytes
[ 4]  12.00-13.00  sec  4.15 GBytes  35.6 Gbits/sec    0   4.71 MBytes
[ 4]  13.00-14.00  sec  4.15 GBytes  35.6 Gbits/sec    0   4.71 MBytes
[ 4]  14.00-15.00  sec  4.15 GBytes  35.6 Gbits/sec    0   4.71 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[ 4]   0.00-15.00  sec  62.1 GBytes  35.6 Gbits/sec    0             sender
[ 4]   0.00-15.00  sec  62.1 GBytes  35.6 Gbits/sec                  receiver

iperf Done.
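
The summary line can be cross-checked by hand: 62.1 GBytes transferred in 15 seconds works out to roughly 35.6 Gbits/sec, or about 89% of the 40GbE line rate. The sketch below does that arithmetic, assuming iperf3's unit convention of 2^30 bytes per GByte and 10^9 bits per Gbit; `gbytes_to_gbps` is a hypothetical helper name, not an iperf3 feature.

```shell
#!/bin/bash
# gbytes_to_gbps GBYTES SECONDS
# Convert an iperf3 transfer total into an average rate in Gbits/sec.
# Assumption: GBytes are 2^30 bytes and Gbits are 10^9 bits.
gbytes_to_gbps() {
    awk -v gb="$1" -v s="$2" \
        'BEGIN { printf "%.1f\n", gb * 2^30 * 8 / s / 1e9 }'
}

# Average rate over the whole 15-second run shown above.
gbytes_to_gbps 62.1 15
```

The same conversion applies to the per-second rows, for example `gbytes_to_gbps 4.09 1` for the first interval.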

Miscellaneous

[root@sl-w9 proc]# cat /etc/redhat-release
CentOS release 6.5 (Final)

[root@sl-w9 proc]# ethtool -i p2p1
driver: mlx4_en
version: 2.1.11 (Oct 26 2015)
firmware-version: 2.31.5050
bus-info: 0000:20:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
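
The flow-control requirement described in the LAN Implementation section can also be inspected from the server side. `ethtool -a` reports a NIC's pause-frame settings and `ethtool -A` changes them. The helper below is a minimal sketch that summarizes `ethtool -a` output read from standard input; `pause_state` is a hypothetical name, and the interface name `p2p1` is taken from the post-boot script. The switch-side configuration on the G8264 is not covered here.

```shell
#!/bin/bash
# pause_state
# Summarize the RX/TX pause settings found in `ethtool -a` output on stdin.
pause_state() {
    awk -F':[ \t]*' '
        /^RX:/ { rx = $2 }   # whether the NIC honors received pause frames
        /^TX:/ { tx = $2 }   # whether the NIC may send pause frames
        END    { printf "rx=%s tx=%s\n", rx, tx }
    '
}

# Usage on a server with the 40G NIC (requires root and the real interface):
#   ethtool -a p2p1 | pause_state
#   ethtool -A p2p1 rx on tx on
```

On the receiving server, TX pause is the setting that lets the NIC emit the pause frames the switch port is expected to honor.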

 
