ORCA5 upgrade

As we upgrade the ExoGENI infrastructure to the new release of ORCA (5.0 Eastsound), there are a few things experimenters should know about its features and capabilities.

The main feature being added is so-called state recovery: the ability to restart the various ORCA actors while retaining state about created slices. This allows experimenters to run long-lived experiments without worrying that software updates or other disruptive events will interfere with them. Recovery handles many situations, although catastrophic events may still result in the loss of slice information.

Another area of attention for us has been bare-metal node provisioning. We have strengthened the code that performs bare-metal provisioning, making it more error-proof, and added the ability to attach iSCSI storage volumes to bare-metal nodes; until now this capability worked only for virtual-machine slivers.

ORCA5 has also allowed us to enable hybrid-mode support in the rack switches. In simple terms, experimenters who want to use the OpenFlow capabilities of the switch can do so, while the rest can use traditional VLANs with more predictable performance guarantees.

Finally, we introduced the ability to see the boot consoles of VMs in case of failure, a feature we hope will help in debugging stubborn image-creation issues.

Running OpenFlow experiments with real rack switches

We’ve described in the past how to run OpenFlow experiments in ExoGENI using virtual OVS switches hosted in virtual machines. With the release of ORCA5, it is now possible to run some experiments using the real OpenFlow switches in ExoGENI racks (for IBM-based racks, these are BNT G8264 switches).

To do that, start your OpenFlow controller on a host with a publicly accessible IP address. Then create a slice in Flukes (remember to assign IP addresses to the dataplane interfaces of the nodes) as shown in the figure below, making sure to mark the reservation as ‘OpenFlow’ and fill in the details: your email, a slice password (not really important; it can be any random string and you don’t need to remember it) and the URL of the controller in the form

tcp:<hostname or ip>:<port number, typically 6633>
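
For illustration only (the hostname below is a placeholder, not a real controller), a controller listening on the default port would be entered as:

     tcp:mycontroller.example.org:6633

Before submitting the slice you can confirm that the controller host is actually listening on that port, for example with:

     netstat -ltn | grep 6633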

Submit the slice and wait for the FlowVisor on the rack head node to connect to your controller. You should see the various events (e.g. PACKET_IN) flowing by in the controller log. And that’s it.

Creating a slice using OpenFlow capabilities of the rack switch

Current limitations: only one link per slice, and only ORCA5 IBM racks can do this at the moment, which excludes the UvA, WVnet, NCSU and Duke racks.

Dealing with LLDP topology discovery in ExoGENI slices

In this article, courtesy of Rajesh Gopidi, we are going to introduce you to a way of dealing with LLDP-based topology discovery in OpenFlow slices, i.e. slices that stand up OVS instances in the nodes and may run their own version of FlowVisor on top of OVS to simulate, for example, a multi-controller environment within the slice.

The proposed solution is implemented as a new command in FlowVisor that updates the ethertype and destination MAC address parameters used to identify LLDP packets, in order to prevent them from being “swallowed” by the hardware switches within the ExoGENI infrastructure that create links between nodes.

A typical OpenFlow slice in ExoGENI

It is very easy to stand up OpenFlow experiments in ExoGENI by using one of the OVS images available to experimenters. However, one typical problem all of them face is the inability of the OpenFlow controller(s) to discover the topology of switches – the Open vSwitch instances running inside the nodes – by sending LLDP packets. This is because the underlying physical switches that make up the virtual links connecting nodes in the slice absorb these packets instead of relaying them, which prevents the OpenFlow controllers in the slice from discovering the topology.

One simple work-around for the above problem is to modify the header – ethertype and destination MAC address – of the LLDP packets sent out by the OpenFlow controller(s) so that the underlying physical switches don’t recognize them. For example, in the Floodlight controller one can change the header of LLDP packets by modifying the constants TYPE_LLDP and LLDP_STANDARD_DST_MAC_STRING in the net.floodlightcontroller.packet.Ethernet and net.floodlightcontroller.linkdiscovery.internal.LinkDiscoveryManager classes, respectively.
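
If it helps to locate these constants, a quick search in a Floodlight source checkout (assuming the standard Maven source layout; paths may differ between Floodlight versions) might look like:

     grep -rn "TYPE_LLDP" src/main/java/net/floodlightcontroller/packet/
     grep -rn "LLDP_STANDARD_DST_MAC_STRING" src/main/java/net/floodlightcontroller/linkdiscovery/internal/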

The above work-around can be applied successfully in scenarios where there is only one OpenFlow controller programming the switches in the network. However, consider a case where the network in the slice is virtualized using FlowVisor and multiple controllers are allowed to program a switch simultaneously. In such scenarios FlowVisor, which also uses LLDP, will not be able to classify packets with a modified LLDP header as LLDP probes; it will drop the LLDP packets sent by the OpenFlow controller(s), nullifying the work-around.

This is where the new command comes in handy. It lets you modify the parameters FlowVisor uses to identify LLDP packets. To update both the ethertype and destination MAC address parameters, use the command as shown below:

fvctl update-lldp-header <ether type in hex> <destination MAC address>
To verify that the update was successful, run “fvctl get-lldp-header-info” and check that the values are as expected.
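
As a purely illustrative invocation (the ethertype and MAC address below are arbitrary placeholders, not recommended values):

     fvctl update-lldp-header 0x8999 02:00:00:00:00:aa
     fvctl get-lldp-header-info

The values passed here must match the modified TYPE_LLDP and destination MAC configured in your controller; otherwise FlowVisor will still fail to recognize the probes.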

Example usage of the new FlowVisor command

After modifying the LLDP parameters as shown above, the OpenFlow controller is able to discover the links in the topology using LLDP packets, as shown in the figures below:

A snapshot of the FlowVisor log showing its processing of LLDP packets.

Topology discovered by the OpenFlow Floodlight controller using LLDP packets

To make this work you will need to install the updated FlowVisor into your slice. The code for the FlowVisor with these new options can be found on GitHub; export it as a zip archive, download it, and install it by following the instructions in the INSTALL file.

Notes from June 2014 maintenance

There are no new software features. This maintenance was meant to restore stitching to the UCD rack, which was broken by the reconfiguration of Internet2 AL2S.

  • There is a new rack at TAMU that is accessible for intra-rack slices only for now, due to a lack of physical connectivity to its upstream provider. Once the connectivity is in place we will enable stitching to TAMU. See the wiki for rack controller information.
  • There is a new version of Flukes with minor changes. As a reminder, Flukes has been reported not to work well with Java 7 on certain platforms, specifically Mac OS; switching to Java 6 typically solves these problems.
  • Connectivity to the OSF rack is currently malfunctioning due to problems with the ESnet OSCARS control software instance. We will announce when those problems have been addressed.
    • Has been fixed.
  • Stitching to UCD has been restored.

Using stitchports to connect slices to external infrastructure

This post comes to us courtesy of Cong Wang from UMass Amherst.

ExoGENI provides key features for stitching among multiple ExoGENI racks and for connecting non-ExoGENI nodes to ExoGENI slices via a stitchport. This article provides some initial instructions on how to stitch ExoGENI slices to external infrastructure with stitchports using VLANs.

The procedure for stitching to an external stitchport is similar to the example above, except that the local machine needs to be configured correctly to connect to the VLAN. In the following example, the local laptop (outside the ExoGENI testbed) is located on the University of Wisconsin-Madison campus. It connects to ExoGENI via layer-2 VLAN 920. The Flukes request is shown in the figure. The available stitchport URLs and Labels/Tags can be found in the ExoGENI wiki. More stitchports at other locations can be added upon request via the users mailing list.

stitchport1

After the slice is successfully reserved, the manifest view should be similar to the one shown below:

stitchport2

In order to attach the local (non-ExoGENI) node to the VLAN, the interface connected to the VLAN has to be configured correctly. In addition to the regular IP configuration, the attached interface has to be configured with the correct VLAN ID.

The following example is for Ubuntu (a sketch for making the configuration persistent across reboots follows the list):

  • If it is not already loaded, load the 8021q module into the kernel:
     sudo modprobe 8021q
  • Then create a new interface that is a member of a specific VLAN; VLAN ID 10 is used in this example. Keep in mind you can only use physical interfaces as a base; creating VLANs on virtual interfaces (e.g. eth0:1) will not work. We use the physical interface eth1 in this example. The following command adds an additional interface next to those already configured, so your existing configuration of eth1 will not be affected:
     sudo vconfig add eth1 10
  • Finally, assign an address to the new interface (the address should be consistent with the other IP addresses used on the dataplane of the slice):
     sudo ip addr add 172.16.0.5/24 dev eth1.10
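
To make the VLAN interface persist across reboots, a minimal sketch for Ubuntu’s classic ifupdown configuration (this assumes the vlan package is installed and reuses the illustrative interface name, VLAN ID and address from above; adjust to your setup) could be added to /etc/network/interfaces:

     auto eth1.10
     iface eth1.10 inet static
         address 172.16.0.5
         netmask 255.255.255.0
         vlan-raw-device eth1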

At this point, you should be able to ping Node1, which means the stitchport has been successfully attached to your ExoGENI slice.

Lehigh University CSE 303 Operating System HW10

Author: Dawei Li

Topic: Category-based N-gram Counting and Analysis Using Map-Reduce Framework

Class Size: 14 students

We use one GENI account (through Flukes) to create slices for all students, and distribute the SSH private key as well as the assigned IP address to each of them.
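
For reference, each student would then log in with something like the following (the key file name is hypothetical; the IP address is the one assigned to that student):

     ssh -i ~/hw10_slice_key root@<assigned IP>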

Statistics

  • Total slices: 15
  • Total nodes: 73
  • Controllers used: OSF (7 slices, 7 nodes each), WVN (8 slices, 3 nodes each)
  • Duration: 9 days (11 slices), 14 days (4 slices)

Comments:

The Flukes tool is really convenient. It took me just a few hours to figure out how to use it and how to create a Hadoop cluster. The only issue is that I have to poll the status of the slice myself to know whether it is ready or not. As far as I know, the Flack GUI of ProtoGENI can poll it automatically and show users the changing status until it is ready.

I have heard no complaints from students about connection problems, meaning that the testbed resources are relatively stable whether accessed on campus or not. Some students could not log into the testbed simply because they were not familiar with SSH. However, one grader said that he couldn’t log into the testbed on May 3rd (around midnight) using one OSF slice, but he could log in again the next morning.

Notes from April 2014 maintenance

This note summarizes the results of maintenance across the entire ExoGENI Testbed in April 2014.

The highlights

  • Minor fixes added to BEN to better support inter-domain topologies
  • UDP performance issue addressed. See below for more details.
  • FOAM updated across the racks
  • Floodlight updated across the racks
  • Connectivity
    • Most of the connectivity caveats from the previous note still apply.
    • UvA rack is currently not reachable. We suspect a problem with the configuration in SURFnet, which we will address.
      • This behavior appears to have resolved itself by 05/06 without our intervention. Please report any further problems. 

The details: UDP performance

Some of you have observed very poor performance for UDP transfers – extremely high packet losses and very low transfer rates. This issue was traced to three separate causes:

  • Poor implementation of the “learning switch” functionality in the version of the Floodlight OpenFlow controller we were using. It resulted in sudden losses of packets after a period of time, particularly between bare-metal nodes. To resolve this issue we upgraded Floodlight to version 0.9 and replaced its “learning switch” module with the better-behaved “forwarding” module.
  • Insufficient configuration of the QEMU interface to the guest VM, which resulted in very high packet losses. We updated the Quantum agent to support the proper options.
  • Sensitivity of UDP transfers to host- and guest-side transmit and receive buffers. We tuned the host-side buffers on the worker nodes; guest-side tuning must be done by the experimenter (see the illustrative sketch after this list).
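
As a purely illustrative sketch of guest-side tuning (the values are examples only; appropriate settings depend on the datagram sizes and rates of your application), one could enlarge the socket buffer limits inside the VM:

     sudo sysctl -w net.core.rmem_max=26214400
     sudo sysctl -w net.core.wmem_max=26214400
     sudo sysctl -w net.core.rmem_default=1048576
     sudo sysctl -w net.core.wmem_default=1048576

Applications can also request larger buffers explicitly via setsockopt() with SO_RCVBUF/SO_SNDBUF, bounded by the *_max values above.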

We will publish a separate blog entry in the near future explaining how to get the best UDP performance out of ExoGENI.

New Capability: SoNIC-enabled Software-Defined Precise Network Measurements in GENI

This post comes to us courtesy of Dr. Hakim Weatherspoon of Cornell University.

It introduces a new capability recently added to ExoGENI slices for precise network measurements using SoNIC, the Software-defined Network Interface Card (http://sonic.cs.cornell.edu). Users can now add bare-metal nodes equipped with SoNIC cards to their topology for precise network measurements. Examples of possible usage include precise traffic capture and generation, network device characterization, available bandwidth estimation, fine-grain clock synchronization, and even physical-layer (PHY) timing channel creation and detection. See the figure.

sonic1

SoNIC enables realtime access from software to the entire network stack, including in particular the data link and physical layers of a 10 Gbps Ethernet network. By implementing the creation of the bitstream in software and the transmission of the bitstream in hardware, SoNIC provides complete control over the entire network stack in realtime. SoNIC utilizes commodity off-the-shelf multi-core processors to implement part of the physical layer in software, and employs an FPGA board to transmit optical signals over the wire. As an example of SoNIC’s fine-grained control, it can perform precise network measurements at picosecond scale, accurately characterizing network components such as routers, switches, and network interface cards. For a complete description of SoNIC’s internal design, readers can refer to the papers published in NSDI 2013 and 2014 (http://fireless.cs.cornell.edu/sonic/publications.php).

Here we demonstrate how to create a slice with bare-metal nodes to use SoNIC in ExoGENI. In ExoGENI, a bare-metal node is a special type of compute node on which users are granted full root access to the bare-metal resource. Currently, SoNIC-enabled bare-metal nodes are located at the RENCI XO Rack and the UCD XO Rack. To use a SoNIC-enabled node, simply add a compute node to your slice request from the drop-down list of resources. After that, drag a link between the two compute nodes to form a dumbbell topology. The link represents the network between the two compute nodes; in this case, it will represent the network stretching from Chapel Hill, NC to UC Davis, CA.

Your request should look like the following figure.

sonic2

Next, we will configure the nodes to be bare-metal nodes with SoNIC installed. Right click on one of the compute nodes to view and set its properties. For the node type, select ExoGENI Bare-metal. For the domain, select RENCI (Chapel Hill, NC USA) XO Rack for node 0. You can also configure the Link 0 IP address. Note this configures the IP address of the Chelsio 10G NIC available in the bare-metal node. It does not configure the SoNIC card.

sonic3

Similarly, we configure the other node in the topology to use the bare-metal resource from UCD XO Rack.

Now, double-check the settings, give the slice a name and click the submit button. Wait a while for the slice to be instantiated. You should see a pop-up window with a response from ORCA, as shown below.

sonic5

In the next blog post, we will demonstrate how to use the created SoNIC slice to perform interesting network measurement experiments in ExoGENI.

Notes from Mar 2014 maintenance

This note summarizes the results of maintenance on XO-BBN, XO-RCI, XO-FIU, XO-UFL, XO-SL racks and ExoSM controller.

The highlights

The purpose of the maintenance event was to reconfigure several of the racks to allow a wider range of VLANs to be usable for stitching.

  • A new rack, XO-UCD at UC Davis, was added; however, it is not fully reachable due to unresolved connectivity issues in CENIC.
  • The rack at Starlight (XO-SL) has been reconfigured to use the currently available narrow range of VLANs, and support was added for the GEC19 SDX demo via VLAN 1655.
  • Support for SoNIC cards was added to XO-RCI and XO-UCD.
  • A controller bug was resolved in ExoSM.

Connectivity caveats

Not all racks currently visible through ExoSM can be reached using stitching. We expect these issues to be resolved in the near future:

  • XO-UCD connectivity via CENIC across all VLANs
    • Resolved on 03/07/2014
  • XO-SL connectivity across all VLANs
    • Resolved on 03/12/2014
  • XO-NICTA continues to have problems with VLANs 4003 and 4005
  • XO-BBN has a problem with VLAN 2601 in the NOX/BBN network
    • Resolved on 04/05/2014
  • Direct connectivity between XO-UFL and XO-FIU across all VLANs
    • Resolved on 05/20/14