ORCA5 upgrade

As we’re upgrading the ExoGENI infrastructure to the new release of ORCA (5.0 Eastsound), there are a few things experimenters should know about the features and capabilities of this new release.

The main feature being added is so-called state recovery: the ability to restart the various ORCA actors while retaining the state of created slices. This will allow experimenters to run long-lived experiments without worrying about interference from software updates or other disruptive events. Recovery handles many situations, although catastrophic events may still result in the loss of slice information.

Another area of attention for us has been bare-metal node provisioning. We have strengthened the code that performs bare-metal provisioning, making it more robust against errors, and added the ability to attach iSCSI storage volumes to bare-metal nodes. Until now, this capability has only worked for virtual machine slivers.

ORCA5 has allowed us to enable hybrid mode support in the rack switches. In simple terms, experimenters who want to use the OpenFlow capabilities of the switch can do so, while the rest can use traditional VLANs with more predictable performance guarantees.

Finally, we introduced the ability to see the boot consoles of VMs in case of failure, a feature we hope will help in debugging stubborn image creation issues.

Known issues:

  • Attachment to mesoscale VLANs
    • Won’t work properly with the current NDL converter
    • Doesn’t work due to as-yet-undetermined problems with the switch hybrid configuration: packets don’t pass properly between the OpenFlow and VLAN parts of the switch.
  • NDL conversion for some slice manifests may not work properly. Slices may appear disconnected. This requires an update to the NDL converter, which will be done once more racks are upgraded.

Notes from June 2014 maintenance

There are no new software features. This maintenance was meant to restore stitching to the UCD rack, which was broken by the reconfiguration of Internet2 AL2S.

  • There is a new rack at TAMU that, for now, is accessible for intra-rack slices only because it lacks physical connectivity to its upstream provider. Once the connectivity is in place we will enable stitching to TAMU. See the wiki for rack controller information.
  • There is a new version of Flukes with minor changes. As a reminder, Flukes has been reported not to work well with Java 7 on certain platforms, specifically Mac OS. Java 6 typically solves the problems.
  • Connectivity to the OSF rack is currently malfunctioning due to problems with the ESnet OSCARS control software instance. We will announce when those problems have been addressed.
    • Has been fixed.
  • Stitching to UCD has been restored.



Notes from April 2014 maintenance

This note summarizes the results of maintenance across the entire ExoGENI Testbed in April 2014.

The highlights

  • Minor fixes added to BEN to better support inter-domain topologies
  • UDP performance issue addressed. See below for more details.
  • FOAM updated across the racks
  • Floodlight updated across the racks
  • Connectivity
    • Most of the connectivity caveats from the previous note still apply.
    • UvA rack is currently not reachable. We suspect a problem with the configuration in SURFnet, which we will address.
      • This behavior appears to have resolved itself by 05/06 without our intervention. Please report any further problems. 

The details: UDP performance

Some of you have observed very poor performance for UDP transfers – extremely high packet losses and very low transfer rates. This issue was traced to three separate causes:

  • Poor implementation of “learning switch” functionality in the version of the Floodlight OpenFlow controller we were using. It resulted in sudden packet losses after a period of time, particularly between bare-metal nodes. To resolve this issue we upgraded Floodlight to version 0.9 and replaced its “learning switch” module with a better-behaved “forwarding” module.
  • Insufficient configuration of the QEMU interface to the guest VM, which resulted in very high packet losses. We updated the Quantum agent to support the proper options.
  • Sensitivity of UDP transfers to host- and guest-side transmit and receive buffers. We tuned the host-side buffers on worker nodes; however, guest-side tuning must be done by the experimenter.

To explain further how to get the best performance out of UDP on ExoGENI we will publish a separate blog entry in the immediate future.
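
As a rough illustration of what guest-side tuning can look like at the application level, the sketch below requests larger send and receive buffers on a UDP socket using the standard `SO_SNDBUF` and `SO_RCVBUF` socket options. The helper names and the 4 MiB size are assumptions chosen for the example, not values recommended for ExoGENI. On Linux the kernel caps the request at its system-wide limits, so the code reads the sizes back instead of assuming the request was honored.

```python
import socket

# Illustrative request size only; not a value recommended for ExoGENI.
EXAMPLE_BUFFER_BYTES = 4 * 1024 * 1024


def make_udp_socket(rcvbuf_bytes=EXAMPLE_BUFFER_BYTES,
                    sndbuf_bytes=EXAMPLE_BUFFER_BYTES):
    """Create a UDP socket and request larger kernel buffers for it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf_bytes)
    return sock


def effective_buffers(sock):
    """Return the (receive, send) buffer sizes the kernel actually granted."""
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return rcv, snd


if __name__ == "__main__":
    s = make_udp_socket()
    rcv, snd = effective_buffers(s)
    print("receive buffer:", rcv, "send buffer:", snd)
    s.close()
```

If the sizes read back are much smaller than requested, the guest's system-wide socket buffer limits are probably the constraint and would need to be raised inside the VM.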

Notes from Mar 2014 maintenance

This note summarizes the results of maintenance on the XO-BBN, XO-RCI, XO-FIU, XO-UFL and XO-SL racks and the ExoSM controller.

The highlights

The purpose of the maintenance event was to reconfigure several of the racks so that a wider range of VLANs can be used for stitching.

  • A new rack, XO-UCD at UC Davis, was added; however, it is not fully reachable due to unresolved connectivity in CENIC.
  • The rack at Starlight (XO-SL) has been reconfigured to use the currently available narrow range of VLANs, and support for the GEC19 SDX demo via VLAN 1655 has been added.
  • Support for SONIC cards was added to XO-RCI and XO-UCD
  • A controller bug was resolved in ExoSM

Connectivity caveats

Not all racks currently visible through ExoSM can be reached using stitching. We expect these issues to be resolved in the immediate future:

  • XO-UCD connectivity via CENIC across all vlans
    • Resolved on 03/07/2014
  • XO-SL connectivity across all vlans
    • Resolved on 03/12/2014
  • XO-NICTA continues to have problems with vlans 4003 and 4005
  • XO-BBN has a problem with vlan 2601 in NOX/BBN network
    • Resolved on 04/05/2014
  • Direct connectivity between XO-UFL and XO-FIU across all vlans
    • Resolved on 05/20/14


Notes from Feb 2014 maintenance

This post describes changes in topology and behavior after shifting the BBN (xo-bbn), UH (xo-uh) and UFL (xo-ufl) racks from NLR to Internet2 AL2S, as well as a number of software fixes.

The Highlights

  • Please visit the updated ExoGENI topology diagram to see how racks are connected to each other: https://wiki.exogeni.net/doku.php?id=public:experimenters:topology
  • Added initial, poorly tested support for ‘speaks-for’ GENI credentials in the GENI AM API wrapper, deployed in ExoSM only for now.
  • Point-to-point and inter-rack multi-point stitching continues to be supported
    • A number of bug fixes improve stability; see the caveats in the details below
    • Inter-rack multi-point stitching is only available via the ORCA native API/Flukes tool
  • An updated version of Flukes. Please see the Release Notes in Flukes for more information
    • Optional support for GENI Portal/Slice Authority – registers your slices with the GENI Portal
    • Support for the coloring extension (see details below)
  • An updated version of NDL-to-RSpec converter which includes the following fixes
    • Links now have proper properties (i.e. bandwidth) in manifests
    • Bug in inter-domain manifests with duplicate interface names fixed
    • Per interface VLAN ranges should be properly advertised in stitching extension RSpec
    • Support for coloring extension RSpec (see details below) introduced
  • Two new racks will be visible in ExoSM advertisements and in Flukes: XO-OSF and XO-SL.
    • OSF, located at Oakland Scientific Facility, Oakland, CA (xo-osf)
    • SL, located at Northwestern University, Chicago, IL (xo-sl)
  • Inter-rack connectivity has the following important caveats:
    • Currently it is not possible to stitch UFL and FIU directly to each other due to limitations of the AL2S service. It is possible to have them as two branches of a multi-point connection. We are working on a solution to the point-to-point issue.
    • Connectivity to NICTA is experiencing problems due to what we think are misconfigured VLANs in TransPacWave. If your slice gets tags 4003 and 4005 going to NICTA, connectivity is not assured. Simply try to create a new slice, leaving the broken slice in place. Then delete the broken slice.
    • Connectivity to SL has not been properly plumbed in places, so it does not work for the moment.
      • This issue has been resolved as of 03/12/14
    • Connectivity to UFL appears to be broken through FLR across all available VLANs. We are working to resolve this issue.
      • This issue has been resolved as of 02/20/2014

The details – Multi-point topology embedding and templates

When using post-boot script templates with multi-point connections, the following rule needs to be observed:

  • When embedding an intra-rack topology (slice local to a single rack) with a broadcast link, to get to the IP address of the node on the broadcast link use the link name, e.g. “VLAN0”
  • When embedding an inter-rack topology (slice across multiple racks) with a broadcast link, to get to the IP address of the node on the broadcast link use the link name concatenated with the node name, e.g. “Node0-VLAN0”

This is a temporary limitation that will be removed in the near future.
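
The naming rule above can be captured in a small helper. This is only a sketch: the function name and its parameters are hypothetical and not part of ORCA or Flukes. It computes the name to use for the broadcast link and leaves the template syntax itself out.

```python
def broadcast_link_key(link_name, node_name, inter_rack):
    """Return the name to use for a node's address on a broadcast link.

    Hypothetical helper encoding the rule described above:
      - intra-rack slice: the link name alone, e.g. "VLAN0"
      - inter-rack slice: node name and link name joined, e.g. "Node0-VLAN0"
    """
    if inter_rack:
        return f"{node_name}-{link_name}"
    return link_name


if __name__ == "__main__":
    print(broadcast_link_key("VLAN0", "Node0", inter_rack=False))
    print(broadcast_link_key("VLAN0", "Node0", inter_rack=True))
```

Keeping the rule in one place should make it easy to drop the special case once the limitation is removed.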

Additionally, the topology embedding engine has the following limitations:

  • it does not properly deal with slices that combine inter-rack multi-point connections with inter-rack point-to-point connections going across BEN (to xo-rci, for example).
  • it does not properly deal with slices that have two stitch ports on the same port URL, but different VLAN tags in the same slice

We expect to be able to remedy these soon; for now, please avoid such requests.

The details – Application Coloring ontology and coloring RSpec Extension

This ontology was designed to allow attaching general application-specific attributes to slivers (nodes and links) and create labelled directed dependencies between them.
These attributes are NOT read by the control framework; rather, they are passed transparently from request to manifest, allowing application-level annotation of the request. It is important to understand that processing the elements of this schema is left to the application that creates requests and processes the resulting manifests.

This ontology (see https://geni-orca.renci.org/trac/browser/orca/trunk/ndl/src/main/resources/orca/ndl/schema/app-color.owl) is modeled after property graphs with multiple colors (or labels) associated with each node, link and color dependency. Each color or color dependency can have multiple properties associated with it, as may be needed by the applications running in the slice:

  • any number of key-value pairs
  • a blob of text
  • a blob of XML
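
To make the property-graph idea concrete, here is a minimal in-memory sketch of the model described above. All class, field and label names are hypothetical; they are not the ontology's class names or the RSpec extension's element names. Slice elements (nodes and links) carry any number of colors, directed dependencies between elements carry a color too, and each color holds key-value pairs plus optional text and XML blobs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Color:
    """A label plus the three kinds of properties a color may carry."""
    label: str
    key_values: Dict[str, str] = field(default_factory=dict)
    text: Optional[str] = None   # free-form text blob
    xml: Optional[str] = None    # XML blob, kept as an opaque string


@dataclass
class ColorDependency:
    """A directed, colored dependency between two slice elements."""
    source: str
    target: str
    color: Color


@dataclass
class ColoredSlice:
    """Maps element names (nodes or links) to their colors and dependencies."""
    element_colors: Dict[str, List[Color]] = field(default_factory=dict)
    dependencies: List[ColorDependency] = field(default_factory=list)

    def add_color(self, element: str, color: Color) -> None:
        self.element_colors.setdefault(element, []).append(color)

    def add_dependency(self, source: str, target: str, color: Color) -> None:
        self.dependencies.append(ColorDependency(source, target, color))

    def labels_of(self, element: str) -> List[str]:
        return [c.label for c in self.element_colors.get(element, [])]


if __name__ == "__main__":
    s = ColoredSlice()
    # Made-up labels and keys, purely for illustration.
    s.add_color("Node0", Color("measurement", {"role": "collector"}))
    s.add_color("Node0", Color("app", text="runs the web front end"))
    s.add_dependency("Node1", "Node0", Color("measurement", {"reports_to": "Node0"}))
    print(s.labels_of("Node0"))
```

Because the control framework passes these annotations through untouched, an application would populate a structure like this before submitting a request and read it back from the manifest.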

The new version of Flukes supports adding color-labeled properties to nodes and links and the creation of colored dependencies between elements of the slice, also with properties.

There is a matching RSpec coloring extension schema defined here: http://www.geni.net/resources/rspec/ext/color/2/color.xsd

The initial application of this extension is to allow GEMINI and GIMI to specify measurement roles of the slivers in the slice in RSpec. However, it was designed to be general to allow specifying other relationships and attributes without additional special-case effort for the aggregates to support them.


US Ignite recognizes researchers from NC State and RENCI for innovative app for monitoring power grids

Researchers from the NCSU FREEDM center and RENCI took home an award for best application in the energy and sustainability sector at a US Ignite Application Summit.

The demonstration involved an ExoGENI hardware-in-the-loop slice. It included laboratory infrastructure using multiple PMUs integrated with a Real-time Digital Simulator (RTDS), housed at the FREEDM Systems Center and dynamically linked to ExoGENI compute resources over the BEN experimental network.

For more information, visit the RENCI website.