Using ExoGENI for training in reproducible synthesis research

Authors: Jeffrey L. Tilson and Jonathan Mills.

Context: Open Science for Synthesis is a unique bi-coastal training offered for early-career scientists who want to learn the new software and technology skills needed for open, collaborative, and reproducible synthesis research. UC Santa Barbara’s National Center for Ecological Analysis and Synthesis (NCEAS) and the University of North Carolina’s Renaissance Computing Institute (RENCI) co-led this three-week intensive training workshop, with participants in both Santa Barbara, CA and Chapel Hill, NC, from July 21 to August 8, 2014. The training was sponsored by the Institute for Sustainable Earth and Environmental Software (ISEES) and the Water Science Software Institute (WSSI), both of which are conceptualizing an institute for sustainable scientific software.

The participants were initially clustered into research groups based, in part, on mutual interests. Then, in conjunction with their research activities, daily bi-coastal sessions were held to develop expertise in sustainable software practices and in the technical aspects that underlie successful open science and synthesis, from data discovery and integration to analysis and visualization, along with special techniques for collaborative scientific research as applied to the team projects. The specific projects are described at https://github.com/NCEAS/training/wiki/OSS-2014-Synthesis-Projects.

Specifics of ExoGENI: In support of the research teams, ExoGENI provisioned a total of three slices, where a slice is defined as one or more compute resources (virtual machines or bare metal nodes) that are interconnected via a dedicated private network. The largest slice contained four virtual machines (VMs), each with 75 GB of disk space, 4 CPUs, and 12 GB of RAM. A second slice used two VMs of the same size as those in the first and additionally had a 1 TB storage volume mounted via iSCSI onto each host. The last slice used two bare metal nodes, each with 20 CPU cores and 96 GB of RAM, and had R installed for statistical programming. These slices remained allocated for the duration of the workshop. Workshop participants were given access via ssh keys, and workshop staff were given additional keys for root access.
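As a quick way to see the combined footprint of the three slices, the figures above can be written down as plain data and summed. The following is only an illustrative sketch: it does not use any ExoGENI API, and the class name, field names, and slice labels are hypothetical. The disk size of the bare metal nodes is not stated above, so it is left unset, and the iSCSI volume is recorded as a note rather than added to a total, since the text does not say whether the volume was shared between the hosts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slice:
    """One slice as described in the text (all names here are hypothetical)."""
    name: str
    node_kind: str                    # "vm" or "bare metal"
    node_count: int
    cores_per_node: int
    ram_gb_per_node: int
    disk_gb_per_node: Optional[int]   # None where the text gives no figure
    note: str = ""

    def total_cores(self) -> int:
        return self.node_count * self.cores_per_node

    def total_ram_gb(self) -> int:
        return self.node_count * self.ram_gb_per_node


# Figures taken from the description above; labels are made up for illustration.
SLICES = [
    Slice("slice-1", "vm", 4, 4, 12, 75),
    Slice("slice-2", "vm", 2, 4, 12, 75,
          note="1 TB storage volume mounted via iSCSI onto each host"),
    Slice("slice-3", "bare metal", 2, 20, 96, None,
          note="R installed for statistical programming"),
]


def summarize(slices):
    """Return aggregate node, core, and RAM counts across all slices."""
    return {
        "nodes": sum(s.node_count for s in slices),
        "cores": sum(s.total_cores() for s in slices),
        "ram_gb": sum(s.total_ram_gb() for s in slices),
    }


if __name__ == "__main__":
    for s in SLICES:
        print(f"{s.name}: {s.node_count} x {s.node_kind}, "
              f"{s.total_cores()} cores, {s.total_ram_gb()} GB RAM"
              + (f" ({s.note})" if s.note else ""))
    print(summarize(SLICES))
```

Under these figures, the three slices together amount to 8 nodes, 64 CPU cores, and 264 GB of RAM.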

Lessons learned: The ExoGENI-provided resources were easy to assemble and make available to the research teams. Each team provided its best guess regarding memory, disk, and computation needs, which resulted in three different classes of ExoGENI resources.

The ExoGENI resources that were provisioned for participants were all Linux-based. Moving forward, alternative operating systems should be considered, perhaps by gathering feedback from the research groups at the start of the workshop.
