Using ExoGENI slice modify/dynamic slice capabilities

This short post demonstrates in video form how to use the Flukes GUI to drive ExoGENI slice modify capabilities.

There are a few items that are worth remembering when using ExoGENI slice modify:

  • The initial slice doesn’t need to be bound; however, for all follow-on modify operations you must bind slice elements explicitly
  • Slice modify is a ‘batch’ operation – you can accumulate some number of modifications and then click ‘Submit Changes’ to realize them. For everyone’s sanity it is best to keep the number of changes relatively small in each step.
    • Corollary: Simply clicking ‘Delete’ on a node or link does not immediately remove it. You still have to click ‘Submit Changes’ to make it happen.
  • Modifying multi-point inter-domain links is not allowed. You can create a new multi-point link or delete an existing one, but you cannot change the degree of an existing link for the time being
  • It is not possible to link a new inter-domain path to an existing broadcast link
  • When deleting paths across multiple domains, please delete not just the ION/AL2S crossconnect, but also the two neighboring crossconnects belonging to each rack, as shown in this figure:

modify-delete-1

ExoGENI 40Gbps TCP throughput testing


Background – WAN approach

The initial effort to perform 40G TCP throughput testing used two network nodes/endpoints, one located at the StarLight facility in Chicago and the other at the Open Science Facility at NERSC in Oakland. Unfortunately, the results were not consistently repeatable; at times, a 27.5Gbps TCP stream was achieved using iperf3 for traffic generation. It is still a work in progress, as more troubleshooting needs to occur before it can be made available as a service.

LAN Implementation

To demonstrate that the ExoGENI nodes can support the required throughput, we restricted the testbed to two servers within the same rack connected by a top-of-rack switch. The following diagram (Fig. 1) illustrates the 40GbE connectivity.


Figure 1. 40GbE LAN implementation

Server description:
IBM System x3650 M4
Dual Intel Xeon Processor E5-2650 8C 2.0GHz 20MB Cache 1600MHz 95W
(Quantity 8) 8GB (1x8GB, 2Rx4, 1.5V) PC3-12800 CL11 ECC DDR3 1600MHz LP RDIMM
Mellanox ConnectX-3 EN Dual-port QSFP+ 40GbE Adapter

Top of rack switch:
IBM Networking Operating System RackSwitch G8264

Using Flukes (Fig. 2) to provision the directly connected bare-metal servers, each server was given the following post-boot script to apply 40G tuning to the CentOS 6.5 kernel and prepare the servers for testing, which includes loading the correct driver for the 40G NIC. A thorough description of the tuning process is provided at ESnet’s Fasterdata Knowledge Base – http://fasterdata.es.net/host-tuning/linux/ . Another important detail is that the G8264 port facing the receiving server needs flow control enabled in the receive direction so that it honors the pause frames generated by the receiving server; this prevents the switch from sending frames faster than the server can receive them.


Figure 2. Flukes generated 40GbE post-boot configuration

Bare Metal postboot script:


#!/bin/bash
#driver: mlx4_en
#version: 2.1.11 (Oct 26 2015)
#firmware-version: 2.31.5050
#fetch and install the 40G driver
mount -o remount,size=100M tmpfs /tmp/
cd /opt
wget geni-images.renci.org/images/ckh/mlnx-en-2.1-1.0.0.tgz
tar xzvf mlnx-en-2.1-1.0.0.tgz
cd mlnx-en-2.1-1.0.0
echo y | ./install.sh
#remove existing driver
modprobe -r mlx4_en
#load new driver
modprobe mlx4_en
#set kernel parameters
sysctl -w net.core.rmem_max=536870912
sysctl -w net.core.wmem_max=536870912
sysctl -w net.ipv4.tcp_rmem="4096 87380 268435456"
sysctl -w net.ipv4.tcp_wmem="4096 65536 268435456"
sysctl -w net.ipv4.tcp_congestion_control=htcp
sysctl -w net.ipv4.tcp_mtu_probing=1
sysctl -w net.ipv4.tcp_timestamps=1
sysctl -w net.ipv4.tcp_sack=1
sysctl -w net.ipv4.tcp_low_latency=0
sysctl -w net.core.netdev_max_backlog=250000
#increase the transmit queue length
/sbin/ifconfig p2p1 txqueuelen 10000
#set receive interrupt coalescing to 100 microseconds
/usr/sbin/ethtool -C p2p1 rx-usecs 100
#bind the NIC interrupts to the cores of NUMA node 1
/usr/sbin/set_irq_affinity_bynode.sh 1 p2p1
#enable jumbo frames
ifconfig p2p1 mtu 9000
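
Once the script has run, the settings can be spot-checked before starting a test. The following is a minimal sketch (it assumes the 40G interface is named p2p1, as in the script above):

#quick sanity checks after the post-boot script
sysctl net.ipv4.tcp_congestion_control   #should report htcp
ip link show p2p1                        #should report mtu 9000
ethtool -i p2p1                          #confirm the mlx4_en driver and firmware versions
ethtool -a p2p1                          #show the NIC's pause frame (flow control) settings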

Results

[root@sl-w9 proc]# iperf3 -i1 -t15 -c10.0.0.2 -w200m -A1,1
Connecting to host 10.0.0.2, port 5201
[ 4] local 10.0.0.1 port 43260 connected to 10.0.0.2 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[ 4]   0.00-1.00   sec 4.09 GBytes 35.1 Gbits/sec   0   4.71 MBytes
[ 4]   1.00-2.00   sec 4.14 GBytes 35.6 Gbits/sec   0   4.71 MBytes
[ 4]   2.00-3.00   sec 4.15 GBytes 35.6 Gbits/sec   0   4.71 MBytes
[ 4]   3.00-4.00   sec 4.15 GBytes 35.6 Gbits/sec   0   4.71 MBytes
[ 4]   4.00-5.00   sec 4.14 GBytes 35.6 Gbits/sec   0   4.71 MBytes
[ 4]   5.00-6.00   sec 4.15 GBytes 35.6 Gbits/sec   0   4.71 MBytes
[ 4]   6.00-7.00   sec 4.14 GBytes 35.6 Gbits/sec   0   4.71 MBytes
[ 4]   7.00-8.00   sec 4.15 GBytes 35.6 Gbits/sec   0   4.71 MBytes
[ 4]   8.00-9.00   sec 4.15 GBytes 35.6 Gbits/sec   0   4.71 MBytes
[ 4]   9.00-10.00 sec 4.15 GBytes 35.6 Gbits/sec   0   4.71 MBytes
[ 4] 10.00-11.00 sec 4.15 GBytes 35.6 Gbits/sec   0   4.71 MBytes
[ 4] 11.00-12.00 sec 4.15 GBytes 35.6 Gbits/sec   0   4.71 MBytes
[ 4] 12.00-13.00 sec 4.15 GBytes 35.6 Gbits/sec   0   4.71 MBytes
[ 4] 13.00-14.00 sec 4.15 GBytes 35.6 Gbits/sec   0   4.71 MBytes
[ 4] 14.00-15.00 sec 4.15 GBytes 35.6 Gbits/sec    0   4.71 MBytes
– – – – – – – – – – – – – – – – – – – – – – – – –
[ ID] Interval           Transfer     Bandwidth       Retr
[ 4]   0.00-15.00 sec 62.1 GBytes 35.6 Gbits/sec   0             sender
[ 4]   0.00-15.00 sec 62.1 GBytes 35.6 Gbits/sec                 receiver

iperf Done.
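
For reference, the receiving server only needs to run iperf3 in server mode; the -A1,1 option on the client requests CPU affinity for both the local and the remote iperf3 process (as a hedged note, the chosen core should sit on the NUMA node the NIC is attached to, matching the set_irq_affinity_bynode.sh setting in the post-boot script).

#on the receiving server (10.0.0.2 in this test)
iperf3 -s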

Miscellaneous

[root@sl-w9 proc]# cat /etc/redhat-release
CentOS release 6.5 (Final)

[root@sl-w9 proc]# ethtool -i p2p1
driver: mlx4_en
version: 2.1.11 (Oct 26 2015)
firmware-version: 2.31.5050
bus-info: 0000:20:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

 

Hadoop Tutorial

In this tutorial you will create a slice composed of three virtual machines that form a Hadoop cluster. The tutorial will lead you through creating the slice, observing the properties of the slice, and running a Hadoop example which sorts a large dataset.

After completing the tutorial you should be able to:

  1. Use ExoGENI to create a virtual distributed computational cluster.
  2. Create simple post boot scripts with simple template replacement.
  3. Use the basic functionality of a Hadoop filesystem.
  4. Observe resource utilization of compute nodes in a virtual distributed computational cluster.

Create the Request:

(Note: a completed rdf request file can be found here.   The request file can be loaded into Flukes by choosing File -> Open request)

1.  Start the Flukes application

2.  Create the Hadoop master node by selecting “Node” from the “Add Nodes” menu. Click in the field to place a VM in the request.

3.  Right-click the node and edit its properties.   Set:  (name: “master”,  node type: “XO medium”, image: “Hadoop 2.7.1 (Centos7)”, domain: “System select”).   The name must be “master”.

4. Create a group of Hadoop workers by choosing “Node Group” from the “Add Nodes” menu.  Click in the field to add the group to the request.

5.  Right-click the node group and edit its properties.   Set:  (name: “workers”,  node type: “XO medium”, image: “Hadoop 2.7.1 (Centos7)”, domain: “System select”, Number of Servers: “2”).   The name must be “workers”.

 

6.  Draw a link between the master and the workers.

7.   Edit the properties of the master and workers by right-clicking and choosing “Edit Properties”.  Set the IP address of the master to 172.16.1.1/24 and the workers group to 172.16.1.100/24.  Note that each node in the workers group will be assigned a sequential IP address starting with 172.16.1.100.

8.  Edit the post-boot script for the master node.   Right-click the master node and choose “Edit Properties”.  Click “PostBoot Script”.  A text box will appear.  Add the following script:

#!/bin/bash

#setup /etc/hosts
echo $master.IP("Link0") $master.Name() >> /etc/hosts
#set ( $sizeWorkerGroup = $workers.size() - 1 )
#foreach ( $j in [0..$sizeWorkerGroup] )
 echo $workers.get($j).IP("Link0") `echo $workers.get($j).Name() | sed 's/\//-/g'` >> /etc/hosts
#end

HADOOP_CONF_DIR=/home/hadoop/hadoop-2.7.1/etc/hadoop
CORE_SITE_FILE=${HADOOP_CONF_DIR}/core-site.xml
HDFS_SITE_FILE=${HADOOP_CONF_DIR}/hdfs-site.xml
MAPRED_SITE_FILE=${HADOOP_CONF_DIR}/mapred-site.xml
YARN_SITE_FILE=${HADOOP_CONF_DIR}/yarn-site.xml
SLAVES_FILE=${HADOOP_CONF_DIR}/slaves

echo "" > $CORE_SITE_FILE
cat > $CORE_SITE_FILE << EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
 <name>fs.default.name</name>
 <value>hdfs://$master.Name():9000</value>
</property>
</configuration>
EOF

echo "" > $HDFS_SITE_FILE
cat > $HDFS_SITE_FILE << EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
 <name>dfs.datanode.du.reserved</name>
 <!-- cluster variant -->
 <value>20000000000</value>
 <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
 </description>
 </property>
<property>
 <name>dfs.replication</name>
 <value>2</value>
</property>
<property>
 <name>dfs.name.dir</name>
 <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
 <name>dfs.data.dir</name>
 <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
EOF

echo "" > $MAPRED_SITE_FILE
cat > $MAPRED_SITE_FILE << EOF
<configuration>
 <property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
 </property>
</configuration>
EOF

echo "" > $YARN_SITE_FILE
cat > $YARN_SITE_FILE << EOF
<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
 <property>
 <name>yarn.resourcemanager.resource-tracker.address</name>
 <value>master:8031</value>
 </property>
 <property>
 <name>yarn.resourcemanager.address</name>
 <value>master:8032</value>
 </property>
 <property>
 <name>yarn.resourcemanager.scheduler.address</name>
 <value>master:8030</value>
 </property>
 <property>
 <name>yarn.resourcemanager.admin.address</name>
 <value>master:8033</value>
 </property>
 <property>
 <name>yarn.resourcemanager.webapp.address</name>
 <value>master:8088</value>
 </property>
<property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
 </property>
</configuration>
EOF

echo "" > $SLAVES_FILE
cat > $SLAVES_FILE << EOF
#set ( $sizeWorkerGroup = $workers.size() - 1 )
#foreach ( $j in [0..$sizeWorkerGroup] )
 `echo $workers.get($j).Name() | sed 's/\//-/g'` 
#end
EOF


echo "" > /home/hadoop/.ssh/config
cat > /home/hadoop/.ssh/config << EOF
Host `echo $self.IP("Link0") | sed 's/.[0-9][0-9]*$//g'`.* master workers-* 0.0.0.0
 StrictHostKeyChecking no
 UserKnownHostsFile=/dev/null
EOF

chmod 600 /home/hadoop/.ssh/config
chown hadoop:hadoop /home/hadoop/.ssh/config

9.  Edit the post-boot script for the workers group and add the same script as the master.

10. Name your slice and submit it to ExoGENI.

11.  Wait for the resources to become Active.

Check the status of the VMs:

1. Login to the master node.

2.  Observe the properties of the network interfaces

[root@master ~]# ifconfig 
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
 inet 10.103.0.9 netmask 255.255.255.0 broadcast 10.103.0.255
 inet6 fe80::f816:3eff:fe0c:3d02 prefixlen 64 scopeid 0x20<link>
 ether fa:16:3e:0c:3d:02 txqueuelen 1000 (Ethernet)
 RX packets 3518 bytes 823670 (804.3 KiB)
 RX errors 0 dropped 0 overruns 0 frame 0
 TX packets 1704 bytes 218947 (213.8 KiB)
 TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
 inet 172.16.1.1 netmask 255.255.255.0 broadcast 172.16.1.255
 inet6 fe80::fc16:3eff:fe00:2862 prefixlen 64 scopeid 0x20<link>
 ether fe:16:3e:00:28:62 txqueuelen 1000 (Ethernet)
 RX packets 3734 bytes 689012 (672.8 KiB)
 RX errors 0 dropped 0 overruns 0 frame 0
 TX packets 1930 bytes 215880 (210.8 KiB)
 TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
 inet 127.0.0.1 netmask 255.0.0.0
 inet6 ::1 prefixlen 128 scopeid 0x10<host>
 loop txqueuelen 0 (Local Loopback)
 RX packets 251 bytes 31259 (30.5 KiB)
 RX errors 0 dropped 0 overruns 0 frame 0
 TX packets 251 bytes 31259 (30.5 KiB)
 TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

3. Observe the contents of the NEuca user data file.

[root@master ~]# neuca-user-data 
[global]
actor_id=b0e0e413-77ef-4775-b476-040ca8377d1d
slice_id=304eaa29-272d-48e2-8640-ec7300196ac9
reservation_id=c9fd992b-a67b-4cc6-8156-a062df9e8889
unit_id=e22734de-f187-41d4-b952-f07263fbfd43
;router= Not Specified
;iscsi_initiator_iqn= Not Specified
slice_name=pruth.hadoop.1
unit_url=http://geni-orca.renci.org/owl/c9d160a4-1ac0-4b9e-9a8f-8613077a6463#master
host_name=master
management_ip=128.120.83.33
physical_host=ucd-w5
nova_id=b8a37d51-bd50-40b6-89ff-6fbb87bb5fc1
[users]
root=no:ssh-dss AAAAB3NzaC1kc3MAAACBALYdIgCmJoZt+4YN7lUlDLR0ebwfsBd+d0Tw3O18JUc2bXoTzwWBGV+BkGtGljzmr1SDrRXkOWgA//pogG7B1vpizHV6K5MoFQDoSEy/64ycEago611xtMt13xuJei0pPyAphv/NrYlD1xZBMuEG9JTe8EK/H43ZhLcK4b1HwWrTAAAAFQDnZNZVDojT0aHHgJqBncy+iBHs9wAAAIBiMfYPoDgVASgknEzBschxTzTFuhof+lxBh0v5i9OsinMuRa1K5wBbA1eo63PKxywQSnODQhItme0Tn8Pp1ETpM0YkzE48K1NxW3l9iBipSRDMEh8aUlfX5R7xfRRY7tUNXlQQAzXYX8ZvXoA+mbZ9BkBXtSNI5uD1z3Gk5k/WQwAAAIBVIVuHJVgSiCw/m8yjCVH1QgO045ACf4l9/3HaoDwFaNrL1WKQvplhz/DVqtWq/2ZAIrwXr/0IgviRRZ/iVpul3s15ecTJzHAhHMaaDn4vuphH6xbs6JHLFyvBQGJy1euoY9BPqtTFZnH7KdWoChCQXfujDrtcx/5MfBn4tO5kQQ== pruth@dhcp152-54-9-28.europa.renci.org:
pruth=yes:ssh-dss AAAAB3NzaC1kc3MAAACBALYdIgCmJoZt+4YN7lUlDLR0ebwfsBd+d0Tw3O18JUc2bXoTzwWBGV+BkGtGljzmr1SDrRXkOWgA//pogG7B1vpizHV6K5MoFQDoSEy/64ycEago611xtMt13xuJei0pPyAphv/NrYlD1xZBMuEG9JTe8EK/H43ZhLcK4b1HwWrTAAAAFQDnZNZVDojT0aHHgJqBncy+iBHs9wAAAIBiMfYPoDgVASgknEzBschxTzTFuhof+lxBh0v5i9OsinMuRa1K5wBbA1eo63PKxywQSnODQhItme0Tn8Pp1ETpM0YkzE48K1NxW3l9iBipSRDMEh8aUlfX5R7xfRRY7tUNXlQQAzXYX8ZvXoA+mbZ9BkBXtSNI5uD1z3Gk5k/WQwAAAIBVIVuHJVgSiCw/m8yjCVH1QgO045ACf4l9/3HaoDwFaNrL1WKQvplhz/DVqtWq/2ZAIrwXr/0IgviRRZ/iVpul3s15ecTJzHAhHMaaDn4vuphH6xbs6JHLFyvBQGJy1euoY9BPqtTFZnH7KdWoChCQXfujDrtcx/5MfBn4tO5kQQ== pruth@dhcp152-54-9-28.europa.renci.org:
[interfaces]
fe163e002862=up:ipv4:172.16.1.1/24
[storage]
[routes]
[scripts]bootscript=#!/bin/bash
 #setup /etc/hosts
 echo 172.16.1.1 master >> /etc/hosts
 echo 172.16.1.100 `echo workers/0 | sed 's/\//-/g'` >> /etc/hosts
 echo 172.16.1.101 `echo workers/1 | sed 's/\//-/g'` >> /etc/hosts
 HADOOP_CONF_DIR=/home/hadoop/hadoop-2.7.1/etc/hadoop
 CORE_SITE_FILE=${HADOOP_CONF_DIR}/core-site.xml
 HDFS_SITE_FILE=${HADOOP_CONF_DIR}/hdfs-site.xml
 MAPRED_SITE_FILE=${HADOOP_CONF_DIR}/mapred-site.xml
 YARN_SITE_FILE=${HADOOP_CONF_DIR}/yarn-site.xml
 SLAVES_FILE=${HADOOP_CONF_DIR}/slaves
 echo "" > $CORE_SITE_FILE
 cat > $CORE_SITE_FILE << EOF
 <?xml version="1.0" encoding="UTF-8"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <configuration>
 <property>
 <name>fs.default.name</name>
 <value>hdfs://master:9000</value>
 </property>
 </configuration>
 EOF
 echo "" > $HDFS_SITE_FILE
 cat > $HDFS_SITE_FILE << EOF
 <?xml version="1.0" encoding="UTF-8"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <configuration>
 <property>
 <name>dfs.datanode.du.reserved</name>
 <!-- cluster variant -->
 <value>20000000000</value>
 <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.
 </description>
 </property>
 <property>
 <name>dfs.replication</name>
 <value>2</value>
 </property>
 <property>
 <name>dfs.name.dir</name>
 <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
 </property>
 <property>
 <name>dfs.data.dir</name>
 <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
 </property>
 </configuration>
 EOF
 echo "" > $MAPRED_SITE_FILE
 cat > $MAPRED_SITE_FILE << EOF
 <configuration>
 <property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
 </property>
 </configuration>
 EOF
 echo "" > $YARN_SITE_FILE
 cat > $YARN_SITE_FILE << EOF
 <?xml version="1.0"?>
 <configuration>
 <!-- Site specific YARN configuration properties -->
 <property>
 <name>yarn.resourcemanager.resource-tracker.address</name>
 <value>master:8031</value>
 </property>
 <property>
 <name>yarn.resourcemanager.address</name>
 <value>master:8032</value>
 </property>
 <property>
 <name>yarn.resourcemanager.scheduler.address</name>
 <value>master:8030</value>
 </property>
 <property>
 <name>yarn.resourcemanager.admin.address</name>
 <value>master:8033</value>
 </property>
 <property>
 <name>yarn.resourcemanager.webapp.address</name>
 <value>master:8088</value>
 </property>
 <property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
 </property>
 </configuration>
 EOF
 echo "" > $SLAVES_FILE
 cat > $SLAVES_FILE << EOF
 `echo workers/0 | sed 's/\//-/g'` 
 `echo workers/1 | sed 's/\//-/g'` 
 EOF
 echo "" > /home/hadoop/.ssh/config
 cat > /home/hadoop/.ssh/config << EOF
 Host `echo 172.16.1.1 | sed 's/.[0-9][0-9]*$//g'`.* master workers-* 0.0.0.0
 StrictHostKeyChecking no
 UserKnownHostsFile=/dev/null
 EOF
 chmod 600 /home/hadoop/.ssh/config
chown hadoop:hadoop /home/hadoop/.ssh/config

4. Test for connectivity between the VMs.

[root@master ~]# ping workers-0
PING workers-0 (172.16.1.100) 56(84) bytes of data.
64 bytes from workers-0 (172.16.1.100): icmp_seq=1 ttl=64 time=1.21 ms
64 bytes from workers-0 (172.16.1.100): icmp_seq=2 ttl=64 time=0.967 ms
64 bytes from workers-0 (172.16.1.100): icmp_seq=3 ttl=64 time=1.03 ms
64 bytes from workers-0 (172.16.1.100): icmp_seq=4 ttl=64 time=1.06 ms
^C
--- workers-0 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 0.967/1.069/1.212/0.089 ms
[root@master ~]# ping workers-1
PING workers-1 (172.16.1.101) 56(84) bytes of data.
64 bytes from workers-1 (172.16.1.101): icmp_seq=1 ttl=64 time=1.35 ms
64 bytes from workers-1 (172.16.1.101): icmp_seq=2 ttl=64 time=1.06 ms
64 bytes from workers-1 (172.16.1.101): icmp_seq=3 ttl=64 time=0.922 ms
64 bytes from workers-1 (172.16.1.101): icmp_seq=4 ttl=64 time=1.00 ms
^C
--- workers-1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 0.922/1.083/1.351/0.163 ms

Run the Hadoop HDFS:

1.  Login to the master node as root

2.  Switch to the hadoop user

[root@master ~]# su hadoop -
[hadoop@master root]$

3.  Change to hadoop user’s home directory

[hadoop@master root]$ cd ~

4.  Format the HDFS file system

[hadoop@master ~]$ hdfs namenode -format
15/09/17 18:50:22 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/172.16.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.7.1
...

...
15/09/17 18:50:24 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/09/17 18:50:24 INFO util.ExitUtil: Exiting with status 0
15/09/17 18:50:24 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/172.16.1.1
************************************************************/

5.  Start the HDFS dfs service

[hadoop@master ~]$ start-dfs.sh
Starting namenodes on [master]
master: Warning: Permanently added 'master,172.16.1.1' (ECDSA) to the list of known hosts.
master: starting namenode, logging to /home/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-namenode-master.out
workers-1: Warning: Permanently added 'workers-1,172.16.1.101' (ECDSA) to the list of known hosts.
workers-0: Warning: Permanently added 'workers-0,172.16.1.100' (ECDSA) to the list of known hosts.
workers-1: starting datanode, logging to /home/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-datanode-workers-1.out
workers-0: starting datanode, logging to /home/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-datanode-workers-0.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-secondarynamenode-master.out

6.  Start the yarn service

[hadoop@master ~]$ start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.7.1/logs/yarn-hadoop-resourcemanager-master.out
workers-1: Warning: Permanently added 'workers-1,172.16.1.101' (ECDSA) to the list of known hosts.
workers-0: Warning: Permanently added 'workers-0,172.16.1.100' (ECDSA) to the list of known hosts.
workers-1: starting nodemanager, logging to /home/hadoop/hadoop-2.7.1/logs/yarn-hadoop-nodemanager-workers-1.out
workers-0: starting nodemanager, logging to /home/hadoop/hadoop-2.7.1/logs/yarn-hadoop-nodemanager-workers-0.out

7.   Test to see if the HDFS has started

[hadoop@master ~]$ hdfs dfsadmin -report
Configured Capacity: 14824083456 (13.81 GB)
Present Capacity: 8522616832 (7.94 GB)
DFS Remaining: 8522567680 (7.94 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):

Name: 172.16.1.101:50010 (workers-1)
Hostname: workers-1
Decommission Status : Normal 
Configured Capacity: 7412041728 (6.90 GB) 
DFS Used: 24576 (24 KB) 
Non DFS Used: 3150721024 (2.93 GB)
DFS Remaining: 4261296128 (3.97 GB)
DFS Used%: 0.00%
DFS Remaining%: 57.49%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Sep 17 18:55:41 UTC 2015

Name: 172.16.1.100:50010 (workers-0)
Hostname: workers-0 
Decommission Status : Normal
Configured Capacity: 7412041728 (6.90 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 3150745600 (2.93 GB)
DFS Remaining: 4261271552 (3.97 GB)
DFS Used%: 0.00%
DFS Remaining%: 57.49%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Sep 17 18:55:41 UTC 2015

 Simple HDFS tests:

1.  Create a small test file

[hadoop@master ~]$ echo Hello ExoGENI World > hello.txt

2. Put the test file into the Hadoop filesystem

[hadoop@master ~]$ hdfs dfs -put hello.txt /hello.txt

3. Check for the file’s existence

[hadoop@master ~]$ hdfs dfs -ls /
Found 1 items
-rw-r--r-- 2 hadoop supergroup 20 2015-09-17 19:11 /hello.txt

4. Check the contents of the file

[hadoop@master ~]$ hdfs dfs -cat /hello.txt
Hello ExoGENI World

Run the Hadoop Sort Testcase

Test the true power of the Hadoop filesystem by creating and sorting a large random dataset. It may be useful/interesting to login to the master and/or worker VMs and use tools like top, iotop, and iftop to observe the resource utilization on each of the VMs during the sort test. Note: on these VMs iotop and iftop must be run as root.
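
For example, the following commands can be run on the master or a worker while the sort is in progress (a minimal sketch; the data-plane interface is eth0 on the master in this tutorial, but the name may differ on your VMs):

top              #per-process CPU and memory usage
iotop -o         #only processes currently doing disk I/O (run as root)
iftop -i eth0    #per-connection network traffic on the data-plane interface (run as root)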

1.  Create a 1 GB random data set.  After the data is created, use the ls functionality to confirm the data exists (see the example after the teragen output below). Note that the data is composed of several files in a directory.

[hadoop@master ~]$ hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar teragen 10000000 /input
15/09/17 19:15:41 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.1.1:8032
15/09/17 19:15:42 INFO terasort.TeraSort: Generating 10000000 using 2
15/09/17 19:15:42 INFO mapreduce.JobSubmitter: number of splits:2
15/09/17 19:15:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1442515994234_0001
15/09/17 19:15:43 INFO impl.YarnClientImpl: Submitted application application_1442515994234_0001
15/09/17 19:15:43 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1442515994234_0001/
15/09/17 19:15:43 INFO mapreduce.Job: Running job: job_1442515994234_0001
15/09/17 19:15:53 INFO mapreduce.Job: Job job_1442515994234_0001 running in uber mode : false
15/09/17 19:15:53 INFO mapreduce.Job: map 0% reduce 0%
15/09/17 19:16:05 INFO mapreduce.Job: map 1% reduce 0%
15/09/17 19:16:07 INFO mapreduce.Job: map 2% reduce 0%
15/09/17 19:16:08 INFO mapreduce.Job: map 3% reduce 0%
15/09/17 19:16:13 INFO mapreduce.Job: map 4% reduce 0%
15/09/17 19:16:17 INFO mapreduce.Job: map 5% reduce 0%
15/09/17 19:16:22 INFO mapreduce.Job: map 6% reduce 0%
...
15/09/17 19:23:10 INFO mapreduce.Job: map 96% reduce 0%
15/09/17 19:23:15 INFO mapreduce.Job: map 97% reduce 0%
15/09/17 19:23:19 INFO mapreduce.Job: map 98% reduce 0%
15/09/17 19:23:24 INFO mapreduce.Job: map 99% reduce 0%
15/09/17 19:23:28 INFO mapreduce.Job: map 100% reduce 0%
15/09/17 19:23:35 INFO mapreduce.Job: Job job_1442515994234_0001 completed successfully
15/09/17 19:23:35 INFO mapreduce.Job: Counters: 31
 File System Counters
 FILE: Number of bytes read=0
 FILE: Number of bytes written=229878
 FILE: Number of read operations=0
 FILE: Number of large read operations=0
 FILE: Number of write operations=0
 HDFS: Number of bytes read=167
 HDFS: Number of bytes written=1000000000
 HDFS: Number of read operations=8
 HDFS: Number of large read operations=0
 HDFS: Number of write operations=4
 Job Counters 
 Launched map tasks=2
 Other local map tasks=2
 Total time spent by all maps in occupied slots (ms)=916738
 Total time spent by all reduces in occupied slots (ms)=0
 Total time spent by all map tasks (ms)=916738
 Total vcore-seconds taken by all map tasks=916738
 Total megabyte-seconds taken by all map tasks=938739712
 Map-Reduce Framework
 Map input records=10000000
 Map output records=10000000
 Input split bytes=167
 Spilled Records=0
 Failed Shuffles=0
 Merged Map outputs=0
 GC time elapsed (ms)=1723
 CPU time spent (ms)=21600
 Physical memory (bytes) snapshot=292618240
 Virtual memory (bytes) snapshot=4156821504
 Total committed heap usage (bytes)=97517568
 org.apache.hadoop.examples.terasort.TeraGen$Counters
 CHECKSUM=21472776955442690
 File Input Format Counters 
 Bytes Read=0
 File Output Format Counters 
 Bytes Written=1000000000
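
For example, listing the generated directory should show one part file per map task (a hedged check; the exact listing will vary):

[hadoop@master ~]$ hdfs dfs -ls /input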

2. Sort the data

[hadoop@master ~]$ hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar terasort /input /output
15/09/17 19:25:25 INFO terasort.TeraSort: starting
15/09/17 19:25:27 INFO input.FileInputFormat: Total input paths to process : 2
Spent 216ms computing base-splits.
Spent 3ms computing TeraScheduler splits.
Computing input splits took 220ms
Sampling 8 splits of 8
Making 1 from 100000 sampled records
Computing parititions took 5347ms
Spent 5569ms computing partitions.
15/09/17 19:25:32 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.1.1:8032
15/09/17 19:25:33 INFO mapreduce.JobSubmitter: number of splits:8
15/09/17 19:25:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1442515994234_0002
15/09/17 19:25:34 INFO impl.YarnClientImpl: Submitted application application_1442515994234_0002
15/09/17 19:25:34 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1442515994234_0002/
15/09/17 19:25:34 INFO mapreduce.Job: Running job: job_1442515994234_0002
15/09/17 19:25:42 INFO mapreduce.Job: Job job_1442515994234_0002 running in uber mode : false
15/09/17 19:25:42 INFO mapreduce.Job: map 0% reduce 0%
15/09/17 19:25:57 INFO mapreduce.Job: map 11% reduce 0%
15/09/17 19:26:00 INFO mapreduce.Job: map 17% reduce 0%
15/09/17 19:26:07 INFO mapreduce.Job: map 21% reduce 0%
15/09/17 19:26:10 INFO mapreduce.Job: map 25% reduce 0%
15/09/17 19:26:13 INFO mapreduce.Job: map 34% reduce 0%
15/09/17 19:26:14 INFO mapreduce.Job: map 37% reduce 0%
15/09/17 19:26:15 INFO mapreduce.Job: map 41% reduce 0%
15/09/17 19:26:16 INFO mapreduce.Job: map 44% reduce 0%
15/09/17 19:26:17 INFO mapreduce.Job: map 57% reduce 0%
15/09/17 19:26:18 INFO mapreduce.Job: map 58% reduce 0%
15/09/17 19:26:20 INFO mapreduce.Job: map 62% reduce 0%
15/09/17 19:26:22 INFO mapreduce.Job: map 62% reduce 8%
15/09/17 19:26:26 INFO mapreduce.Job: map 66% reduce 8%
15/09/17 19:26:27 INFO mapreduce.Job: map 67% reduce 8%
15/09/17 19:26:28 INFO mapreduce.Job: map 67% reduce 13%
15/09/17 19:26:35 INFO mapreduce.Job: map 72% reduce 13%
15/09/17 19:26:37 INFO mapreduce.Job: map 73% reduce 13%
15/09/17 19:26:38 INFO mapreduce.Job: map 79% reduce 13%
15/09/17 19:26:42 INFO mapreduce.Job: map 84% reduce 13%
15/09/17 19:26:44 INFO mapreduce.Job: map 84% reduce 17%
15/09/17 19:26:45 INFO mapreduce.Job: map 86% reduce 17%
15/09/17 19:26:48 INFO mapreduce.Job: map 89% reduce 17%
15/09/17 19:26:51 INFO mapreduce.Job: map 94% reduce 17%
15/09/17 19:26:53 INFO mapreduce.Job: map 96% reduce 17%
15/09/17 19:26:54 INFO mapreduce.Job: map 100% reduce 17%
...
15/09/17 19:34:21 INFO mapreduce.Job: map 100% reduce 87%
15/09/17 19:34:24 INFO mapreduce.Job: map 100% reduce 93%
15/09/17 19:34:27 INFO mapreduce.Job: map 100% reduce 99%
15/09/17 19:34:28 INFO mapreduce.Job: map 100% reduce 100%
15/09/17 19:34:28 INFO mapreduce.Job: Job job_1442515994234_0002 completed successfully
15/09/17 19:34:29 INFO mapreduce.Job: Counters: 51
 File System Counters
 FILE: Number of bytes read=2080000144
 FILE: Number of bytes written=3121046999
 FILE: Number of read operations=0
 FILE: Number of large read operations=0
 FILE: Number of write operations=0
 HDFS: Number of bytes read=1000000816
 HDFS: Number of bytes written=1000000000
 HDFS: Number of read operations=27
 HDFS: Number of large read operations=0
 HDFS: Number of write operations=2
 Job Counters 
 Killed map tasks=3
 Launched map tasks=11
 Launched reduce tasks=1
 Data-local map tasks=9
 Rack-local map tasks=2
 Total time spent by all maps in occupied slots (ms)=455604
 Total time spent by all reduces in occupied slots (ms)=495530
 Total time spent by all map tasks (ms)=455604
 Total time spent by all reduce tasks (ms)=495530
 Total vcore-seconds taken by all map tasks=455604
 Total vcore-seconds taken by all reduce tasks=495530
 Total megabyte-seconds taken by all map tasks=466538496
 Total megabyte-seconds taken by all reduce tasks=507422720
 Map-Reduce Framework
 Map input records=10000000
 Map output records=10000000
 Map output bytes=1020000000
 Map output materialized bytes=1040000048
 Input split bytes=816
 Combine input records=0
 Combine output records=0
 Reduce input groups=10000000
 Reduce shuffle bytes=1040000048
 Reduce input records=10000000
 Reduce output records=10000000
 Spilled Records=30000000
 Shuffled Maps =8
 Failed Shuffles=0
 Merged Map outputs=8
 GC time elapsed (ms)=2276
 CPU time spent (ms)=86920
 Physical memory (bytes) snapshot=1792032768
 Virtual memory (bytes) snapshot=18701041664
 Total committed heap usage (bytes)=1277722624
 Shuffle Errors
 BAD_ID=0
 CONNECTION=0
 IO_ERROR=0
 WRONG_LENGTH=0
 WRONG_MAP=0
 WRONG_REDUCE=0
 File Input Format Counters 
 Bytes Read=1000000000
 File Output Format Counters 
 Bytes Written=1000000000
15/09/17 19:34:29 INFO terasort.TeraSort: done

3.  Look at the output

List the output directory:

[hadoop@master ~]$ hdfs dfs -ls /output
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2015-09-17 19:34 /output/_SUCCESS
-rw-r--r-- 10 hadoop supergroup 0 2015-09-17 19:25 /output/_partition.lst
-rw-r--r-- 1 hadoop supergroup 1000000000 2015-09-17 19:34 /output/part-r-00000

Get the sorted output file:

[hadoop@master ~]$ hdfs dfs -get /output/part-r-00000 part-r-00000

Look at the output file:

[hadoop@master ~]$ hexdump part-r-00000 | less

Does the output match your expectations?
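
As an optional extra check (a hedged example using the same examples jar), the teravalidate job verifies that the output is globally sorted and writes its report, a checksum on success or error records otherwise, to a new HDFS directory:

[hadoop@master ~]$ hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar teravalidate /output /validate
[hadoop@master ~]$ hdfs dfs -cat /validate/*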

 

ExoGENI: Getting Started Tutorial

This post describes how to create an account on ExoGENI, install the slice creation tool Flukes, and use Flukes to create simple slices.

Background:

These instructions assume you have been given a GENI account OR have an account with an institution that participates in the InCommon Federation.  Most colleges and universities are in InCommon; check whether your university participates here.

Creating a GENI account:

  1. In a browser, go to the GENI portal (https://portal.geni.net/)
  2. Click the “Use GENI” button
  3. Under “Show a list of organizations” find your organization.  If you have been assigned a temporary account for an organized tutorial, choose “GENI Project Office”.
  4. You will be redirected to your organization’s sign-on page (example shows UNC Chapel Hill’s Onyen system).  Login with the username and password you typically use to access services at your organization.  You will now be logged into the GENI Portal.
  5. Click your name in the upper right corner and choose “Profile” from the dropdown menu.
  6. If you are not part of any projects, you should request to join a project given to you by the project PI or class instructor (there is a ‘Join Project’ button in the profile). Wait until you are approved.
  7. Click “SSL” in the grey bar.
  8. Click “Create an SSL certificate”
  9. Click “Generate Combined Certificate and Key File”
  10. Download your certificate and key. This will create a pem file that is named “geni-<your_login>.pem”. Save it to a known location on your machine (suggestion:  ~/.ssl/geni-<your_login>.pem).

At this point you can use your pem file to access ExoGENI.   However, if you would like to use other GENI resources you will need to join a GENI project.

Install/Configure Flukes:

1. Install Java on your system (if it is not there already). Note that we do not recommend using OpenJDK. Instead you should use Oracle JDK 7 or 8 for your platform.

2. Download the Flukes jnlp file:

wget http://geni-images.renci.org/webstart/flukes.jnlp

3. Edit (create) a ~/.flukes.properties file.  Modify the example below to point to your GENI pem file and the ssh keys that you wish to use.

orca.xmlrpc.url=https://geni.renci.org:11443/orca/xmlrpc
user.certfile=/Users/pruth/.ssl/geni-pruth.pem
user.certkeyfile=/Users/pruth/.ssl/geni-pruth.pem
enable.modify=true
ssh.key=~/.ssh/id_dsa
# SSH Public key to install into VM instances
ssh.pubkey=~/.ssh/id_dsa.pub
# Secondary login (works with ssh.other.pubkey)
ssh.other.login=pruth
# Secondary public SSH keys 
ssh.other.pubkey=~/.ssh/id_dsa.pub
# Should the secondary account have sudo privileges
ssh.other.sudo=yes
# Path to XTerm executable on your system
xterm.path=/opt/X11/bin/xterm

4. Run the jnlp file by double-clicking it or by using the javaws application on the command line.

javaws /path/to/the/file/flukes.jnlp

At this point the Flukes GUI should be running.  You may need to create security exceptions on your system to allow it to run.  In addition, there may be warnings about certificate authorities; usually clicking “run” or “continue” in the popup windows will work.  If Flukes is not running correctly, see https://geni-orca.renci.org/trac/wiki/flukes for more information.

Simple Slice: 

  1. Start Flukes as described above
  2. Click in the field to place a VM in the request.
  3. Right-click the new node and choose “Edit properties” from the menu.  If the application has not yet contacted the ExoGENI server, it may ask for the password for your GENI key.  If you don’t know what this is, it is probably empty.
  4. Choose a node type:  (suggestion:  XO Medium)
  5. Choose an image:  (suggestion:  Centos 6.3 v1.0.10)
  6. Choose a domain:  (suggestion:  Leave as “System select”)
  7. Press “Ok”
  8. Name your slice by putting a short string in the text box next to the submit button:  (suggestion: <yourlogin>.test1)
  9. Press the Submit button. After a few seconds a window will popup showing the resources you have been assigned (Note: this will take longer if you are participating in a large tutorial).  In the example, there is one VM that will be created at the Pittsburgh Supercomputing Center (PSC).
  10. Click “ok”
  11. Switch to the “Manifest View” by clicking “Manifest View”
  12. Click “My Slices”. A window will popup listing your new slice.
  13. Select the slice and click “Query”.  A window will popup listing the status of your requested resources.  The status could be:  Ticketed, Active, or Failed.  Ticketed resources are not yet ready.  Active resources are ready to use.  Failed resources experienced a problem that prevented them from becoming Active.
  14. If your resources are Ticketed then click “ok”, wait a couple of minutes, then query for the slice again.  Repeat until the resources are Active.
  15. If your xterm or putty configuration is correct you can login to the VM by right-clicking the VM and choosing “Login to Node”.  Alternatively, you can right-click the node and choose “View Properties” to see the public IP address of the VM.  You can then ssh to the VM using that address and the private key that corresponds to the public key you referenced in your .flukes.properties file.

Dumbbell Slice

  1. Start Flukes
  2. Click the field twice to insert two VMs into the request.
  3. Click and drag a line from one node to the other.  This creates a point-to-point Ethernet link between the nodes.
  4. Edit the properties of each node.  Set:  (node type: “XO medium”,  image: “Centos 6.3 v1.0.10”, domain: “System select”)
  5. In addition, set the Link0 IP address of one VM to 172.16.1.1/24 and the other to 172.16.1.2/24.
  6. Name your slice (suggestion:  <login_name>.dumbell1)
  7. Submit the request
  8. Query for the manifest. Notice that there are three resources this time (2 VMs and 1 link).
  9. After all resources are Active, login to one of the VMs and run ifconfig.  Notice that the VM has an extra interface that has been assigned the 172.16.1.x address you specified in the request.
  10. Try pinging the other VM.

Enabling Site-Aware Scheduling for Apache Storm in ExoGENI

Overview

Apache Storm is a free and open source distributed real-time computation system, designed for processing streaming data. It allows users to easily define and execute topologies. Storm is task parallel and follows the master/slave paradigm. For more details about Storm, please see Storm’s documentation.

This post introduces how to enable site-aware scheduling for Storm in ExoGENI, in order to take advantage of ExoGENI’s worldwide distributed sites. We will walk you through an example running on a multi-site ExoGENI slice to give you insight into the approach.

Storm Basics

In Storm, users execute topologies on a Storm cluster. A topology consists of components called spouts and bolts as shown in Figure 1. A spout acts as a data source and a bolt is a computation process. Spouts and bolts are connected with built-in or custom grouping schemas. Serialized data, referred to as tuples, are transferred between spouts and bolts.

Figure 1: Storm Topology

A Storm cluster provisions computing resources for executing submitted topologies. Nimbus is the master node that accepts topology submissions from users and dispatches tasks to the slave nodes. Supervisors are slave nodes that carry out the data processing. Storm employs Apache ZooKeeper for fault recovery and for coordination between Nimbus and the supervisors. A supervisor’s computing resources can be partitioned into multiple Worker Slots. In each Worker Slot, Storm can spawn multiple threads, referred to as Executors, which fulfill tasks assigned by Nimbus.

Figure 2: Storm cluster

Problem

In Storm, Nimbus uses a scheduler to assign tasks to the supervisors. The default scheduler aims at allocating computing resources evenly to topologies. It works well in terms of fairness among topologies, but it makes it impossible for users to predict the placement of topology components in the Storm cluster. However, in a deployment over multiple sites, it can be very important to allow users to assign a particular topology component to a supervisor located at a specific site. Therefore, we explore an approach to address this problem.

Pluggable Scheduler

Since the 0.8.0 release, Storm allows users to plug in a custom scheduler in order to apply a custom scheduling policy. The custom scheduler is required to implement the IScheduler interface, which contains two methods – prepare(Map conf) and schedule(Topologies topologies, Cluster cluster). The custom scheduling policy is implemented in the schedule(Topologies topologies, Cluster cluster) method. In addition, Storm reserves a field in the supervisor’s configuration, named supervisor.scheduler.meta, in which users can specify arbitrary scheduling metadata as “key-value” pairs to facilitate scheduling. The metadata can be retrieved by calling the getSchedulerMeta() method. Now let’s go through the steps in detail.

ExoGENI Slice

An ExoGENI slice can describe a number of VMs deployed over multiple geographical sites and connected with user-defined network links. This facilitates the development of a site-aware scheduling mechanism for Storm, as well as more sophisticated scheduling algorithms based on site information. In this post, we present the example on an ExoGENI slice spanning the TAMU, UH and UFL sites. Let’s set up the slice first.

VM Image

A VM image based on CentOS 6.3 has been created for this example. The image has all dependencies for Storm deployment installed, including OpenJDK, Storm, ZooKeeper, ZeroMQ and jzmq. The image information is listed below. You need to append this information to your .flukes.properties file to make the image visible in Flukes.

image1.name: Storm-SiteAware-v0.2
image1.url: http://geni-images.renci.org/images/dcvan/storm/Storm-SiteAware-v0.2/Storm-SiteAware-v0.2.xml
image1.hash: 047baee53ecb3455b8c527f064194cad9a67771a

Slice and Post-Bootscript

We also created an example slice request. The topology of the slice is shown in Figure 3.

Figure 3: Slice topology

The slice is composed of a Nimbus node, a ZooKeeper node and 3 supervisor nodes. Each node has a post-bootscript that takes care of the setup automatically. There is a snippet of script in each supervisor node that detects which site the node is running on using the neuca-user-data command, as shown below. The site information is then appended to the supervisor’s configuration and used for scheduling later. Other parts of the post-bootscript are included in the slice request.

site=$(/usr/local/bin/neuca-user-data |grep -m1 physical_host|awk -F'[=-]' '{print $2}')
while [ ! "$site" ];do
  site=$(/usr/local/bin/neuca-user-data |grep -m1 physical_host|awk -F'[=-]' '{print $2}')
  sleep 1
done
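
Below is a minimal sketch of how the detected $site value could then be recorded in the supervisor’s configuration (it assumes Storm is installed under $STORM_HOME; the example slice’s post-bootscript already handles this for you):

#append the detected site to the supervisor's scheduler metadata
cat >> $STORM_HOME/conf/storm.yaml << EOF
supervisor.scheduler.meta:
    site: "$site"
EOF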

The example slice has a working solution installed and is supposed to serve as a reference. Now let’s walk through the idea step by step.

Solution

Now we have a Storm cluster with 3 supervisors deployed at TAMU, UH and UFL in ExoGENI, respectively. Assume we also have a topology with 1 spout and 3 bolts. We want the topology components to be scheduled as shown in Figure 4 after submission.

Figure 4: Use case

Step 1: Tag Supervisors

As mentioned above, Storm offers a field in the supervisor’s configuration for users to specify custom scheduling metadata. In this case, we tag each supervisor with the name of the site it is running at. The configuration file is located at $STORM_HOME/conf/storm.yaml and you need to append the following lines to it.

...
supervisor.scheduler.meta:
    site: "tamu"
...

Remember to restart the Storm daemon to pick up the updated configuration. If you are using the example slice, the Storm daemons are managed by supervisord (the process manager, not to be confused with the Storm supervisor), so you simply need to restart supervisord by typing the following line.

# service supervisord restart

Step 2: Tag Topology Components

This step is done when building the topology with TopologyBuilder in the main method of a topology. The ComponentConfigurationDeclarer has a method called addConfiguration(String config, String value) that allows adding custom configuration, i.e. metadata. In our case, we add the site information using this method.

public class ExampleTopology{
    public static void main(String[] args){
        ...
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout_1", new ExampleSpout1(), 1).addConfiguration("site", "tamu");
        builder.setBolt("bolt_1", new ExampleBolt1(), 1).shuffleGrouping("spout_1").addConfiguration("site", "tamu");
        builder.setBolt("bolt_2", new ExampleBolt2(), 1).shuffleGrouping("spout_1").addConfiguration("site", "uh");
        builder.setBolt("bolt_3", new ExampleBolt3(), 1).globalGrouping("bolt_1").globalGrouping("bolt_2").addConfiguration("site", "ufl");
        StormSubmitter.submitTopology("example-topology", builder.createTopology());
        ...
    }
}

Step 3: Customize Scheduler

This step is where the magic happens. Since we already have the site information on both the Storm cluster and the topology, we now need to match them up. As mentioned above, the scheduling logic is implemented in the schedule(Topologies topologies, Cluster cluster) method. The Topologies object is a wrapper around all the topologies submitted to the cluster. It holds a collection of TopologyDetails instances, each of which contains information about the corresponding topology. The Cluster represents the Storm cluster and has a getSupervisors() method that returns the SupervisorDetails instances describing the supervisors. We then retrieve the metadata using the getSchedulerMeta() method referred to above. So far, the code should look as below.

public class SiteAwareScheduler implements IScheduler{

    private static final String SITE = "site";

    @Override
    public void prepare(Map conf) {}

    @Override
    public void schedule(Topologies topologies, Cluster cluster) {
	Collection<TopologyDetails> topologyDetails = topologies.getTopologies();
	Collection<SupervisorDetails> supervisorDetails = cluster.getSupervisors().values();
	Map<String, SupervisorDetails> supervisors = new HashMap<String, SupervisorDetails>();
	for(SupervisorDetails s : supervisorDetails){
	    Map<String, String> metadata = (Map<String, String>)s.getSchedulerMeta();
	    if(metadata.get(SITE) != null){	
		 supervisors.put((String)metadata.get(SITE), s);
	    }
	}
    }
}

Note that the supervisors are indexed by their sites in a hash map in order to accelerate supervisor lookup. Without caching the supervisors, we would need to iterate over all the supervisors to find the particular supervisor for a component, resulting in O(number of components in a topology * number of supervisors in the cluster) complexity. Because supervisors are relatively static compared to topology components, we cache the supervisors and lower the complexity to O(number of components in a topology) on average. The cache is rebuilt periodically, since the schedule(Topologies topologies, Cluster cluster) method is called by Storm’s core from time to time, so changes to the supervisors are reflected promptly.

Then we need to iterate over the topologies to schedule them properly. Before looking for topology components, we check whether a topology needs to be scheduled with needsScheduling(TopologyDetails topology). If a topology has not been scheduled yet, we move forward; otherwise, we simply skip to the next one. Details about the components are encapsulated in the StormTopology instance, which can be obtained by calling the getTopology() method of each TopologyDetails instance. From the StormTopology instance, we can get the SpoutSpec and Bolt instances, which are the spouts and bolts in the topology. The site metadata is stored in a JSON string returned by the get_json_conf() method. We need a JSON parser to parse the string in order to get the metadata. Here we use json-simple.

public class SiteAwareScheduler implements IScheduler{

    private static final String SITE = "site";

    @Override
    public void prepare(Map conf) {}

    @Override
    public void schedule(Topologies topologies, Cluster cluster) {
	Collection<TopologyDetails> topologyDetails = topologies.getTopologies();
	Collection<SupervisorDetails> supervisorDetails = cluster.getSupervisors().values();
	Map<String, SupervisorDetails> supervisors = new HashMap<String, SupervisorDetails>();
	JSONParser parser = new JSONParser();
	for(SupervisorDetails s : supervisorDetails){
	    Map<String, String> metadata = (Map<String, String>)s.getSchedulerMeta();
	    if(metadata.get(SITE) != null){	
		 supervisors.put((String)metadata.get(SITE), s);
	    }
	}
		
	for(TopologyDetails t : topologyDetails){
	    if(!cluster.needsScheduling(t)) continue;
	    StormTopology topology = t.getTopology();
	    Map<String, Bolt> bolts = topology.get_bolts();
	    Map<String, SpoutSpec> spouts = topology.get_spouts();
	    try{
		for(String name : bolts.keySet()){
		    Bolt bolt = bolts.get(name);
		    JSONObject conf = (JSONObject)parser.parse(bolt.get_common().get_json_conf());
		    if(conf.get(SITE) != null && supervisors.get(conf.get(SITE)) != null){
			String site = (String)conf.get(SITE);
		    }
		}
	        for(String name : spouts.keySet()){
		    SpoutSpec spout = spouts.get(name);
		    JSONObject conf = (JSONObject)parser.parse(spout.get_common().get_json_conf());
		    if(conf.get(SITE) != null && supervisors.get(conf.get(SITE)) != null){
			String site = (String)conf.get(SITE);
		    }
		}
	    }catch(ParseException pe){
		pe.printStackTrace();
	    }
	}
    }
}

Now that we have site metadata for each topology component, we can retrieve the corresponding supervisor from the cache using the metadata. As introduced above, a supervisor consists of a number of worker slots, and a component usually occupies one of them rather than the entire supervisor. The available worker slots can be acquired by calling getAvailableSlots(SupervisorDetails supervisor). On the other hand, a component may be composed of multiple executors, which are returned by getNeedsSchedulingComponentToExecutors(TopologyDetails topology) and can be distributed over different worker slots. For simplicity, in this tutorial we schedule all executors of a component to the first available worker slot. Finally, assign(WorkerSlot slot, String topologyId, Collection executors) assigns the component to the chosen worker slot.

public class SiteAwareScheduler implements IScheduler{

    private static final String SITE = "site";

    @Override
    public void prepare(Map conf) {}

    @Override
    public void schedule(Topologies topologies, Cluster cluster) {
	Collection<TopologyDetails> topologyDetails = topologies.getTopologies();
	Collection<SupervisorDetails> supervisorDetails = cluster.getSupervisors().values();
	Map<String, SupervisorDetails> supervisors = new HashMap<String, SupervisorDetails>();
	JSONParser parser = new JSONParser();
	for(SupervisorDetails s : supervisorDetails){
	    Map<String, String> metadata = (Map<String, String>)s.getSchedulerMeta();
	    if(metadata.get(SITE) != null){	
		 supervisors.put((String)metadata.get(SITE), s);
	    }
	}
		
	for(TopologyDetails t : topologyDetails){
	    if(!cluster.needsScheduling(t)) continue;
	    StormTopology topology = t.getTopology();
	    Map<String, Bolt> bolts = topology.get_bolts();
	    Map<String, SpoutSpec> spouts = topology.get_spouts();
	    try{
		for(String name : bolts.keySet()){
		    Bolt bolt = bolts.get(name);
		    JSONObject conf = (JSONObject)parser.parse(bolt.get_common().get_json_conf());
		    //only schedule the bolt if it carries site metadata that matches a tagged supervisor
		    if(conf.get(SITE) != null && supervisors.get(conf.get(SITE)) != null){
			String site = (String)conf.get(SITE);
			SupervisorDetails supervisor = supervisors.get(site);
			List<WorkerSlot> workerSlots = cluster.getAvailableWorkerSlots(supervisor);
			List<ExecutorDetails> executors = cluster.getNeedsSchedulingComponentToExecutors(t).get(name);
			if(!workerSlots.isEmpty() && executors != null){
			    cluster.assign(workerSlots.get(0), t.getId(), executors);
			}
		    }
		}
	        for(String name : spouts.keySet()){
		    SpoutSpec spout = spouts.get(name);
		    JSONObject conf = (JSONObject)parser.parse(spout.get_common().get_json_conf());
		    //same site-matching logic for spouts
		    if(conf.get(SITE) != null && supervisors.get(conf.get(SITE)) != null){
			String site = (String)conf.get(SITE);
			SupervisorDetails supervisor = supervisors.get(site);
			List<WorkerSlot> workerSlots = cluster.getAvailableWorkerSlots(supervisor);
			List<ExecutorDetails> executors = cluster.getNeedsSchedulingComponentToExecutors(t).get(name);
			if(!workerSlots.isEmpty() && executors != null){
			    cluster.assign(workerSlots.get(0), t.getId(), executors);
			}
		    }
		}
	    }catch(ParseException pe){
		pe.printStackTrace();
	    }
	}
    }
}

Step 4: Install Scheduler

After finishing the scheduler, we need to package it into a jar file for installation. It is recommended to use Maven for this, as it automates the build well. You can use the POM below to compile your scheduler.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>storm.custom</groupId>
  <artifactId>scheduler</artifactId>
  <version>0.0.1</version>
  <name>storm-custom-schedulers</name>
  <description>Storm's custom schedulers</description>
  
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  
  <dependencies>
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-core</artifactId>
        <version>0.9.4</version>
    </dependency>
    <dependency>
        <groupId>com.googlecode.json-simple</groupId>
        <artifactId>json-simple</artifactId>
        <version>1.1.1</version>
    </dependency>
  </dependencies>
  
  <build>
    <plugins>
        <plugin>
           <groupId>org.apache.maven.plugins</groupId>
           <artifactId>maven-compiler-plugin</artifactId>
           <version>3.1</version>
           <configuration>
               <!-- or whatever version you use -->
             <source>1.7</source>
             <target>1.7</target>
           </configuration>
        </plugin>
    </plugins>
  </build>
</project>

Run the following command to rebuild the jar file every time you make changes to the scheduler.

# mvn clean package -U

The command will create a jar file at $SCHEDULER_HOME/target/scheduler-0.0.1.jar. Copy it into $STORM_HOME/lib/ and tell Nimbus to use the new scheduler by appending the following lines to the configuration file at $STORM_HOME/conf/storm.yaml.

...
storm.scheduler: "storm.custom.scheduler.SiteAwareScheduler"
...

Restart the Nimbus daemon to pick up the configuration change. Again, we simply restart supervisord in this example.

# service supervisord restart

There is a working example at /root/scheduler in the VM image.

Note: you should always restart the supervisors before Nimbus, so that Nimbus can obtain the site metadata from the supervisors. Otherwise, Nimbus may fail to start with a null pointer exception because it cannot find the entry with the key “site” in the supervisors’ metadata.
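
As a minimal sketch of that ordering (the supervisor hostnames below are hypothetical placeholders; in the example slice every Storm daemon is managed by supervisord):

#restart the Storm supervisors first...
for host in supervisor-1 supervisor-2 supervisor-3; do
    ssh $host 'service supervisord restart'
done
#...then restart Nimbus (run this on the Nimbus node)
service supervisord restart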

Step 5: Validation

Now it is time to validate that the scheduler is working correctly. Storm provides a web interface that shows the system’s status. Unfortunately, it does not display user metadata, so we need to validate the placement manually. The web interface is available at http://<nimbus-host-ip>:8080

First of all, submit a topology with site metadata to the cluster. When the topology starts running, you will see the topology link on the web as shown in Figure 5.

Figure 5: Topology link

Then click the topology link to get a view of the topology with all of its information. The spouts and bolts in the topology are listed as shown in Figure 6.

Figure 6: Spouts/Bolts

Because spouts and bolts are not the smallest scheduling unit, this view does not tell you where the components are running. So select a component to get its detail view. There you will see a list of executors and their information, including a Host column that indicates on which supervisor each executor is running, as shown in Figure 7.

Figure 7: Executors

Since you know where each component should be running, you should be able to tell whether it was scheduled correctly. Go through all the components to check the correctness.

Scheduler for Production

The example scheduler is designed only to demonstrate the site-aware scheduling mechanism and is far from production-ready. The example is based on the following assumptions.

  1. All submitted topologies and supervisors have correct site metadata.
  2. There is always at least one worker slot available on every supervisor at any point of time.
  3. Every topology distributes its components evenly across all sites involved in the computation.

However, these assumptions may not hold in production, which calls for more careful handling.

To address the first problem, we can add a backup scheduler that claims resources on supervisors without site metadata and schedules topology components without site information. Storm’s built-in EvenScheduler is a good candidate, as it distributes the workload evenly among supervisors and effectively avoids “hot spots”. A sketch of this combination is shown below.
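
As a sketch of this idea (not part of the original example), the site-aware pass can be wrapped together with a fallback EvenScheduler instance. The class and package names below assume Storm 0.9.x (backtype.storm.scheduler); the wrapper class name is illustrative, and storm.yaml would then point at the wrapper class instead.

// Hedged sketch: whatever the site-aware pass leaves unscheduled is handed to
// EvenScheduler, which spreads it evenly over the remaining supervisors.
import java.util.Map;

import backtype.storm.scheduler.Cluster;
import backtype.storm.scheduler.EvenScheduler;
import backtype.storm.scheduler.IScheduler;
import backtype.storm.scheduler.Topologies;

public class SiteAwareWithFallback implements IScheduler {
    private final IScheduler fallback = new EvenScheduler();

    @Override
    public void prepare(Map conf) {
        fallback.prepare(conf);
    }

    @Override
    public void schedule(Topologies topologies, Cluster cluster) {
        // 1) run the site-aware placement from the example above (omitted here)
        // 2) let EvenScheduler claim anything that still needs scheduling
        fallback.schedule(topologies, cluster);
    }
}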

To solve the second problem, we could either cluster several supervisors at a particular site to provide sufficient computing resources, or assign priorities to tasks to enable task preemption. To cluster supervisors, we need to collect the supervisors identified at a particular site into a per-site list, as in the code shown below.

...
Map<String, List<SupervisorDetails>> supervisors = new HashMap<String, List<SupervisorDetails>>();
for(SupervisorDetails s : supervisorDetails){
    Map<String, String> meta = (Map<String, String>) s.getSchedulerMeta();
    if(meta != null && meta.get(SITE) != null){
        String site = meta.get(SITE);
        if(supervisors.get(site) == null){
            supervisors.put(site, new CopyOnWriteArrayList<SupervisorDetails>());
        }
        supervisors.get(site).add(s);
    }
}
...

To preempt tasks, Storm’s Cluster class offers a freeSlot(WorkerSlot slot) method; a sketch is shown below. We will not discuss the design of the priority rules here, as it is out of the scope of this post.
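
The following is a minimal sketch of what such preemption could look like, not the post’s implementation: it reuses the getAvailableWorkerSlots(), freeSlot() and assign() calls referenced in this post, and leaves victim selection – i.e. the priority rules – to the caller. The helper class name is illustrative.

// Hedged sketch: if the matching supervisor has no free slot, free a chosen
// victim slot before assigning the executors.
import java.util.List;

import backtype.storm.scheduler.Cluster;
import backtype.storm.scheduler.ExecutorDetails;
import backtype.storm.scheduler.SupervisorDetails;
import backtype.storm.scheduler.WorkerSlot;

public class PreemptingAssigner {
    public static void assignWithPreemption(Cluster cluster, SupervisorDetails supervisor,
                                            String topologyId, List<ExecutorDetails> executors,
                                            WorkerSlot victimIfNeeded) {
        List<WorkerSlot> free = cluster.getAvailableWorkerSlots(supervisor);
        if (free.isEmpty() && victimIfNeeded != null) {
            cluster.freeSlot(victimIfNeeded); // evict whatever currently occupies the chosen slot
            free = cluster.getAvailableWorkerSlots(supervisor);
        }
        if (!free.isEmpty() && executors != null) {
            cluster.assign(free.get(0), topologyId, executors);
        }
    }
}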

Load balancing is also important in a production environment, especially since site-aware scheduling may exacerbate overload on popular sites. Fortunately, Storm supports scheduling at a fine granularity – the minimum scheduling unit is an executor, which is basically a Java thread. As you may have noticed in the example code, you can get the list of executors for a topology component. You are free to break that list into multiple sub-lists and distribute them over several supervisors at a particular site, instead of putting a monolithic component on a single supervisor and leaving the others idle; a sketch of this is shown after this paragraph. In fact, Storm’s default scheduler schedules topologies in this way. In terms of fault tolerance, it is also good practice to spread the executors of a topology component over multiple supervisors, so that the topology keeps running if a supervisor node fails.
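
The following is a minimal sketch of such splitting (not the default scheduler’s actual code): it deals a component’s executors out round-robin over the free slots offered by all supervisors at one site, reusing the getAvailableWorkerSlots() and assign() calls from the example above; the class and method names are illustrative.

// Hedged sketch: spread one component's executors over every free slot at a site
// instead of assigning the whole component to a single supervisor.
import java.util.ArrayList;
import java.util.List;

import backtype.storm.scheduler.Cluster;
import backtype.storm.scheduler.ExecutorDetails;
import backtype.storm.scheduler.SupervisorDetails;
import backtype.storm.scheduler.WorkerSlot;

public class SiteLoadBalancer {
    public static void assignAcrossSite(Cluster cluster, String topologyId,
                                        List<SupervisorDetails> siteSupervisors,
                                        List<ExecutorDetails> executors) {
        // collect every free slot offered by the supervisors of this site
        List<WorkerSlot> slots = new ArrayList<WorkerSlot>();
        for (SupervisorDetails s : siteSupervisors) {
            slots.addAll(cluster.getAvailableWorkerSlots(s));
        }
        if (slots.isEmpty() || executors == null || executors.isEmpty()) {
            return; // nothing to place, or nowhere to place it
        }
        // one bucket of executors per slot, filled round-robin
        List<List<ExecutorDetails>> buckets = new ArrayList<List<ExecutorDetails>>();
        for (int i = 0; i < slots.size(); i++) {
            buckets.add(new ArrayList<ExecutorDetails>());
        }
        int next = 0;
        for (ExecutorDetails e : executors) {
            buckets.get(next++ % slots.size()).add(e);
        }
        // assign each non-empty bucket to its slot
        for (int i = 0; i < slots.size(); i++) {
            if (!buckets.get(i).isEmpty()) {
                cluster.assign(slots.get(i), topologyId, buckets.get(i));
            }
        }
    }
}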

References

The solution is inspired by this post.

Using Docker in ExoGENI

Overview

This brief post explains how to use Docker in ExoGENI. The image built for this post is posted in the ExoGENI Image Registry and is also available in Flukes.

Name: Docker-v0.1
URL: http://geni-images.renci.org/images/ibaldin/docker/centos6.6-docker-v0.1/centos6.6-docker-v0.1.xml
Hash: b2262a8858c9c200f9f43d767e7727a152a02248

This is a CentOS 6.6-based image with a Docker install on top. Note that this post is not meant to serve as a primer on Docker. For this you should consult Docker documentation.

Theory of operation

Docker is a platform for configuring, shipping and running applications. It uses lightweight containers to isolate applications from each other. The Docker daemon uses the devicemapper storage driver to manage its disk space. By default Docker creates a sparse 100G file for data, and each container can take up to 10G of disk space, which is clearly too large to fit inside a VM should a container try to fill it up.

To ensure this doesn’t happen, the ExoGENI Docker VM limits each container to 5G and the overall space given to Docker in its sparse file to 20G. These settings are adjusted by editing a line in /etc/sysconfig/docker:

other_args="--storage-opt dm.basesize=5G --storage-opt dm.loopdatasize=20G"

If you wish to resize the amount of space available to Docker, edit this line accordingly and then do the following:

$ service docker stop
$ cd /var/lib
$ rm -rf docker
$ service docker start

Please note that wiping out the /var/lib/docker directory as shown above will remove all images and containers you may have created so far. If you wish to keep an image you created, save it first with

$ docker save -o image.tar <image name>

for each image. Once the Docker disk space has been resized and the daemon restarted, you can load the images back with the docker load command (docker load -i image.tar).

Using the Docker image

You can simply boot some number of instances with the Docker image listed above and start pulling images from Docker Hub (e.g., with docker pull) or creating your own.

We recommend using the larger VM sizes, like XOLarge and XOExtraLarge, to make sure you don’t run out of disk space.

Using perfSonar in ExoGENI

Overview

Special thanks go to Brian Tierney of LBL/ESnet for his help in creating the perfSonar image.

This post describes how to use a perfSonar image in ExoGENI slices. The image built for this blog post is now posted in the ExoGENI Image Registry and available in Flukes.

Name: psImage-v0.3
URL: http://geni-images.renci.org/images/ibaldin/perfSonar/psImage-v0.3/psImage-v0.3.xml
Hash: e45a2c809729c1eb38cf58c4bff235510da7fde5

Note that we are using a Level 2 perfSonar image built on top of a CentOS 6.6 base image with a modified ps_light Docker container from ESnet. However, registration with the perfSonar lookup service is disabled in this image.

Theory of operation

The perfSonar image uses Docker technology to deploy its components. The following elements are included in the image:

  • Client programs for nuttcp, iperf, iperf3, bwctl and owamp included as simple RPMs accessible by all users
  • Server programs for bwctl and owamp running inside a Docker

The image starts Docker on boot, loads the needed Docker images and automatically launches the ‘ps_light_xo’ Docker with the server programs in it.

-bash-4.1# docker ps
CONTAINER ID        IMAGE                COMMAND                CREATED             STATUS              PORTS               NAMES
ba28266c1aec        ps_light_xo:latest   "/bin/sh -c '/usr/bi   6 minutes ago       Up 6 minutes                            suspicious_lovelace  

Under normal operation the user should not have to interact with the server programs – the container runs in host networking mode and the server programs listen on all the interfaces the VM may have. However, if needed, the user can gain access to the container with the server programs using the following command:

$ docker exec -ti <guid> /bin/bash

where ‘<guid>’ refers to the container ID of the automatically started container. You can find the ID by issuing this command:

$ docker ps

Using the image

You can create a topology using the perfSonar image (listed in the Image Registry and above) and then run the client programs on some nodes against the servers on other nodes. Since the image has both client and server programs, measurements can be made in either direction as long as IP connectivity is assured.

Once the slice has booted try a few client programs:

-bash-4.1# owping 172.16.0.2
Approximately 13.0 seconds until results available

--- owping statistics from [172.16.0.1]:8852 to [172.16.0.2]:8966 ---
SID:	ac100002d8c6ba8674af285470d65b0b
first:	2015-04-01T14:42:15.627
last:	2015-04-01T14:42:25.314
100 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = -0.496/-0.4/-0.144 ms, (err=3.9 ms)
one-way jitter = 0.2 ms (P95-P50)
TTL not reported
no reordering


--- owping statistics from [172.16.0.2]:8938 to [172.16.0.1]:8954 ---
SID:	ac100001d8c6ba867d50999ce0a1166f
first:	2015-04-01T14:42:15.553
last:	2015-04-01T14:42:24.823
100 sent, 0 lost (0.000%), 0 duplicates
one-way delay min/median/max = 1.09/1.3/1.5 ms, (err=3.9 ms)
one-way jitter = 0.2 ms (P95-P50)
TTL not reported
no reordering

or

-bash-4.1# bwctl -c 172.16.0.2
bwctl: Using tool: iperf
bwctl: 16 seconds until test results available

RECEIVER START
------------------------------------------------------------
Server listening on TCP port 5578
Binding to local address 172.16.0.2
TCP window size: 87380 Byte (default)
------------------------------------------------------------
[ 15] local 172.16.0.2 port 5578 connected with 172.16.0.1 port 59083
[ ID] Interval       Transfer     Bandwidth
[ 15]  0.0-10.0 sec  1356333056 Bytes  1081206753 bits/sec
[ 15] MSS size 1448 bytes (MTU 1500 bytes, ethernet)

RECEIVER END

Things to note

OWAMP in particular is sensitive to accurate time measurements, which is why the VMs come packaged with ntpd, which starts on boot. However, this does not solve all the problems. Measuring jitter in a VM may produce unpredictable results because the VM shares cores with other VMs on the same worker node. While in ExoGENI we do not oversubscribe cores, we also do not (yet) do any core pinning when placing VMs inside workers, which means timing artifacts may occur when VMs switch cores.

The end result is that while jitter measurements using OWAMP may have high resolution, their accuracy should be questioned. To improve the accuracy try using larger instance sizes, like XOLarge and XOExtraLarge.

Network Delay Emulation

Tao Qian, a research assistant at RENCI and a PhD student of Professor Frank Mueller in the Computer Science Department at NC State University, has built a software module that allows per-interface latency emulation of IP packets in Linux VMs. There is an Ubuntu image pre-loaded with this module available from the ExoGENI image proxy:

To emulate latency in your virtual topology, use the following command to add delay in the post-boot script.
  • nsdelay IP delay
For example, to add a 100 ms delay to node “Node0” on the interface for link “Link0”:
  • nsdelay $Node0.IP(“Link0”) 100

To remove the delay, use

  • nsdelay IP 0

which will set up the link with a FIFO queue and no delay.

To change the delay, you must first remove the current delay (if any) with “nsdelay IP 0”, and then set the delay again. Otherwise, the VM will shut down if you try to add a delay before clearing the current one.
Notes:

  • The latency is emulated purely on the virtual interfaces of the VMs, which means that the physical propagation latency on the virtual links in your topology is not included; you may need to take this into account when conducting experiments.
  • If delays are changed at high frequency, excessive packet drops may occur.

Using Dropbox for ExoGENI Images

This post describes how to create and use custom ExoGENI images stored in Dropbox.   For information about creating custom images refer to previous posts about taking snapshots of existing VMs or creating images from VirtualBox VMs.

Background:

These instructions assume you have created a working Dropbox account.

Images:

These instructions are intended to help you host your own custom ExoGENI images in your own Dropbox account.   However, for the tutorial you may find it useful to use the following image, kernel, and ramdisk that are known to work on ExoGENI.

Pushing the Images to Dropbox:

Copy the image, kernel, and ramdisk files to the Public folder of your Dropbox account. For each file, right-click it and choose “Copy public link…”. Record these links for later; they are the public HTML links to your files.

The links will look something like the following. Note: 12345678 will be a unique number associated with your Dropbox account.

  • Image: https://dl.dropboxusercontent.com/u/12345678/centos6.3-v1.0.11.tgz
  • Kernel: https://dl.dropboxusercontent.com/u/12345678/vmlinuz-2.6.32-431.17.1.el6.x86_64
  • Ramdisk: https://dl.dropboxusercontent.com/u/12345678/initramfs-2.6.32-431.17.1.el6.x86_64.img

If your URLs do not match the pattern above, you will likely need to move your files to the Public folder.

Creating the Metadata file by hand:

Next you need to create the XML metadata file that describes the three files. The metadata file contains the public URL and the SHA1 hash for each file.

The SHA1 hashes can be computed by typing the following on most *nix machines.

> sha1sum centos6.3-v1.0.11.tgz initramfs-2.6.32-431.17.1.el6.x86_64.img vmlinuz-2.6.32-431.17.1.el6.x86_64 
cb899c42394eecc008ada7c9b75456c7d7e1149b  centos6.3-v1.0.11.tgz
fc927a776e819b0951b5e8daf81f6991128e9abf  initramfs-2.6.32-431.17.1.el6.x86_64.img
726abdfd57dbe0ca079f3b38f8cce8b9f2323efa  vmlinuz-2.6.32-431.17.1.el6.x86_64

At this point you can create the xml metadata file to look like the following. Note: For the purposes of the tutorial we will name our metadata file centos6.3-v1.0.11.xml.

<images>
   <image>
      <type>ZFILESYSTEM</type>
      <signature>cb899c42394eecc008ada7c9b75456c7d7e1149b</signature>
      <url>https://dl.dropboxusercontent.com/u/12345678/centos6.3-v1.0.11.tgz</url>
   </image>
   <image>
      <type>KERNEL</type>
      <signature>726abdfd57dbe0ca079f3b38f8cce8b9f2323efa</signature>
      <url>https://dl.dropboxusercontent.com/u/12345678/vmlinuz-2.6.32-431.17.1.el6.x86_64</url>
   </image>
   <image>
      <type>RAMDISK</type>
      <signature>fc927a776e819b0951b5e8daf81f6991128e9abf</signature>
      <url>https://dl.dropboxusercontent.com/u/12345678/initramfs-2.6.32-431.17.1.el6.x86_64.img</url>
   </image>
</images>

Creating the Metadata file Automatically 

If you want to create the metadata file automatically, you can use the create-image-metadata.sh script.

The script takes as parameters the three public URLs of the image files. It will then download each file, compute its sha1sum, and create the metadata file. Note that this may take some time because the script must download the image file, which can be very large.

> ./create-image-metadata.sh -z https://dl.dropboxusercontent.com/u/12345678/centos6.3-v1.0.11.tgz -r https://dl.dropboxusercontent.com/u/12345678/initramfs-2.6.32-431.17.1.el6.x86_64.img -k https://dl.dropboxusercontent.com/u/12345678/vmlinuz-2.6.32-431.17.1.el6.x86_64 -n centos6.3-v1.0.11.xml
getting https://dl.dropboxusercontent.com/u/12345678/centos6.3-v1.0.11.tgz
 % Total % Received % Xferd Average Speed Time Time Time Current
 Dload Upload Total Spent Left Speed
100 345M 100 345M 0 0 23.9M 0 0:00:14 0:00:14 --:--:-- 29.8M
getting https://dl.dropboxusercontent.com/u/12345678/vmlinuz-2.6.32-431.17.1.el6.x86_64
 % Total % Received % Xferd Average Speed Time Time Time Current
 Dload Upload Total Spent Left Speed
100 4033k 100 4033k 0 0 2628k 0 0:00:01 0:00:01 --:--:-- 3168k
getting https://dl.dropboxusercontent.com/u/12345678/initramfs-2.6.32-431.17.1.el6.x86_64.img
 % Total % Received % Xferd Average Speed Time Time Time Current
 Dload Upload Total Spent Left Speed
100 12.0M 100 12.0M 0 0 4470k 0 0:00:02 0:00:02 --:--:-- 4946k
Creating XML image descriptor file centos6.3-v1.0.11.xml
Metadata:
<images> 
 <image>
 <type>ZFILESYSTEM</type>
 <signature>cb899c42394eecc008ada7c9b75456c7d7e1149b</signature>
 <url>https://dl.dropboxusercontent.com/u/12345678/centos6.3-v1.0.11.tgz</url>
 </image>
 <image>
 <type>KERNEL</type> 
 <signature>726abdfd57dbe0ca079f3b38f8cce8b9f2323efa</signature>
 <url>https://dl.dropboxusercontent.com/u/12345678/vmlinuz-2.6.32-431.17.1.el6.x86_64</url>
 </image>
 <image>
 <type>RAMDISK</type>
 <signature>fc927a776e819b0951b5e8daf81f6991128e9abf</signature>
 <url>https://dl.dropboxusercontent.com/u/12345678/initramfs-2.6.32-431.17.1.el6.x86_64.img</url>
 </image>
</images>
XML image descriptor file SHA1 hash is: d964e8f63c8e48c419a9eb1db50fb657eb19b468
XML image descriptor file is: centos6.3-v1.0.11.xml

Push the Metadata File to Dropbox

Find the sha1 hash of the metadata file (or look at the stdout from the create-image-metadata.sh script):

> sha1sum centos6.3-v1.0.11.xml 
d964e8f63c8e48c419a9eb1db50fb657eb19b468  centos6.3-v1.0.11.xml

Copy the metadata file to the Public folder of your Dropbox account. Right-click the file to get its public Dropbox URL.

Using the image

You can now use the URL and hash of the metadata file in your ExoGENI requests.