Accumulo Tutorial

Apache Accumulo is a distributed key/value store based on Google’s BigTable design. Accumulo is written in Java and operates over the Hadoop Distributed File System (HDFS). It uses ZooKeeper to track the status of distributed data. Accumulo provides cell-based access control and customizable server-side processing, and it supports using Accumulo tables as input and output for MapReduce jobs.

This page gives an introduction to deploying an Accumulo cluster on ExoGENI.

Create the Request:

1. Download the request RDF file from here.

2. Start the Flukes application.

3. In Flukes, click “File” -> “Open request”, then find the RDF you’ve downloaded.
Accumulo request

4. Select the rack you would like to use by right-clicking a node, clicking “Edit Properties…”, and choosing the desired domain.

5. Edit the post-boot script for the nodes as needed. Right-click the master node and choose “Edit Properties”, then click “PostBoot Script”. A text box will appear in which you can make edits as needed. The default postboot script is the same across all the nodes.

Here the default username is “root” and the password is “secret”, set at the beginning of the postboot script (see the snippet below). Make sure to update it to your own password!
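
For example, the line to change sits near the top of the script and appears again in the full script below:

ACCUMULO_PASSWORD=secret   # replace 'secret' with your own password before submitting the request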

Due to some firewall issues, this postboot script only works for CentOS 6.X.

############################################################
# Hadoop section copied from hadoop recipe:
# https://github.com/RENCI-NRIG/exogeni-recipes/tree/master/hadoop/hadoop-2.7.3/hadoop_exogeni_postboot.txt
############################################################

# using stable2 link for Hadoop Version
# HADOOP_VERSION=hadoop-2.7.3
ZOOKEEPER_VERSION=zookeeper-3.4.6
ACCUMULO_VERSION=1.8.1

# The default accumulo password.  Should be changed!
ACCUMULO_PASSWORD=secret

# Velocity Hacks
#set( $bash_var = '${' )
#set( $bash_str_split = '#* ' )
############################################################

# setup /etc/hosts
############################################################
echo $NameNode.IP("VLAN0") $NameNode.Name() >> /etc/hosts
echo $ResourceManager.IP("VLAN0") $ResourceManager.Name() >> /etc/hosts
#set ( $sizeWorkerGroup = $Workers.size() - 1 )
#foreach ( $j in [0..$sizeWorkerGroup] )
 echo $Workers.get($j).IP("VLAN0") `echo $Workers.get($j).Name() | sed 's/\//-/g'` >> /etc/hosts
#end

echo `echo $self.Name() | sed 's/\//-/g'` > /etc/hostname
/bin/hostname -F /etc/hostname

# Install Java
############################################################
yum makecache fast
yum -y update # may be skipped during testing; should be enabled in production
yum install -y wget java-1.8.0-openjdk-devel

export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:/bin/java::")

cat > /etc/profile.d/java.sh << EOF
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:/bin/java::")
export PATH=\$JAVA_HOME/bin:\$PATH
EOF

# Install Hadoop
############################################################
stable2=$(curl --location --insecure --show-error https://dist.apache.org/repos/dist/release/hadoop/common/stable2)
# stable2 should look like: link hadoop-2.7.4
HADOOP_VERSION=${bash_var}stable2${bash_str_split}}
mkdir -p /opt/${HADOOP_VERSION}

# use the suggested mirror for the actual download
curl --location --insecure --show-error "https://www.apache.org/dyn/mirrors/mirrors.cgi?action=download&filename=hadoop/common/${HADOOP_VERSION}/${HADOOP_VERSION}.tar.gz" > /opt/${HADOOP_VERSION}.tgz

tar -C /opt/${HADOOP_VERSION} --extract --file /opt/${HADOOP_VERSION}.tgz --strip-components=1
rm -f /opt/${HADOOP_VERSION}.tgz*

export HADOOP_PREFIX=/opt/${HADOOP_VERSION}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop

cat > /etc/profile.d/hadoop.sh << EOF
export HADOOP_HOME=${HADOOP_PREFIX}
export HADOOP_PREFIX=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
export PATH=\$HADOOP_PREFIX/bin:\$PATH
EOF

# Configure iptables for Hadoop (Centos 6)
############################################################
# https://www.vultr.com/docs/setup-iptables-firewall-on-centos-6
iptables -F; iptables -X; iptables -Z
#Allow all loopback (lo) traffic and drop all traffic to 127.0.0.0/8 other than lo:
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -d 127.0.0.0/8 -j REJECT
#Block some common attacks:
iptables -A INPUT -p tcp ! --syn -m state --state NEW -j DROP
iptables -A INPUT -p tcp --tcp-flags ALL NONE -j DROP
iptables -A INPUT -p tcp --tcp-flags ALL ALL -j DROP
#Accept all established inbound connections:
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
#Allow SSH connections:
iptables -A INPUT -p tcp --dport 22 -j ACCEPT

# Allow internal cluster connections
iptables -I INPUT -i eth1 -p tcp -j ACCEPT

#Node specific iptables config
if [[ $self.Name() == NameNode ]]
then
  # connections to namenode allowed from outside the cluster
  iptables -A INPUT -p tcp --dport 50070 -j ACCEPT
elif [[ $self.Name() == ResourceManager ]]
then
  # connections to resource manager from outside the cluster
  iptables -A INPUT -p tcp --dport 8088 -j ACCEPT
elif [[ $self.Name() == Workers* ]]
then
  # TODO ?
  : #no-op
elif [[ $self.Name() == AccumuloMaster ]]
then
  # connections to accumulo monitor from outside the cluster
  iptables -A INPUT -p tcp --dport 9995 -j ACCEPT
fi

# complete the iptables config
#set the default policies:
iptables -P INPUT DROP
iptables -P OUTPUT ACCEPT
iptables -P FORWARD DROP
#Save the iptables configuration with the following command:
service iptables save

# Create hadoop user and setup SSH
############################################################
useradd -U hadoop
mkdir /home/hadoop/.ssh

# Namenode will generate private SSH key
if [[ $self.Name() == NameNode ]]
then
  ssh-keygen -t rsa -N "" -f /home/hadoop/.ssh/id_rsa
  cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys

  # allow cluster to download SSH public key
  # port is only accessible to internal cluster
  mkdir /public_html
  cp -u /home/hadoop/.ssh/id_rsa.pub /public_html/
  (cd /public_html; python -c 'import SimpleHTTPServer,BaseHTTPServer; BaseHTTPServer.HTTPServer(("", 8080), SimpleHTTPServer.SimpleHTTPRequestHandler).serve_forever()') &
else
  # Need to download SSH public key from master
  until wget -O /home/hadoop/.ssh/id_rsa.pub "http://namenode:8080/id_rsa.pub"
  do
    sleep 2
  done
  cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
fi

# Add host RSA keys to SSH known hosts files
# Need to wait until these succeed
until ssh-keyscan namenode >> /home/hadoop/.ssh/known_hosts; do sleep 2; done
until ssh-keyscan resourcemanager >> /home/hadoop/.ssh/known_hosts; do sleep 2; done
#set ( $sizeWorkerGroup = $Workers.size() - 1 )
#foreach ( $j in [0..$sizeWorkerGroup] )
  until ssh-keyscan `echo $Workers.get($j).Name() | sed 's/\//-/g'` >> /home/hadoop/.ssh/known_hosts
  do
    sleep 2
  done
#end

# Fix permissions in .ssh
chown -R hadoop:hadoop /home/hadoop/.ssh
chmod -R g-w /home/hadoop/.ssh
chmod -R o-w /home/hadoop/.ssh

# see if the NameNode can copy private key to other nodes
if [[ $self.Name() == NameNode ]]
then
  until sudo -u hadoop scp -o BatchMode=yes /home/hadoop/.ssh/id_rsa resourcemanager:/home/hadoop/.ssh/id_rsa; do sleep 2; done
  #set ( $sizeWorkerGroup = $Workers.size() - 1 )
  #foreach ( $j in [0..$sizeWorkerGroup] )
    until sudo -u hadoop scp -o BatchMode=yes /home/hadoop/.ssh/id_rsa `echo $Workers.get($j).Name() | sed 's/\//-/g'`:/home/hadoop/.ssh/id_rsa
    do
      sleep 2
    done
  #end
fi

# Configure Hadoop
############################################################
CORE_SITE_FILE=${HADOOP_CONF_DIR}/core-site.xml
HDFS_SITE_FILE=${HADOOP_CONF_DIR}/hdfs-site.xml
MAPRED_SITE_FILE=${HADOOP_CONF_DIR}/mapred-site.xml
YARN_SITE_FILE=${HADOOP_CONF_DIR}/yarn-site.xml
SLAVES_FILE=${HADOOP_CONF_DIR}/slaves

echo "hadoop_exogeni_postboot: configuring Hadoop"

cat > $CORE_SITE_FILE << EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
   <name>fs.default.name</name>
   <value>hdfs://$NameNode.Name():9000</value>
  </property>
</configuration>
EOF

cat > $HDFS_SITE_FILE << EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
   <name>dfs.replication</name>
   <value>2</value>
  </property>
</configuration>
EOF

cat > $MAPRED_SITE_FILE << EOF
<configuration>
 <property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
</configuration>
EOF

cat > $YARN_SITE_FILE << EOF
<?xml version="1.0"?>
<configuration>
<!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>$ResourceManager.Name()</value>
  </property>
  <property>
    <name>yarn.resourcemanager.bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
EOF

cat > $SLAVES_FILE << EOF
#set ( $sizeWorkerGroup = $Workers.size() - 1 )
#foreach ( $j in [0..$sizeWorkerGroup] )
 `echo $Workers.get($j).Name() | sed 's/\//-/g'`
#end
EOF

# make sure the hadoop user owns /opt/hadoop
chown -R hadoop:hadoop ${HADOOP_PREFIX}

# Start Hadoop
############################################################
echo "hadoop_exogeni_postboot: starting Hadoop"

if [[ $self.Name() == NameNode ]]
then
  sudo -E -u hadoop $HADOOP_PREFIX/bin/hdfs namenode -format
  sudo -E -u hadoop $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
elif [[ $self.Name() == ResourceManager ]]
then
  # make sure the NameNode has had time to send the SSH private key
  until [ -f /home/hadoop/.ssh/id_rsa ]
  do
    sleep 2
  done
  sudo -E -u hadoop $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
elif [[ $self.Name() == Workers* ]]
then
  # make sure the NameNode has had time to send the SSH private key
  until [ -f /home/hadoop/.ssh/id_rsa ]
  do
    sleep 2
  done
  sudo -E -u hadoop $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
  sudo -E -u hadoop $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
fi


############################################################
# ZooKeeper
# Assumes cluster has already been configured for Hadoop
############################################################

# setup /etc/hosts
############################################################
echo $AccumuloMaster.IP("VLAN0") $AccumuloMaster.Name() >> /etc/hosts
echo $NameNode.IP("VLAN0") zoo1 >> /etc/hosts
echo $ResourceManager.IP("VLAN0") zoo2 >> /etc/hosts
echo $AccumuloMaster.IP("VLAN0") zoo3 >> /etc/hosts

# Install ZooKeeper
############################################################
mkdir -p /opt/${ZOOKEEPER_VERSION}
wget -nv --output-document=/opt/${ZOOKEEPER_VERSION}.tgz https://dist.apache.org/repos/dist/release/zookeeper/${ZOOKEEPER_VERSION}/${ZOOKEEPER_VERSION}.tar.gz
tar -C /opt --extract --file /opt/${ZOOKEEPER_VERSION}.tgz
rm /opt/${ZOOKEEPER_VERSION}.tgz*

export ZOOKEEPER_HOME=/opt/${ZOOKEEPER_VERSION}

cat > /etc/profile.d/zookeeper.sh << EOF
export ZOOKEEPER_HOME=/opt/${ZOOKEEPER_VERSION}
export ZOO_DATADIR_AUTOCREATE_DISABLE=1
export PATH=\$ZOOKEEPER_HOME/bin:\$PATH
EOF

# Configure ZooKeeper
############################################################
ZOOKEEPER_DATADIR=/var/lib/zookeeper/
mkdir -p ${ZOOKEEPER_DATADIR}

cat > ${ZOOKEEPER_HOME}/conf/zoo.cfg << EOF
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=${ZOOKEEPER_DATADIR}
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
EOF

if [[ $self.Name() == NameNode ]]
then
  echo 1 > ${ZOOKEEPER_DATADIR}/myid
elif [[ $self.Name() == ResourceManager ]]
then
  echo 2 > ${ZOOKEEPER_DATADIR}/myid
elif [[ $self.Name() == AccumuloMaster ]]
then
  echo 3 > ${ZOOKEEPER_DATADIR}/myid
fi

# Start ZooKeeper
############################################################
if [[ $self.Name() == NameNode ]] || [[ $self.Name() == ResourceManager ]] || [[ $self.Name() == AccumuloMaster ]]
then
  echo "accumulo_exogeni_postboot: starting ZooKeeper"
  ${ZOOKEEPER_HOME}/bin/zkServer.sh start
fi


############################################################
# Accumulo
# Assumes cluster has already been configured for Hadoop and Zookeeper
############################################################

# Complete SSH setup for Accumulo Master
############################################################
until ssh-keyscan accumulomaster >> /home/hadoop/.ssh/known_hosts; do sleep 2; done
if [[ $self.Name() == AccumuloMaster ]]
then
  ssh-keyscan `neuca-get-public-ip` >> /home/hadoop/.ssh/known_hosts
  ssh-keyscan 0.0.0.0 >> /home/hadoop/.ssh/known_hosts
fi

# see if the NameNode can copy private key to other nodes
if [[ $self.Name() == NameNode ]]
then
  until sudo -u hadoop scp -o BatchMode=yes /home/hadoop/.ssh/id_rsa accumulomaster:/home/hadoop/.ssh/id_rsa; do sleep 2; done
fi

# Install Accumulo
############################################################
mkdir -p /opt/accumulo-${ACCUMULO_VERSION}
curl --location --insecure --show-error https://dist.apache.org/repos/dist/release/accumulo/${ACCUMULO_VERSION}/accumulo-${ACCUMULO_VERSION}-bin.tar.gz > /opt/accumulo-${ACCUMULO_VERSION}.tgz
tar -C /opt/accumulo-${ACCUMULO_VERSION} --extract --file /opt/accumulo-${ACCUMULO_VERSION}.tgz --strip-components=1
rm -f /opt/accumulo-${ACCUMULO_VERSION}.tgz*

export ACCUMULO_HOME=/opt/accumulo-${ACCUMULO_VERSION}

cat > /etc/profile.d/accumulo.sh << EOF
export ACCUMULO_HOME=/opt/accumulo-$ACCUMULO_VERSION
#export PATH=\$ACCUMULO_HOME/bin:\$PATH
EOF

# make sure the hadoop user owns /opt/accumulo
chown -R hadoop:hadoop ${ACCUMULO_HOME}

# Configure Accumulo
############################################################

# accumulo bootstrap_config.sh tries to create a temp file in CWD.
# 512MB bug https://issues.apache.org/jira/browse/ACCUMULO-4585
cd ${ACCUMULO_HOME}
sudo -E -u hadoop ${ACCUMULO_HOME}/bin/bootstrap_config.sh --size 1GB --jvm --version 2

# tell accumulo where to run each service
sed -i "/localhost/ s/.*/$AccumuloMaster.Name()/" ${ACCUMULO_HOME}/conf/masters
sed -i "/localhost/ s/.*/$AccumuloMaster.Name()/" ${ACCUMULO_HOME}/conf/monitor
sed -i "/localhost/ s/.*/$AccumuloMaster.Name()/" ${ACCUMULO_HOME}/conf/gc
sed -i "/localhost/ s/.*/$AccumuloMaster.Name()/" ${ACCUMULO_HOME}/conf/tracers # not sure where these should be run ?

cat > ${ACCUMULO_HOME}/conf/slaves << EOF
#set ( $sizeWorkerGroup = $Workers.size() - 1 )
#foreach ( $j in [0..$sizeWorkerGroup] )
 `echo $Workers.get($j).Name() | sed 's/\//-/g'`
#end
EOF

# Need monitor to bind to public port
sed -i "/ACCUMULO_MONITOR_BIND_ALL/ s/^# //" ${ACCUMULO_HOME}/conf/accumulo-env.sh

# setup zookeeper hosts
sed -i "/localhost:2181/ s/localhost:2181/zoo1:2181,zoo2:2181,zoo3:2181/" ${ACCUMULO_HOME}/conf/accumulo-site.xml

# disable SASL (?) Kerberos ??
# this is disabled correctly by bootstrap_config.sh
#sed -i '/instance.rpc.sasl.enabled/!b;n;s/true/false/' ${ACCUMULO_HOME}/conf/accumulo-site.xml

# if the password is changed by the user, the script needs to change it here too.
sed -i "/<value>secret/s/secret/${ACCUMULO_PASSWORD}/" ${ACCUMULO_HOME}/conf/accumulo-site.xml

# Start Accumulo
# Start each host separately, as they may be at different 
# stages of configuration
############################################################
if [[ $self.Name() == AccumuloMaster ]]
then
  # wait until we have the SSH private key
  until [ -f /home/hadoop/.ssh/id_rsa ]
  do
    sleep 2
  done

  # init and run accumulo
  sudo -E -u hadoop ${ACCUMULO_HOME}/bin/accumulo init --instance-name exogeni --password ${ACCUMULO_PASSWORD} --user root
  sudo -E -u hadoop ${ACCUMULO_HOME}/bin/start-here.sh

elif [[ $self.Name() == Workers* ]]
then
  # make sure the NameNode has had time to send the SSH private key
  until [ -f /home/hadoop/.ssh/id_rsa ]
  do
    sleep 2
  done

  # need to wait for 'init' of accumulo to finish
  until sudo -E -u hadoop ${HADOOP_PREFIX}/bin/hdfs dfs -ls /accumulo/instance_id > /dev/null 2>&1
  do
    sleep 1
  done

  sudo -E -u hadoop ${ACCUMULO_HOME}/bin/start-here.sh
fi

6. Name your slice and submit it to ExoGENI.

7. Wait for the resources to become Active.
Accumulo manifest

Check the status of Accumulo:

1. It takes about 25 minutes after the slice is up for the postboot script to install Accumulo. To check whether the postboot script has completed, simply run:

[cwang@AccumuloMaster ~]$ ps -ef | grep neuca
root      1178     1  0 10:17 ?        00:00:05 /usr/bin/python /usr/bin/neucad start
cwang    25906 25794  0 12:30 pts/0    00:00:00 grep neuca
[cwang@AccumuloMaster ~]$ ps -ef | grep 1178
root      1178     1  0 10:17 ?        00:00:05 /usr/bin/python /usr/bin/neucad start
root      1459  1178  0 10:18 ?        00:00:00 [bootscript] <defunct>
cwang    25911 25794  0 12:30 pts/0    00:00:00 grep 1178

The line below indicates that the postboot script execution has completed:

root      1459  1178  0 10:18 ?        00:00:00 [bootscript] <defunct>
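
If you would rather wait for completion than poll by hand, a small loop based on the same check works (a sketch; it simply waits for the defunct bootscript entry shown above to appear):

until ps -ef | grep -v grep | grep -q 'bootscript.*defunct'; do sleep 60; done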

2. To check whether the ZooKeeper installation was successful, the zkCli.sh tool can be helpful:

[cwang@AccumuloMaster ~]$ zkCli.sh -server zoo1
Connecting to zoo1
2017-10-27 12:33:34,394 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2017-10-27 12:33:34,399 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=AccumuloMaster
2017-10-27 12:33:34,399 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.8.0_151
2017-10-27 12:33:34,402 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2017-10-27 12:33:34,402 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.151-1.b12.el6_9.x86_64/jre
2017-10-27 12:33:34,402 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/opt/zookeeper-3.4.6/bin/../build/classes:/opt/zookeeper-3.4.6/bin/../build/lib/*.jar:/opt/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/opt/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/opt/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/opt/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/opt/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/opt/zookeeper-3.4.6/bin/../src/java/lib/*.jar:/opt/zookeeper-3.4.6/bin/../conf:
2017-10-27 12:33:34,402 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2017-10-27 12:33:34,402 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2017-10-27 12:33:34,402 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=
2017-10-27 12:33:34,402 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2017-10-27 12:33:34,402 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2017-10-27 12:33:34,402 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=2.6.32-696.1.1.el6.x86_64
2017-10-27 12:33:34,403 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=cwang
2017-10-27 12:33:34,403 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/cwang
2017-10-27 12:33:34,403 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/cwang
2017-10-27 12:33:34,405 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=zoo1 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@3eb07fd3
Welcome to ZooKeeper!
2017-10-27 12:33:34,434 [myid:] - INFO  [main-SendThread(NameNode:2181):ClientCnxn$SendThread@975] - Opening socket connection to server NameNode/172.16.100.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2017-10-27 12:33:34,580 [myid:] - INFO  [main-SendThread(NameNode:2181):ClientCnxn$SendThread@852] - Socket connection established to NameNode/172.16.100.1:2181, initiating session
2017-10-27 12:33:34,594 [myid:] - INFO  [main-SendThread(NameNode:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server NameNode/172.16.100.1:2181, sessionid = 0x15f5e3661d10001, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: zoo1(CONNECTED) 0] ls / 
[accumulo, zookeeper, tracers]

Detailed instructions for Accumulo operations can be found in the Accumulo user manual. Here we give a simple example from YCSB:

First we use a simple Ruby script, based on the HBase README, to generate adequate split points.

[cwang@AccumuloMaster ~]$ echo 'num_splits = 20; puts (1..num_splits).map {|i| "user#{1000+i*(9999-1000)/num_splits}"}' | ruby > /tmp/splits.txt

Then we create an Accumulo table, add the splits, enable block caching, and list the available tables:

[cwang@AccumuloMaster ~]$ /opt/accumulo-1.8.1/bin/accumulo shell -u root -p secret
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
2017-10-27 13:13:45,604 [conf.ConfigSanityCheck] WARN : Use of instance.dfs.uri and instance.dfs.dir are deprecated. Consider using instance.volumes instead.
2017-10-27 13:13:47,635 [trace.DistributedTrace] INFO : SpanReceiver org.apache.accumulo.tracer.ZooTraceClient was loaded successfully.

Shell - Apache Accumulo Interactive Shell
- 
- version: 1.8.1
- instance name: exogeni
- instance id: 98a4486f-2456-46d9-9823-c4ac516b6006
- 
- type 'help' for a list of available commands
- 
root@exogeni> createtable usertable
root@exogeni usertable> addsplits -t usertable -sf /tmp/splits.txt
root@exogeni usertable> config -t usertable -s table.cache.block.enable=true
root@exogeni usertable> tables
accumulo.metadata
accumulo.replication
accumulo.root
trace
usertable

If you see output like the above, Accumulo has been installed successfully.
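
As an extra sanity check, you can also confirm that Accumulo’s files landed in HDFS and that the monitor answers on port 9995 (the path, hostname and port are the ones configured by the postboot script above); a minimal sketch, run on the AccumuloMaster node:

source /etc/profile.d/hadoop.sh
hdfs dfs -ls /accumulo                                                # Accumulo's directory tree in HDFS
curl -s -o /dev/null -w '%{http_code}\n' http://accumulomaster:9995   # expect an HTTP status code if the monitor is up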

Images with Network IDS Tools

Two new images with network IDS tools have been added to the ExoGENI Image Registry. Both images can be used to deploy network IDS tools in slices.
Centos 7.4 v1.0.4 BRO
Ubuntu 14.04 Security Onion

The “Bro Network Security Monitor” is a framework that can be used to monitor network traffic. It has built-in analyzers to inspect the traffic for all kinds of activity. The Bro web site includes documentation.

Security Onion is a Linux distro for intrusion detection, network security monitoring, and log management. It’s based on Ubuntu and contains Snort, Suricata, Bro, OSSEC, Sguil, Squert, ELSA, Xplico, NetworkMiner, and many other security tools. Detailed information can be found on the wiki and the web site.

1. Configuration of a Bro instance from the image “Centos 7.4 v1.0.4 BRO”
Bro v2.5.2 is built from source with pf_ring and installed in the /opt directory.

A minimal starting configuration can be done by modifying the /opt/bro/etc/node.cfg and /opt/bro/etc/broctl.cfg files.

Standalone:

[root@bro ~]# cat /opt/bro/etc/node.cfg 
# Example BroControl node configuration.
#
# This example has a standalone node ready to go except for possibly changing
# the sniffing interface.

# This is a complete standalone configuration.  Most likely you will
# only need to change the interface.
[bro]
type=standalone
host=localhost
interface=eth0

Cluster (multiple workers with pf_ring):

[root@bro ~]# cat /opt/bro/etc/node.cfg 
[manager]
type=manager
host=localhost
#
[proxy]
type=proxy
host=localhost

[bro-eth1]
type=worker
host=localhost
interface=eth1
lb_method=pf_ring
lb_procs=5
#pin_cpus=1,3
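
broctl.cfg holds the global BroControl settings, while node.cfg only defines the node layout. Two commonly adjusted broctl.cfg entries, as an illustrative sketch (the option names are standard BroControl settings, the values are examples, and the defaults shipped in the image may differ):

MailTo = root@localhost
LogRotationInterval = 3600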

Deploy configuration and start Bro:

[root@bro ~]# broctl deploy
checking configurations ...
installing ...
removing old policies in /opt/bro/spool/installed-scripts-do-not-touch/site ...
removing old policies in /opt/bro/spool/installed-scripts-do-not-touch/auto ...
creating policy directories ...
installing site policies ...
generating cluster-layout.bro ...
generating local-networks.bro ...
generating broctl-config.bro ...
generating broctl-config.sh ...
stopping ...
bro-eth1-1 not running
bro-eth1-2 not running
proxy not running
manager not running
starting ...
starting manager ...
starting proxy ...
starting bro-eth1-1 ...
starting bro-eth1-2 ...

[root@bro ~]# broctl status
Name         Type    Host             Status    Pid    Started
manager      manager localhost        running   1307   25 Oct 15:46:02
proxy        proxy   localhost        running   1348   25 Oct 15:46:04
bro-eth1-1   worker  localhost        running   1399   25 Oct 15:46:05
bro-eth1-2   worker  localhost        running   1401   25 Oct 15:46:05

2. Configuration of a Security Onion instance from the image “Ubuntu 14.04 Security Onion”

After deploying the VM, log in with SSH X11 forwarding and run sosetup.

$ ssh -Y -i ~/.ssh/id_rsa root@147.72.248.6
... [output omitted] ...

root@so-1:~# sosetup

Follow the prompts:

Screen Shot 2017-10-25 at 10.37.27

The next window, about network interface configuration, can be skipped since we will not change the management interface. However, if configuration does need to be done through this window, eth0 should be selected as the management interface with the VM’s current IP address (from the 10.103.0.0/24 subnet), netmask 255.255.255.0, and default gateway 10.103.0.1.

Screen Shot 2017-10-25 at 10.37.49

Details about the server configuration can be found on the wiki. This sample configuration selects “Evaluation Mode”.

Screen Shot 2017-10-25 at 10.39.04

Dataplane interfaces (eth1, eth2 … ) can be selected for monitoring.
Screen Shot 2017-10-25 at 10.39.23

A local user account needs to be created to access Sguil, Squert and ELSA.
Screen Shot 2017-10-25 at 10.39.48

Screen Shot 2017-10-25 at 10.40.10

Screen Shot 2017-10-25 at 10.40.28

Configuration changes will be committed.

Screen Shot 2017-10-25 at 10.40.43

Screen Shot 2017-10-25 at 12.14.01

Information messages pop up.

Screen Shot 2017-10-25 at 10.41.59

Screen Shot 2017-10-25 at 10.42.25

Screen Shot 2017-10-25 at 10.48.39

The firewall needs to be configured to allow connections to the instance. This should be done after sosetup has completed.

Screen Shot 2017-10-25 at 10.49.13

Screen Shot 2017-10-25 at 10.49.36

Configure the firewall for access to the instance (the values entered by the user are shown inline in the transcript below):

root@so-1:~# so-allow
This program allows you to add a firewall rule to allow connections from a new IP address.

What kind of device do you want to allow?

[a] - analyst - ports 22/tcp, 443/tcp, and 7734/tcp
[c] - apt-cacher-ng client - port 3142/tcp
[l] - syslog device - port 514
[o] - ossec agent - port 1514/udp
[s] - Security Onion sensor - 22/tcp, 4505/tcp, 4506/tcp, and 7736/tcp

If you need to add any ports other than those listed above,
you can do so using the standard 'ufw' utility.

For more information, please see the Firewall page on our Wiki:
https://github.com/Security-Onion-Solutions/security-onion/wiki/Firewall

Please enter your selection (a - analyst, c - apt-cacher-ng client, l - syslog, o - ossec, or s - Security Onion sensor):
a
Please enter the IP address of the analyst you'd like to allow to connect to port(s) 22,443,7734:
152.54.9.188
We're going to allow connections from 152.54.9.188 to port(s) 22,443,7734.

Here's the firewall rule we're about to add:
sudo ufw allow proto tcp from 152.54.9.188 to any port 22,443,7734

We're also whitelisting 152.54.9.188 in /var/ossec/etc/ossec.conf to prevent OSSEC Active Response from blocking it.  Keep in mind, the OSSEC server will be restarted once configuration is complete.

To continue and add this rule, press Enter.
Otherwise, press Ctrl-c to exit.
PRESS ENTER
Rule added
Rule has been added.

Here is the entire firewall ruleset:
Status: active

To                         Action      From
--                         ------      ----
22/tcp                     ALLOW       Anywhere
22,443,7734/tcp            ALLOW       152.54.9.188
22/tcp (v6)                ALLOW       Anywhere (v6)


Added whitelist entry for 152.54.9.188 in /var/ossec/etc/ossec.conf.

Restarting OSSEC Server...
Deleting PID file '/var/ossec/var/run/ossec-remoted-5006.pid' not used...
Killing ossec-monitord .. 
Killing ossec-logcollector .. 
ossec-remoted not running ..
Killing ossec-syscheckd .. 
Killing ossec-analysisd .. 
ossec-maild not running ..
Killing ossec-execd .. 
Killing ossec-csyslogd .. 
OSSEC HIDS v2.8 Stopped
Starting OSSEC HIDS v2.8 (by Trend Micro Inc.)...
Started ossec-csyslogd...
2017/10/25 16:20:23 ossec-maild: INFO: E-Mail notification disabled. Clean Exit.
Started ossec-maild...
Started ossec-execd...
Started ossec-analysisd...
Started ossec-logcollector...
Started ossec-remoted...
Started ossec-syscheckd...
Started ossec-monitord...
Completed.

Check the status of the services:

root@so-1:~# service nsm status
Status: securityonion
  * sguil server                                                                           [  OK  ]
Status: HIDS
  * ossec_agent (sguil)                                                                    [  OK  ]
Status: Bro
Name         Type       Host          Status    Pid    Started
bro          standalone localhost     running   7390   25 Oct 16:14:13
Status: so-1-eth1
  * netsniff-ng (full packet data)                                                         [  OK  ]
  * pcap_agent (sguil)                                                                     [  OK  ]
  * snort_agent-1 (sguil)                                                                  [  OK  ]
  * snort-1 (alert data)                                                                   [  OK  ]
  * barnyard2-1 (spooler, unified2 format)                                                 [  OK  ]

The web UI can be accessed through the public IP address of the VM. Squert and ELSA can be accessed from the links:

Screen Shot 2017-10-25 at 10.56.43

Screen Shot 2017-10-25 at 10.57.02

Screen Shot 2017-10-25 at 10.57.42

neuca-guest-tools 1.7: VM configuration at instantiation and “Picking yourself up by the bootstraps”

Do I dare
Disturb the universe?
In a minute there is time
For decisions and revisions which a minute will reverse.

— T. S. Eliot, “The Love Song of J. Alfred Prufrock”

ExoGENI experimenters!

Have you ever wanted to reboot your VMs, but found yourself unable to log back into them after having done so?

If so – then, fear the reboot no more; we here at ExoGENI Central have heard your laments, and have worked hard to address them!

We proudly announce the availability of neuca-guest-tools 1.7.
For those who are unaware – the neuca-guest-tools are included in most ExoGENI VM images, and handle the business of performing certain types of configuration (e.g. network address assignment) when VMs are created. In this respect, they are similar to cloud-init – but perform several additional tasks.

In this latest release, we have performed a significant clean-up and re-organization of the code. Several known and latent bugs were fixed (though, others may well have been introduced), and all python code has been PEP8-ified (for ease of reading and modification).

As to new features and changes in behavior?

  • We ensure that network interfaces retain their device names across reboots. This is accomplished by generating udev files for network interface devices on Linux. By doing this, we are able to prevent the management interface from being subverted by the kernel’s probe order during a reboot (which was the primary reason for VMs with multiple interfaces becoming unreachable after a reboot).
  • In Linux VMs that use NetworkManager (I’m looking at you, CentOS 7), NetworkManager is not allowed to interfere with the configuration of interfaces that are meant to be under the management of the neuca-guest-tools. This is done by having neuca-guest-tools mark dataplane interfaces as “unmanaged” within the context of NetworkManager.
  • Dataplane interface address configurations are only modified when the neuca-guest-tools are restarted, or when a change has been made to the request. Therefore, if you make manual changes to a dataplane interface while the VM is running (for example, via ifconfig), that change should persist until you either reboot the VM, restart neuca-guest-tools, or make a change to your request that alters that interface. Dataplane interfaces can still be excluded from any address configuration changes by adding their MAC addresses (comma-separated, if there are multiple interfaces you wish to ignore) to the “dataplane-macs-to-ignore” configuration item in /etc/neuca/config (an example entry is sketched after this list)
  • Both System V init scripts and systemd unit files should now be named: neuca-guest-tools
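
For example, excluding two dataplane interfaces from address management would look like the following entry in /etc/neuca/config (a sketch: the option name and comma-separated format come from the list above, the MAC addresses are illustrative, and the entry goes under whichever section your existing config file already uses for these options):

dataplane-macs-to-ignore = fa:16:3e:00:21:09, fa:16:3e:00:16:bd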

As we have traditionally done, the neuca-guest-tools primarily target various flavors of Linux. Any Unix-like OS should be supportable, and contributions to enable those OSes are welcome. Support for Windows is on the long-term horizon.

The source repository can be found here:
https://github.com/RENCI-NRIG/neuca-guest-tools

Following Ubuntu’s lead, however, we’re no longer packaging (or supporting) 12.04; while the python code should still work with the distributed version of python in 12.04 (2.6!) – maintaining packages for distributions without vendor support seemed somewhat counter-productive.

Packages for recent and supported versions of CentOS, Fedora, and Ubuntu can be found here:
http://software.exogeni.net/repo/exogeni/neuca-guest-tools/

A source release can also be found at that location, for those who wish to attempt installing on versions of Linux for which we do not have packages.

We have also provided new VM images that contain the latest release of neuca-guest-tools; these are:

  • Centos 6.9 v1.0.2
  • Centos 7.4 v1.0.2
  • Fedora 25 v1.0.6
  • Fedora 26 v1.0.1
  • Ubuntu 14.04 v1.0.3
  • Ubuntu 16.04 v1.0.3

As a reminder – if you’d like to check what version of the neuca-guest-tools you’re running in your VM, you can run:

neuca-version

A sample run might look like the following:


[root@Node0 ~]# neuca-version
NEuca version 1.7

Remember: unless you’re running neuca-guest-tools 1.7, any VMs having multiple interfaces are unlikely to survive a reboot.

Finally – if you’d like to create your own images from one of the ones that has already been provided for you, we suggest taking a look at the “image capture” script, which can be found here:
http://geni-images.renci.org/images/tools/imgcapture.sh

We’ve recently made some changes to it as well, so that custom images are captured more reliably. We’ve also added the ability to capture xattrs (if they are set on filesystems within your image); this should enable the ability to boot SELinux-enabled images. If interest is expressed in performing experiments using SELinux-enabled images, we will provide base VM images that have SELinux enabled (from which customized images can be derived).

If you would like an example of how to use the image capture script, please take a look at the following fine ExoBlog entry:
Creating a Custom Image from an Existing Virtual Machine

We hope you enjoy the new release of neuca-guest-tools!

Working with Linux eBPF and XDP on ExoGENI

Overview

If you are interested in exploring the eBPF mechanism, available in Linux since the late 3.x kernel series, there is a Fedora 25 image with kernel 4.13.0 that includes eBPF and XDP support and can be used on the ExoGENI dataplane.

Name: Fedora 25 BPF XDP
URL: http://geni-images.renci.org/images/standard/fedora/fedora25-v1.0.3/fedora25-v1.0.3.xml
Hash: ed658235257296e230bbb640c0bbae2ec2b2602d

The image is automatically available in Flukes in the image list.

XDP is envisioned to eventually supplant the use of Intel’s DPDK, although at present it isn’t as full-featured as DPDK.

A few definitions:

  • eBPF – extended Berkeley Packet Filters – an in-kernel virtual machine that allows attaching small BPF programs to various points in the running kernel: kprobes, uprobes and portions of the networking stack – TC classifiers, ingress and egress actions and XDP.
    • The kernel must be compiled with eBPF support
    • A separate option is the JIT compiler for eBPF, included in the kernel, which improves the performance of BPF programs (see the sysctl sketch after this list)
    • BPF programs are typically written using a subset of C (without for loops, static variables and constants).
    • BPF programs are checked by the eBPF verifier prior to insertion into the kernel to guarantee that they terminate.
    • BPF is becoming a widely used tool for run-time unobtrusive profiling of various aspects of behavior of the system – tracing system calls, I/O behavior, network behavior.
  • XDP – a novel mechanism that allows attaching BPF programs to the receive path in the kernel before the networking stack gets hold of the frame from the driver. The program is presented with a single memory page containing the received frame (so there is no sk_buff with its scatter-gather semantics to deal with). Frames can be parsed by BPF programs and discarded, passed on to the stack or redirected back into the network driver. XDP has more limited compatibility than the TC hooks due to the need for driver support; it does, however, have higher performance because of its simpler memory layout.
  • BCC (BPF Compiler Collection) – a collection of tools from the Iovisor project that makes it easy to write eBPF programs by combining Python with BPF programs written in C.
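
A quick way to check and enable the JIT at runtime is via sysctl; net.core.bpf_jit_enable is the standard kernel knob referred to in the JIT bullet above:

sysctl net.core.bpf_jit_enable        # 0 = off, 1 = on, 2 = on with debug output
sysctl -w net.core.bpf_jit_enable=1   # turn the JIT on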

There is a kernel compatibility matrix that shows which kernel versions have which BPF features available.

Using BPF on ExoGENI

Simply boot some number of instances with the Fedora 25 BPF XDP image to get access to the full features of BPF as of kernel 4.13.0. The 8139 driver used in ExoGENI appears to support XDP, so it will be available on the management interface. Dataplane interfaces use the virtio-net driver, which, unfortunately, does not support XDP. It is still possible to install eBPF programs on dataplane interfaces, but they have to be of type BPF_PROG_TYPE_SCHED_CLS or BPF_PROG_TYPE_SCHED_ACT and operate on sk_buffs, not pages, the way XDP programs do.
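
Attaching such a classifier-type program to a dataplane interface is done with iproute2’s tc rather than with XDP tooling. A minimal sketch, assuming a compiled BPF object file prog.o with an ELF section named "classifier" (both the object file and the section name are illustrative):

tc qdisc add dev eth1 clsact                                        # add ingress/egress attach points
tc filter add dev eth1 ingress bpf da obj prog.o sec classifier     # load the program as a direct-action classifier
tc filter show dev eth1 ingress                                     # verify it is attached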

Read the references above and try the examples located under /usr/share/bcc/examples within the image to become more familiar with various capabilities.

Jumbo Frame Support on Dataplane Interfaces

The ExoGENI testbed supports jumbo frames on dataplane interfaces across sites. (Currently, all racks except the UMass and WVN racks support jumbo frames; the UMass and WVN interfaces will be configured in the following weeks.)

VMs are created with dataplane interfaces that have an MTU of 1500 bytes. Underneath, the bridges and physical interfaces along the path are configured for an MTU of 9000. Currently, the neuca tools don’t have an option to set the MTU size, so the MTU needs to be modified from inside the VM.

Screen Shot 2017-06-13 at 13.28.23

On node0:

root@Node0:~# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr FA:16:3E:00:21:09  
          inet addr:172.16.0.1  Bcast:172.16.0.3  Mask:255.255.255.252
          inet6 addr: fe80::f816:3eff:fe00:2109/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:25 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1922 (1.8 KiB)  TX bytes:378 (378.0 b)

root@Node0:~# ifconfig eth1 mtu 9000

On node1:

root@Node1:~# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr FA:16:3E:00:16:BD  
          inet addr:172.16.0.2  Bcast:172.16.0.3  Mask:255.255.255.252
          inet6 addr: fe80::f816:3eff:fe00:16bd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:530 (530.0 b)  TX bytes:398 (398.0 b)

root@Node1:~# ifconfig eth1 mtu 9000

Now, jumbo frames can be exchanged:

root@Node0:~# ping -M do -s 8972 -c 3 172.16.0.2
PING 172.16.0.2 (172.16.0.2) 8972(9000) bytes of data.
8980 bytes from 172.16.0.2: icmp_seq=1 ttl=64 time=114 ms
8980 bytes from 172.16.0.2: icmp_seq=2 ttl=64 time=56.9 ms
8980 bytes from 172.16.0.2: icmp_seq=3 ttl=64 time=56.7 ms

--- 172.16.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2059ms
rtt min/avg/max/mdev = 56.732/76.150/114.798/27.329 ms

-M do : prohibits fragmentation
-s : sets the packet size. 8972 bytes (ICMP payload) + 20 bytes (IP header) + 8 bytes (ICMP header) = 9000 bytes
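
The same change can be made and verified with the iproute2 tools on images where ifconfig is not installed; note that the setting does not survive a reboot, so it has to be reapplied (or scripted) after restarting the VM:

ip link set dev eth1 mtu 9000
ip link show dev eth1 | grep -o 'mtu [0-9]*'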

Creating Windows images on ExoGENI

In this post, the creation and configuration of a Windows image is described. Virtual machines running a Windows OS can be used on ExoGENI. As mentioned in this post, ExoGENI does not host VM images; Windows images should be created by the users. This can be done on any platform, and ExoGENI baremetal servers can be used for image creation as well. This post also describes the steps for installing KVM on a baremetal server so that it can be used as a virtualization platform for image creation.

  • Activation of the OS should be managed by the user.
  • Dataplane interfaces can be created by ORCA, however configuration of interfaces is not supported yet.
  • Attaching iSCSI storage is not supported yet.
  • Upon creation of the VM, IP address assignment and other network configuration for dataplane interfaces should be done manually inside the VM.

Steps to create and deploy an image are as below:

  1. Install virtualization platform for image creation
  2. Install and customize the OS
  3. Create and deploy the image

Virtualization Platform Installation

1. Create a slice with one baremetal server. The baremetal server will be used as the hypervisor to provision an instance. The image file of the instance will be converted and deployed on a web server or image registry to provision VMs on ExoGENI.

2. Install KVM and virtualization platform.

yum update -y
yum install qemu-kvm qemu-img -y
yum groupinstall virtualization-client virtualization-platform virtualization-tools -y
service libvirtd start
yum install vnc -y

3. Install RPMs for X11 forwarding. We will need to launch the VNC viewer to access the VM.

yum install -y xorg-x11-xauth xorg-x11-fonts-* xorg-x11-utils
touch /root/.Xauthority

4. On ExoGENI, baremetal servers are booted off “stateless images” and the OS runs on a ramdisk. Each server has two hard drives which can be partitioned and mounted after the server is provisioned. (Another option is attaching storage to the server during slice creation.) We will partition and mount the physical drives to provide storage for the hypervisor. (If there are already defined partitions, either re-use them, or delete them, re-partition, and create a new filesystem.)

[root@Node0 ~]# fdisk /dev/sda
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0x77e3f885.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
         switch off the mode (command 'c') and change display units to
         sectors (command 'u').

Command (m for help): p

Disk /dev/sda: 299.0 GB, 298999349248 bytes
255 heads, 63 sectors/track, 36351 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x77e3f885

   Device Boot      Start         End      Blocks   Id  System

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-36351, default 1): 
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-36351, default 36351): 
Using default value 36351

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

[root@Node0 ~]# fdisk  -l

Disk /dev/sda: 299.0 GB, 298999349248 bytes
255 heads, 63 sectors/track, 36351 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x77e3f885

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       36351   291989376   83  Linux

[root@Node0 ~]# mkfs.ext4 /dev/sda1
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
18251776 inodes, 72997344 blocks
3649867 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
2228 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
4096000, 7962624, 11239424, 20480000, 23887872, 71663616

Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 35 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

Create directories, mount hard drive:

[root@Node0 ~]# mkdir /opt/kvm
[root@Node0 ~]# mount /dev/sda1 /opt/kvm
[root@Node0 ~]# mkdir /opt/kvm/iso
[root@Node0 ~]# mv /var/lib/libvirt /opt/kvm/.
[root@Node0 ~]# ln -s /opt/kvm/libvirt /var/lib/libvirt

5. Create the bridge interface which will be used by KVM. During this image creation process we will not need a network connection to the VM, but the bridge interface can be used to access the VM if needed. The public interface of the server will be bridged to access the VM, so we need to create and configure the bridge with the public interface. (This interface depends on the type of rack: on UCS-B series ExoGENI racks this interface is eth0, whereas on IBM racks it is em1.)

Create a script, then execute:

#!/bin/bash
### Configure br0 on the baremetal server

PHYS_IF="eth0"
BR_IF="br0"
IPADDR=$(ifconfig $PHYS_IF | grep "inet addr" | awk '{print $2}' | cut -d: -f2)
NETMASK=$(ifconfig $PHYS_IF | grep "inet addr" | awk '{print $4}' | cut -d: -f2)
GATEWAY=$(ip route show | grep default | awk '{print $3}')

brctl addbr ${BR_IF}
ifconfig ${PHYS_IF} 0.0.0.0 down
brctl addif ${BR_IF} ${PHYS_IF}

ifconfig ${PHYS_IF} up
ifconfig ${BR_IF} ${IPADDR} netmask ${NETMASK} up
route add -net default gw ${GATEWAY}
[root@Node0 ~]# chmod +x  configure_bridge.sh 
[root@Node0 ~]# ./configure_bridge.sh
[root@Node0 ~]# ifconfig -a
br0       Link encap:Ethernet  HWaddr 00:25:B5:00:02:7F  
          inet addr:10.101.0.16  Bcast:10.101.0.255  Mask:255.255.255.0
          inet6 addr: fe80::225:b5ff:fe00:27f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:21 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1444 (1.4 KiB)  TX bytes:1750 (1.7 KiB)

eth0      Link encap:Ethernet  HWaddr 00:25:B5:00:02:7F  
          inet6 addr: fe80::225:b5ff:fe00:27f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:819589 errors:0 dropped:0 overruns:0 frame:0
          TX packets:187926 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1206755994 (1.1 GiB)  TX bytes:16406110 (15.6 MiB)

eth1      Link encap:Ethernet  HWaddr 00:25:B5:00:02:4F  
          inet6 addr: fe80::225:b5ff:fe00:24f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:888 (888.0 b)

eth2      Link encap:Ethernet  HWaddr 00:25:B5:00:02:5F  
          inet6 addr: fe80::225:b5ff:fe00:25f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:888 (888.0 b)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

virbr0    Link encap:Ethernet  HWaddr 52:54:00:A1:FD:F2  
          inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

virbr0-nic Link encap:Ethernet  HWaddr 52:54:00:A1:FD:F2  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

6. Copy the Windows installer ISO and the VirtIO drivers for Windows to the server. (The Windows installer needs to be provided by the user.)

wget http://geni-images.renci.org/images/windows/virtio-win-0.1-22.iso

7. Create the instance:

[root@Node0 ~]# qemu-img create -f qcow2 /var/lib/libvirt/images/win7.qcow 20G
Formatting '/var/lib/libvirt/images/win7.qcow', fmt=qcow2 size=21474836480 encryption=off cluster_size=65536 

[root@Node0 ~]# /usr/libexec/qemu-kvm -m 2048 -cdrom /opt/kvm/iso/Win7Enterprise-64bit.iso -drive file=/var/lib/libvirt/images/win7.qcow,if=virtio -drive file=/opt/kvm/iso/virtio-win-0.1-22.iso,index=3,media=cdrom -net nic,model=virtio -net user -nographic -usbdevice tablet -vnc :9 -enable-kvm 

On another terminal login with X11 forwarding:

ssh -Y -i ~/.ssh/id_geni_ssh_mcevik_rsa root@139.62.242.122

Add an iptables rule for the VNC connection (display :9 listens on port 5909):

iptables -A INPUT -p tcp --dport 5909 -j ACCEPT

Connect to the instance via VNC:

[root@Node0 ~]# vncviewer 127.0.0.1:9

Windows OS Installation and Customization

Install the OS by selecting VirtIO driver from the ISO image attached to the instance:

 Screen Shot 2017-06-05 at 23.40.43
 Screen Shot 2017-06-05 at 23.41.18
Screen Shot 2017-06-05 at 23.41.23
Screen Shot 2017-06-05 at 23.41.28
Screen Shot 2017-06-05 at 23.41.41
Screen Shot 2017-06-05 at 23.43.30
Screen Shot 2017-06-05 at 23.44.03
Screen Shot 2017-06-05 at 23.44.17
Screen Shot 2017-06-06 at 00.35.05
Screen Shot 2017-06-06 at 00.37.34

Switch to audit mode by pressing CTRL+SHIFT+F3:

Screen Shot 2017-06-06 at 00.38.00 Screen Shot 2017-06-06 at 00.38.39

Install virtio network driver:

Screen Shot 2017-06-06 at 01.55.32 Screen Shot 2017-06-06 at 01.56.02 Screen Shot 2017-06-06 at 01.56.10 Screen Shot 2017-06-06 at 00.39.56

Select “Work Network”:

Screen Shot 2017-06-06 at 00.40.51

Download and install Firefox:

 Screen Shot 2017-06-06 at 00.43.13

Download and install Cloudbase-Init:

Screen Shot 2017-06-06 at 00.46.14
Screen Shot 2017-06-06 at 00.46.21 Screen Shot 2017-06-06 at 00.46.46 Screen Shot 2017-06-06 at 00.50.29

Run System Preparation Tool with the settings below:

Screen Shot 2017-06-06 at 02.01.55

Reboot the instance:

 Screen Shot 2017-06-06 at 02.03.56

Login in audit mode:

Create a user account “exogeni” (Administrator).

Configure the user account to log on automatically:
– Run netplwiz
– User Accounts dialog box: Clear “Users must enter a user name and password to use this computer” check box. Enter the user’s password.

Reboot and see automatic login:

Enable Remote Desktop, disable Remote Assistance:

 Screen Shot 2017-06-06 at 02.17.33

Configure Windows Firewall:

Advanced settings: Disable “Network Discovery” rules:

Screen Shot 2017-06-06 at 02.15.23

Create new rule to allow incoming and outgoing ICMP traffic for pinging:
– Rule Type: Custom
– Program: All programs
– Protocols and ports: ICMPv4 – All ports
– Scope: Any IP address for both local and remote IP addresses
– Action: Allow connection
– Profile: Domain, private, public selected
– Name: PING

Configure “restarts” after automatic updates: To complete installation of the network drivers when the VM is launched, enable automatic restart after updates.
– Run gpedit.msc
– Local Group Policy Editor: Local Computer Policy, Computer Configuration, Administrative Templates, Windows Components, Windows Update

Screen Shot 2017-06-06 at 02.45.58
Screen Shot 2017-06-06 at 03.16.37

Shut down the instance. All customizations are saved to the qcow image.

Image creation and deployment

After customization, the qcow2 image needs to be converted to a raw image:

qemu-img convert -O raw win7.qcow win7.raw.img
gzip win7.raw.img

Generate the metadata file:

<images>
    <image>
        <type>ZFILESYSTEM</type>
        <signature>3d4013f0ce337fb619747ebed282de374de464e3</signature>
        <url>http://WEBSERVER/image-windows/win7.img.gz</url>
    </image>
</images>
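
The <signature> value appears to be a SHA-1 digest of the image file referenced by <url>, matching the 40-character hashes used for the other images on this site. Assuming that is the case, it can be computed over the exact file you upload to the web server, e.g.:

sha1sum win7.raw.img.gz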

After the slice is created, you can connect to the VM via Remote Desktop and assign IP addresses to the dataplane interfaces. Also, activation of the OS needs to be done with a valid license key.

Screen Shot 2017-06-06 at 10.05.43

Screen Shot 2017-06-06 at 08.42.50 Screen Shot 2017-06-06 at 08.43.12

Scalably Managing ExoGENI nodes using AWS tools

Introduction and Overview

In this post we will explore how to use Amazon AWS tools to scalably manage the infrastructure of your slices. These tools allow you to manage hybrid infrastructures consisting of EC2 instances and external nodes, in our case created in the ExoGENI testbed. They allow you to perform remote command execution on multiple nodes at once, inventory the state of the nodes – their software, network and other configurations – and perform custom tasks, all without differentiating between EC2 nodes and ExoGENI nodes. In fact, it is possible to use these tools only on ExoGENI nodes, without having any EC2 nodes involved. The management tasks can be done from the EC2 web console, using the AWS CLI tools, or programmatically with libraries like Boto. This tutorial concentrates on using the web console and the command-line tools only.

In this tutorial we will be using several AWS services: EC2, CloudFormation (Infrastructure as Code), IAM (Identity and Access Management) and SSM (Simple Systems Manager). A disclaimer: the IAM, SSM and CloudFormation services are included in EC2 pricing; you pay for the EC2 instances you start, S3 storage space and sometimes traffic. That means that if you are starting only ExoGENI instances, there should be no costs as long as you do not use S3 buckets.

Prerequisites:

The tutorial follows this workflow:

  1. Start up AWS stack using CloudFormation. We will create a small ‘slice’ inside AWS with 3 instances
  2. We will demonstrate the use of the SSM Run Command on those instances
  3. We will start an ExoGENI slice, whose instances automatically join SSM
  4. We will demonstrate how to manage EC2 and ExoGENI instances together using the same tools

Starting EC2 stack using CloudFormation

We begin by downloading a CloudFormation template that starts the EC2 side of our experiment. Notice that it isn’t necessary to have EC2 instances to use the Run Command; in this tutorial, however, we show both EC2 instances and ExoGENI instances.

In our case the stack consists of three hosts: one bastion host with a public IP address, and two other hosts in different subnets that communicate with the outside world using a NAT gateway.

[Screenshot: EC2 stack topology]

The stack can be started using the following command:

$ aws cloudformation create-stack --stack-name GENIStack --template-body file:///path/to/downloaded/geni-vpc.template --parameters ParameterKey=InstanceType,ParameterValue=t2.small ParameterKey=KeyName,ParameterValue=<Name of your SSH Key Pair> --capabilities CAPABILITY_IAM

There are several important parameters in this command we should discuss:

  • --stack-name GENIStack is the name you are giving this stack. All EC2 instances in the stack will be tagged with this name and you will be able to invoke remote commands on them based on it
  • --template-body points to the template you are starting. In this case it is a file on the local filesystem; a template stored in S3 would be passed with --template-url instead
  • --parameters ParameterKey=InstanceType,ParameterValue=t2.small ParameterKey=KeyName,ParameterValue=<Name of your SSH Key Pair> specifies several parameters as Key/Value tuples. In this case we specify that our EC2 instances will be t2.small, and we must name the SSH key to be used with them. The name should be visible as ‘Key Pair Name’ in the EC2 console under Network & Security/Key Pairs
  • --capabilities CAPABILITY_IAM is required because this template creates roles inside AWS IAM and therefore needs these capabilities declared explicitly

While this command is executing, you can check the progress of the stack either via the AWS CloudFormation web console or using the CLI:

$ aws cloudformation describe-stacks
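
If you are scripting the setup, the CLI can also block until stack creation finishes; a minimal sketch:

$ aws cloudformation wait stack-create-complete --stack-name GENIStack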

While the stack is being created, let’s take a look at several elements of the stack template file that are critical to this tutorial. The template is a JSON file.

Each instance in this stack is launched with the RunCmdInstanceProfile instance profile, which is associated with the RunCmdRole IAM role granting instances in the stack limited privileges to use the SSM service. This is the equivalent of a ‘speaks-for’ in GENI:

 "RunCmdInstanceProfile": {
    "Type": "AWS::IAM::InstanceProfile",
    "Properties": {
      "Path": "/",
      "Roles": [ { "Ref": "RunCmdRole" } ]
    }
 },
 "RunCmdRole": {
    "Type": "AWS::IAM::Role",
    "Properties": {
       "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
          {
             "Sid": "",
             "Effect": "Allow",
             "Principal": {
             "Service": "ec2.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
       }
       ]
    },
    "Path": "/"
    }
 }

Each instance assumes the RunCmdInstanceProfile, and the role uses a policy, RunCmdPolicies, that allows SSM operations (policy omitted for brevity).

Another important aspect is the startup script used by each instance, which updates and then starts the SSM agent on boot:

 "IamInstanceProfile": {
    "Ref": "RunCmdInstanceProfile"
    },
 "UserData": { "Fn::Base64" : { "Fn::Join" : ["", [
    "#!/bin/bash -xe\n",
    "cd /tmp\n",
    "echo ", {"Ref": "AWS::Region"}, " > region.txt\n",
    "curl https://amazon-ssm-",{"Ref": "AWS::Region"}, ".s3.amazonaws.com/latest/linux_amd64/amazon-ssm-agent.rpm -o amazon-ssm-agent.rpm\n",
    "sudo yum install -y amazon-ssm-agent.rpm\n",
    "sudo restart amazon-ssm-agent\n"
 ]]}}

Once the stack completes, you should see something like this (adjusted for your parameters):

$ aws cloudformation describe-stacks
STACKS 2017-01-05T16:05:42.658Z False arn:aws:cloudformation:us-east-1:621231197516:stack/GENIStack/cf1a7e80-d360-11e6-ae3f-503f23fb559a GENIStack CREATE_COMPLETE
CAPABILITIES CAPABILITY_IAM
OUTPUTS Primary private IP of host 2 Host2 Private IP 192.168.2.200
OUTPUTS Primary private IP of host 1 Host1 Private IP 192.168.1.26
OUTPUTS Primary public IP of gateway host EIP IP Address 34.196.53.116 on subnet subnet-02b4df2f
PARAMETERS KeyName MyKeys
PARAMETERS InstanceType t2.small

And in the CloudFormation console:

[Screenshot: CloudFormation console]

In the EC2 console, when you go down to ‘Systems Manager Shared Resources’ and click on ‘Managed Instances’, you should see the three EC2 instances belonging to the stack you just created lit up ‘green’:

[Screenshot: Managed Instances in the EC2 console]

Notice that the console already offers you a way to run commands on them using the ‘Run Command’ button. The Run Command operation is based on a number of pre-existing JSON document templates (SSM Documents) that are selected to run a particular type of command. AWS classifies the documents as Windows- or Linux-compatible.

The full list of currently available documents can be viewed in the EC2 console under Systems Manager Shared Resources/Documents or via the AWS CLI:

$ aws ssm list-documents

You can click on the Run Command button and select an ‘SSM Document’ that serves as a template for the command. In this case we want to run the shell command ‘ifconfig -a’, so select the ‘AWS-RunShellScript’ document and fill out the form (select all instances and enter ‘ifconfig -a’ in the command box). Run the command. SSM issues a GUID corresponding to this command, and you can inspect the output by clicking on the GUID and looking at the command output for each instance. SSM is asynchronous, so you need to wait for the command to complete on individual instances to see the output.

AWS keeps the full history of Run Command invocations; previous invocations can be explored in the EC2 console under ‘Systems Manager Services/Run Command’, with commands listed by date and GUID.

[Screenshot: Run Command history]

We can achieve the same results from the AWS CLI by doing the following:

$ aws ssm send-command --instance-ids i-0b387c665628f5f9b i-02e2a213adfa03bab i-0e6edec45f97ede23 --document-name "AWS-RunShellScript" --comment "IP config" --parameters commands=ifconfig --output text

The above command explicitly names the EC2 instances on which the command needs to be executed. Alternatively, you can use this form:

$ aws ssm send-command --targets "Key=tag:aws:cloudformation:stack-name,Values=GENIStack" --document-name "AWS-RunShellScript" --comment "IP config" --parameters commands=ifconfig --output text

Notice that in this case we match instances by the name of the CloudFormation stack we gave above.

The output of the command can be examined using

$ aws ssm list-command-invocations --command-id <guid of the command invocation returned by the previous command> --details
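
If you prefer to script this step, the command GUID can be captured directly from send-command with a JMESPath --query; a sketch reusing the stack tag from above:

$ CMD_ID=$(aws ssm send-command \
    --targets "Key=tag:aws:cloudformation:stack-name,Values=GENIStack" \
    --document-name "AWS-RunShellScript" \
    --parameters commands=ifconfig \
    --query "Command.CommandId" --output text)
$ aws ssm list-command-invocations --command-id "$CMD_ID" --details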

There are other commands in the aws ssm toolset; feel free to explore them (run ‘aws ssm help‘).

Starting ExoGENI slice with instances connected to AWS SSM

In this section we will add ExoGENI instances to the list of instances managed via SSM. Before you start the slice, you must create a special kind of credential that allows your instances to talk to AWS SSM.

We begin by creating a new role, which we will call SSMServiceRole, to provide SSM credentials to hybrid (non-EC2) instances. First we must create a JSON trust file that allows principals to assume that role (cut and paste the contents and call it SSMService-Trust.json):

{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Principal": {"Service": "ssm.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }
}

Use the file to create the role:

$ aws iam create-role --role-name SSMServiceRole --assume-role-policy-document file://SSMService-Trust.json

Associate the standard (managed) AWS policy AmazonEC2RoleforSSM, which allows SSM operations, with this role:

$ aws iam attach-role-policy --role-name SSMServiceRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM

If you were paying attention, you might ask “Didn’t we create a similar role with the template for EC2 instances?”, and you’d be right. However, at present it doesn’t appear possible to use the role created as part of the CloudFormation stack outside of that stack.

You can inspect the existing roles in your account using the AWS IAM web console (click on ‘Roles’) or by executing a CLI command:

$ aws iam list-roles

In this case the command should show two roles: one created by CloudFormation, and the one we created just now.

Now we must create temporary tokens for the SSM agent on your ExoGENI instances to access the SSM service. The keyword is ‘temporary’: they have an expiration date past which the instances will not be able to communicate with SSM. Because of that, it is critical that there is only minimal time skew between your ExoGENI instances and AWS (more on that below). Each token is associated with some number of ‘registrations’ (nodes in your slice that can use SSM; the default is 1) and an expiration date (default 24 hours). We create a new activation for our slice:

$ aws ssm create-activation --default-instance-name MyXoServers --iam-role SSMServiceRole --registration-limit 10 --expiration-date 2017-01-10T20:30:00.000Z

Examining parameters above:

  • The default instance name is the string by which your instances will be known in SSM (each will also be issued a unique instance identifier)
  • We must include the role we defined above in the activation
  • The registration limit defines the maximum number of nodes you plan to have in your ExoGENI slice
  • The expiration date is given here in UTC

The command returns two strings: the first is a code (20 characters), the second a registration ID. Both are needed by the SSM agent in your ExoGENI instances to authenticate to SSM. Now we’re ready to start the ExoGENI slice.
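
In the CLI output these appear as ActivationCode and ActivationId; if you are scripting the setup, both can be captured in one call, for example (a sketch):

$ read AWSCODE AWSREGID < <(aws ssm create-activation \
    --default-instance-name MyXoServers --iam-role SSMServiceRole \
    --registration-limit 10 --expiration-date 2017-01-10T20:30:00.000Z \
    --query "[ActivationCode,ActivationId]" --output text)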

When starting an ExoGENI slice, you can use any topology on any rack or controller, so long as you include the following post-boot script for each instance you want managed via AWS SSM. Notice that this script is for CentOS 6.X images; you may need to adapt it to Debian derivatives and systemctl-based RedHat-like distributions.

#!/bin/bash

SSMDIR=/tmp/ssm
AWSREGID="<provide the activation registration guid>"
AWSCODE="<provide the activation code>"
AWSREGION=us-east-1
NTPSERVER=clock1.unc.edu

# NO NEED TO EDIT BELOW FOR CentOS 6.x

# Sync the clock first - SSM authentication fails if there is significant clock skew
ntpdate ${NTPSERVER} > /dev/null
/etc/init.d/ntpd restart
# Download and install the SSM agent
mkdir ${SSMDIR}
curl https://amazon-ssm-${AWSREGION}.s3.amazonaws.com/latest/linux_amd64/amazon-ssm-agent.rpm -o ${SSMDIR}/amazon-ssm-agent.rpm
yum install -y ${SSMDIR}/amazon-ssm-agent.rpm
# Register the agent with the activation created above, then (re)start it
stop amazon-ssm-agent
amazon-ssm-agent -register -id ${AWSREGID} -code ${AWSCODE} -region ${AWSREGION}
start amazon-ssm-agent

This script downloads the SSM agent at boot, provides it with the credentials for the AWS SSM service acquired in the previous step, and restarts it. Notice the invocation of ntpdate: it is critical for the operation of SSM that the clocks in the instances are reasonably accurate. If you have significant clock skew, the agent on the instance will fail to connect to SSM.

Define a slice topology in Flukes and be sure to cut and paste a modified version of the post-boot script above into each node you intend to manage via AWS. Notice that in addition to the code and registration ID, you may need to modify the AWS region, depending on the settings in your AWS account.

You can watch your slice come up in Flukes, but also use the EC2 Systems Manager Shared Resources/Managed Instances console to see managed instances in your slice come up and go green. Note that node names given in Flukes show up in the console as ‘Computer Name’; also note that each ExoGENI instance receives a unique instance ID starting with ‘mi’. Finally, note that the IP address reported for all instances (EC2 and ExoGENI) is the private address assigned to the management interface eth0.

[Screenshot: ExoGENI instances in Managed Instances]

Using Tools to Manage the Hybrid Infrastructure

Now that we have a ‘slice’ of EC2 and an ExoGENI slice that respond to AWS management tools, we can demonstrate some of the capabilities.

First off, just as in the example above, we can issue arbitrary commands to multiple instances in a scalable fashion, but now we can use the AWS-issued instance IDs to name our ExoGENI instances as well. First we list all managed instances using the AWS CLI:

$ aws ssm describe-instance-information
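
If you only need the instance IDs (for example, to feed into send-command below), a JMESPath query trims the output; a sketch:

$ aws ssm describe-instance-information --query "InstanceInformationList[].InstanceId" --output text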

Using the instance IDs reported above, we can craft a command to send to all instances:

$ aws ssm send-command --instance-ids <space separated list of instance ids from EC2 and your slice> --document-name "AWS-RunShellScript" --comment "IP config" --parameters commands=ifconfig --output text

You can check the status of the invocations (whether they completed successfully):

$ aws ssm list-command-invocations --command-id <guid of command id returned by previous command>

If you want to see the output, add --details to the previous command. You can also run

$ aws ssm get-command-invocation --command-id <guid of command id> --instance-id <id of the instance>

to inspect status of individual invocations on nodes.

We can also take an inventory (software, network configuration) of the nodes and have it refresh periodically. The inventory is visible in the web console and can be saved to an S3 bucket (costs will apply). This can be done from the console by clicking on the ‘Setup Inventory’ button in the managed instances list; we will demonstrate doing it via the CLI here. Unlike the per-command invocations shown above, inventory requires creating an association between an SSM inventory document and instances, with a cron schedule so it periodically refreshes its content:

$ aws ssm create-association --name AWS-GatherSoftwareInventory --targets  "Key=instanceids,Values=<comma separated list of instance ids>" --schedule-expression "cron(0 0/30 * 1/1 * ? *)" --parameters networkConfig=Enabled,windowsUpdates=Disabled,applications=Enabled

This step takes a while (10 minutes or more) to complete; you can see the state of the association in the EC2 console under Managed Instances (by clicking on the instance). Once it completes, the inventory becomes available to view.

You can also see the state of existing associations by executing

$ aws ssm list-associations

Note that each association has a unique GUID, which can be used to query for the state of the association:

$ aws ssm describe-association --association-id <association guid>

After the association completes successfully, we can query for the inventory of the nodes:

$ aws ssm list-inventory-entries --instance-id <one of instance ids above> --type-name <inventory type>

The inventory type name is one of the following strings (an example query follows the list):

  • AWS:Application – lists installed packages
  • AWS:Network – lists interface configuration
  • AWS:AWSComponent – lists installed AWS components on the instance (typically SSM agent)
  • Other types are Windows-specific.
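
For example, to list the installed packages reported by one of the managed instances (the instance ID shown is hypothetical):

$ aws ssm list-inventory-entries --instance-id mi-0123456789abcdef0 --type-name "AWS:Application"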

Conclusion

This tutorial demonstrated how to use AWS remote management tools to jointly manage EC2 and ExoGENI instances. Some of this functionality, particularly remote execution, can be achieved in other ways; however, the AWS approach offers several advantages:

  • Its asynchronous, event-driven nature makes it significantly more scalable than the typically serial execution of commands via remote shell (although tools like psh can speed that up)
  • Historical information about commands is saved in AWS for review, providing an experiment progress log and aiding repeatability
  • Comprehensive software inventory (with optional history) is kept per instance
  • A programmatic API is available for scripting via Boto

This concludes the tutorial; the two following sections have suggestions on troubleshooting and next steps.

Troubleshooting

  • SSM agent behavior on the instances is logged under /var/log/amazon/ssm
  • If you run out of activations, or your activation for the SSM agent in ExoGENI nodes expires, you can create a new activation and configure the SSM agent on each node with the new credentials, following the flow of the ExoGENI post-boot script above.
  • If you get stuck being unable to specify a particular CLI parameter, check this page.

Things to explore further

  • Programmatic API implementations, like Boto
  • Implementing new SSM command documents specific to your experiment

Using ExoGENI Slice Stitching capabilities

This blog entry shows how to stitch together slices belonging to potentially different users. The video demonstrates the workflow, and the short discussion below outlines the limitations of the current implementation.

Several API operations have been introduced to support slice-to-slice stitching:

  1. permitSliceStitch – informs the controller that the owner of this reservation (node or VLAN, see below) allows stitching of other slices to this reservation using a specific password
  2. revokeSliceStitch – the inverse of permit; removes permission to stitch to any other slice. Existing stitches are not affected.
  3. performSliceStitch – stitches a reservation in one slice to a reservation in another slice using a password set by permitSliceStitch
  4. undoSliceStitch – undoes an existing stitch between two reservations
  5. viewStitchProperties – inspects whether a given reservation allows stitching and whether there are active or inactive stitches from other slices
    1. provides information such as the slice stitched to, the reservation stitched to, the DN identifier of the owner of the other slice, when the stitch was performed and, if unstitched, when it was undone.

Caveats and useful facts:

  • Slices that you are attempting to stitch must be created by the same controller
  • The reservations that you are trying to stitch together must be on the same aggregate/rack
  • You cannot stitch pieces of the same slice to itself
  • Only stitching of compute nodes (VMs and baremetal) to VLANs/links is allowed. This will not work for shared vlans or links connecting to storage (those aren’t real reservations in the ORCA sense)
  • Stitching is asymmetric (one side issues permit, the other side performs the stitch)
  • Unstitching is symmetric (either side can unstitch from the other, password is not required)
  • Each stitching operation has a unique guid and this is how it is known in the system
  • An inactive stitch is distinguished from an active stitch by the presence of an ‘undone’ property indicating the date/time, in RFC 3339 format, when the unstitching operation was performed
  • Passwords (bearer tokens) used to authorize stitching operation are stored according to best security practices (salted and transformed). They are meant to be communicated between slice owners out-of-band (phone/email/SMS/IM/pigeons).
  • Stitching to point-to-point links is allowed; however, keep in mind that if you used automatic IP assignment, the two nodes that are the endpoints of the link will have a /30 netmask, which means that if you stitch more nodes into that link, they won’t be able to communicate with the existing nodes until you set a common, broader netmask on all nodes (see the sketch below)
    • Note that ORCA enforces IP address assignment via the guest-side neucad tool running in the VM. If you are reassigning IP addresses manually on a node, remember to kill that daemon first; otherwise it will overwrite your changes.
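
For example, on a CentOS 6 node stitched into such a link, you might stop the daemon and widen the netmask manually; this is a sketch only, and the process name neucad, the interface eth1 and the address are illustrative assumptions:

# stop the guest-side IP configuration daemon so it does not revert the change
pkill neucad
# assign an address with a broader netmask on the dataplane interface
ifconfig eth1 172.16.0.10 netmask 255.255.255.0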

Known Limitations:

  • No NDL manifest support. Stitching is not visible in the manifest and can only be deduced by querying a node or a link for its stitching properties
    • Consequently no RSpec manifest support
  • No GENI API support

Securing Your Slice – Part 1

This post is intended to outline the fundamental security precautions for the virtual and baremetal servers that are launched on the ExoGENI testbed.

ExoGENI provides experimenters the ability to use their own images when creating slices and launching virtual machines. The testbed infrastructure is well isolated from the slivers, giving experimenters a lot of flexibility on virtual and baremetal servers. At the same time, since individual slivers are created with interfaces on hosting campus IP networks, each virtual or baremetal server should be carefully administered during or after slice creation to ensure proper security measures are taken.

Security is a highly complicated area of concern in server administration. Here, however, we want to outline the fundamental measures for protecting virtual or baremetal servers that have public internet access. These measures should be added on top of the default OS installation to restrict and control access to the servers and to minimize the risk of becoming a vulnerable spot within the campus network.

In this post, we describe the minimally necessary security measures for a Linux server regardless of the flavor (CentOS, Ubuntu, Debian, etc.), and provide example scripts/commands which can be used during slice creation. (These scripts/commands are based on the CentOS 6 distribution; equivalent commands need to be gathered for Ubuntu, Debian, etc.)

The “Post Boot Scripts” feature of ORCA can be used to inject security configuration into the nodes or node groups. The details about Post Boot Scripts and templates on this page are a valuable resource for configuring virtual or baremetal servers, as well as for the rich scripting and templating capabilities of ExoGENI.

 

1. Servers should include up-to-date packages

After the VM is created, packages can be updated using the portion of a postboot script below:

yum -y update

For kernel updates, rebooting the server is needed. However, VM nodes on the ExoGENI testbed are not safely rebootable: because of complexities introduced by the virtualization infrastructure and udev schemes, connectivity with the virtual machines cannot be ensured after a reboot. We do not suggest rebooting servers on the ExoGENI testbed until a fix for network device configuration is implemented; this will be explained in following posts. Rebooting may still be needed for system library updates such as glibc. If there is a security concern and an update is needed for such packages, then updating the image, saving it and using that image for slice creation may be necessary.

It is best practice to upload updated image files (kernel, ramdisk, filesystem) to the web server and boot the virtual machines from the up-to-date images.

 

2. User Authentication

SSH public keys are injected into the virtual or baremetal servers during slice creation. In addition, password authentication for remote root login over SSH should be disallowed by editing the line shown below in the sshd_config file:

PermitRootLogin without-password

The sshd_config file can be updated using the portion of a postboot script below:

sed -i 's/\#PermitRootLogin yes/PermitRootLogin without-password/g' /etc/ssh/sshd_config
service sshd reload
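
The effective setting can be double-checked with sshd's extended test mode (a quick sanity check; requires root):

sshd -T | grep -i permitrootlogin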

 

3. Firewall configuration

Inbound and outbound traffic can be controlled by a firewall such as iptables.

Iptables uses built-in tables (mangle, filter, nat) to process packets. Each table has a group of chains that represent the actions to be performed on a packet. The built-in chains in the filter table are INPUT, OUTPUT and FORWARD. Rules are added to the chains, and a packet is checked against each rule in turn (from top to bottom). An action is taken to ACCEPT or DROP a packet that matches a rule. No further processing is done after a matching rule is found and the packet is processed (the order of the rules is significant). If a packet passes down through all of the rules in a chain and reaches the bottom without being matched, the default policy for that chain is applied to it.

Hardening a server with iptables should be taken seriously. IP addresses or networks (both source and destination) as well as ports should be specified to allow traffic only for the required connections and reject all other traffic. The default firewall policy and some kernel parameters need to be adjusted, too. Although many useful resources about hardening servers and firewall configuration can be found on the internet, we will prepare a dedicated post elaborating on a baseline configuration that can be used for most slices. This page is a good starting point to learn about iptables.

Below, we provide a basic set of rules that allows all outgoing connections and blocks all unwanted incoming connections:

# Flush all existing rules
iptables -F

# Set default policy on each chain
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Accept incoming packets destined for localhost interface
iptables -A INPUT -i lo -j ACCEPT

# Accept incoming packets that are part of/related to an already established connection
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Accept incoming packets for SSH connections
iptables -A INPUT -p tcp --dport 22 -j ACCEPT

# Accept incoming packets that belong to icmp protocol
iptables -A INPUT -p icmp -j ACCEPT

If no firewall rules are present, a basic set of rules can be created and activated using the portion of a postboot script below:

iptables -F
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT
service iptables save

Firewall rules can be checked as below:

-bash-4.1# iptables -nvL
Chain INPUT (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
   38  4071 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED 
    0     0 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0           tcp dpt:22 
    0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 24 packets, 2254 bytes)
 pkts bytes target     prot opt in     out     source               destination     

 

4. Network services access control

Controlling access to network services is an important task for balancing flexibility and security within a research-oriented test environment. TCP wrappers is a mechanism used to allow or deny hosts access to network services. Access files (/etc/hosts.allow and /etc/hosts.deny) are used to determine whether a client is allowed. The details on this page are a valuable resource for configuration.

One common use case is to restrict access to the portmap service when NFS is being used within the slice. Since NFS relies on the portmap service, which is a dynamic port assignment daemon for RPC services, information about the running services is revealed and can be obtained with an “rpcinfo” request to the server.

It is critical to restrict access to the portmap service through the public interface and allow access only from the data-plane network. Also, data-plane IP addresses, not public IP addresses, should be used in /etc/hosts, /etc/exports, /etc/fstab and other relevant files for the NFS configuration.
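
For example, an export on the NFS server restricted to the data-plane network used in this post might look like the line below (the exported path /shared and the options are illustrative assumptions):

# /etc/exports: export only to the data-plane network
/shared 172.16.0.0/16(rw,sync,no_root_squash)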

The steps below should be taken to configure an NFS server:

– On the NFS server host, add the line below to /etc/hosts.deny to allow “rpcinfo” queries only from the data-plane network:

rpcbind: ALL EXCEPT <DATAPLANE NETWORK>

Example:

rpcbind: ALL EXCEPT 172.16.0.0/255.255.0.0

– On NFS clients, there is no need to run the rpcbind service, and it can be disabled.
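
On a CentOS 6 client this can be done, for example, with:

# stop rpcbind and keep it from starting at boot
service rpcbind stop
chkconfig rpcbind off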

The network service access rule can be added using the portion of a postboot script below (note that the data-plane network should be replaced with the network within your slice):

cat << EOF >> /etc/hosts.deny
rpcbind: ALL EXCEPT 172.16.0.0/255.255.0.0
EOF

– Check the RPC information that the NFS server reveals, from a client:

# No RPC information is returned for the query from the public interface of the NFS server (192.1.242.62)

bash-4.1# rpcinfo -p 192.1.242.62
rpcinfo: can't contact portmapper: RPC: Authentication error; why = Client credential too weak

# RPC information is returned for the query from the data-plane interface of the NFS server (172.16.0.5)

-bash-4.1# rpcinfo -p 172.16.0.5
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100005    1   udp  54381  mountd
    100003    3   tcp   2049  nfs
    100227    3   udp   2049  nfs_acl
    100021    1   udp  45462  nlockmgr
    (Sample output shown. Some output omitted)

We trust that all ExoGENI testbed users are well aware of the importance of security. A major concern is that security-related problems trigger restrictions and degradations on the infrastructure. The proposed measures should be applied to every slice as a primary task. Taking these precautions to secure your slices will provide a more secure environment both for your data and work and for the rest of the world.

 

Appendix:

#!/bin/bash

# Disallow password-based remote root login over SSH
sed -i 's/\#PermitRootLogin yes/PermitRootLogin without-password/g' /etc/ssh/sshd_config
service sshd reload

# Basic firewall: default-deny inbound; allow loopback, established/related traffic, SSH and ICMP
iptables -F
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p icmp -j ACCEPT
service iptables save

# Restrict rpcbind to the data-plane network (replace with the network within your slice)
cat << EOF >> /etc/hosts.deny
rpcbind: ALL EXCEPT 172.16.0.0/255.255.0.0
EOF

# Bring installed packages up to date
yum -y update