Cluster Management in VisLab/Archive

From ISRWiki

Some obsolete information is kept here, for the sake of history.

Cortex and the uname command

Prior to 2009-05-12, the command

  uname -n

(which normally prints the unique name of the machine you are logged in to) did not work as we wanted on Cortex. The reason is that all the machines in the cluster share the same network disk, so the file /etc/hostname is shared among them as well. It contains the string "source", which is therefore the genuine result of "/bin/uname -n" on every machine, but it is of little use to our "yarp run" script, which sometimes needs to answer the question "where am I?".

Forcing uname to give the desired output

Since we want each machine of the cluster to provide its unique name (i.e., cortex[1..5]) as the output of "uname -n", we start from the following command, which yields the numeric part of that name:

  ifconfig eth0 | grep "inet addr" | awk '{print $2}' | awk -F: '{print $2}' | awk -F. '{print $4}'

This command extracts the last byte of the current machine's IP address. For example, if you type it on cortex3 you will get "3" (the last byte of 10.10.1.3) as output.
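To see what each stage of the pipeline does without needing a machine on the cluster, the same filters can be fed a captured snippet. This is only a sketch: the sample text imitates the old net-tools ifconfig layout that the pipeline expects, its hardware address and broadcast/mask values are made up, and last_octet is a hypothetical helper name.

```shell
# Illustrative ifconfig-style text; only the address 10.10.1.3 (cortex3)
# comes from the example above, the other values are invented.
sample='eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          inet addr:10.10.1.3  Bcast:10.10.1.255  Mask:255.255.255.0'

# Same filters as the command above, reading from stdin instead of ifconfig:
#   grep "inet addr"      keeps the line holding the IPv4 address
#   awk '{print $2}'      keeps the second word:   addr:10.10.1.3
#   awk -F: '{print $2}'  keeps what follows ":":  10.10.1.3
#   awk -F. '{print $4}'  keeps the fourth field:  3
last_octet() {
    grep "inet addr" | awk '{print $2}' | awk -F: '{print $2}' | awk -F. '{print $4}'
}

printf '%s\n' "$sample" | last_octet
```

With the sample above the last line prints 3.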

The idea is to make that custom command the system-wide behaviour in the cluster by defining an alias for "uname -n". More precisely, the alias is just called "uname", and any additional parameter is ignored.

We added the following lines in user icub's ~/.bashrc:

  shopt -s expand_aliases
  alias uname="echo -n cortex; ifconfig eth0 | grep 'inet addr' | awk '{print \$2}' | awk -F: '{print \$2}' | awk -F. '{print \$4}'; echo > /dev/null"

The shopt line enables alias expansion in non-interactive shells. The uname alias consists of three parts:

  • "echo -n cortex;" prints the word cortex without a trailing newline;
  • "ifconfig eth0 | grep 'inet addr' | awk '{print \$2}' | awk -F: '{print \$2}' | awk -F. '{print \$4}'" extracts the last byte of the current machine's IP address;
  • "echo > /dev/null" swallows whatever is typed after "uname", such as "-n": those words become arguments of this final echo, whose output is discarded.
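The way the trailing "echo > /dev/null" absorbs extra arguments can be tried in isolation. The sketch below makes several assumptions: the alias is named whereami (hypothetical) so that the real uname is left alone, a fixed "echo 3" stands in for the ifconfig pipeline, and the lines are run in a child bash that reads them one at a time, because an alias only takes effect on lines parsed after its definition.

```shell
# Run three lines in a child bash and capture what they print.
demo_output=$(bash <<'EOF'
shopt -s expand_aliases
alias whereami="echo -n cortex; echo 3; echo > /dev/null"
whereami -n
EOF
)

# "whereami -n" expands to: echo -n cortex; echo 3; echo > /dev/null -n
# so the "-n" ends up as an argument of the last echo and is discarded.
echo "$demo_output"
```

The last line prints cortex3, regardless of what follows whereami on the command line.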

Old workaround

Before 2009-05-12 we customized $ICUB_DIR/scripts/yarprun.sh and saved our own version as yarprunVislab.sh. Since "uname -n" printed "source" on all five Cortex computers, we changed the line

 ID=/`uname -n`

to:

 ID=/`uname -n`
 if [ $ID == "/source" ];
  then
    ID=/cortex` ifconfig eth0 | grep "inet addr" | awk '{print $2}' | awk -F: '{print $2}' | awk -F. '{print $4}'`;
 fi;
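The fallback logic of the modified script can be sketched as a small function. This is an illustration only: resolve_id is a hypothetical helper, and the hostname and the last byte of the IP address are passed in as arguments (rather than read from uname and ifconfig) so that it can be tried on any machine.

```shell
# Build the ID from a hostname, falling back to cortex<N> when the hostname
# is the shared value "source". Both arguments are supplied by the caller.
resolve_id() {
    name="$1"     # what "uname -n" would print
    octet="$2"    # last byte of the machine's IP address
    if [ "$name" = "source" ]; then
        printf '/cortex%s\n' "$octet"
    else
        printf '/%s\n' "$name"
    fi
}

resolve_id source 3     # a Cortex node with address 10.10.1.3
resolve_id chico2 0     # a machine with its own hostname; the octet is unused
```

The two calls print /cortex3 and /chico2 respectively.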

Of course, we also had to copy $ICUB_DIR/scripts/icub-cluster.py to $ICUB_DIR/scripts/icub-clusterVislab.py and change every invocation of yarprun.sh to yarprunVislab.sh in the copy.

Besides, we had to copy yarprunVislab.sh to all the other machines (Chico2, pc104). To get it onto pc104, we actually had to copy it to icubsrv, in the correct location (see pc104):

 scp yarprunVislab.sh 10.10.1.51:/exports/code-pc104/iCub/scripts/

This method, while cumbersome, worked: from Chico2 we could control "yarp run" on Chico2, pc104 and Cortex1..5. We didn't bother to set it up on Cortex6 or icubsrv, as those computers are very rarely used.