Cluster Management in VisLab

From ISRWiki
Revision as of 14:12, 12 May 2009

Cluster Manager ([http://eris.liralab.it/wiki/Cluster_management description] on the LIRA-Lab wiki) is a Python-based GUI that lets the user check and influence the status of "yarp run" on a cluster of computers.

== Usage and configuration of Cluster Manager ==

To run Cluster Manager, just go to $ICUB_DIR/app/default/scripts and run

 ./icub-cluster.py 

By default, it reads configuration information from $ICUB_DIR/app/default/icub-cluster.xml (unconfirmed). We changed it to read configuration information from $ICUB_DIR/app/default/vislab-cluster.xml instead (TODO: make it so that we don't need to edit icub-cluster.py, possibly by passing the XML filename as a command-line parameter).

We had to write these two lines to ~/.bash_env on the Cortex machines (user icub) and on Chico2 in order for remote (i.e., non-interactive) execution with ssh to work:

 export ICUB_DIR=/home/icub/iCub
 export ICUB_ROOT=$ICUB_DIR
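The ~/.bash_env trick presumably works because a non-interactive bash (the kind that "ssh host command" starts) does not read ~/.bashrc; it only sources the file named by $BASH_ENV, and the shell setup on these machines apparently points $BASH_ENV at ~/.bash_env. A minimal local simulation of that mechanism (the /tmp path is just a stand-in for ~/.bash_env; the exported values mirror the lines above):

 # Simulate a non-interactive shell picking up the exports via $BASH_ENV.
 # /tmp/bash_env_demo stands in for ~/.bash_env on the cluster machines.
 cat > /tmp/bash_env_demo <<'EOF'
 export ICUB_DIR=/home/icub/iCub
 export ICUB_ROOT=$ICUB_DIR
 EOF
 # "bash -c" is non-interactive, like the shell sshd spawns for a remote
 # command: it reads only the file named by BASH_ENV, not ~/.bashrc.
 BASH_ENV=/tmp/bash_env_demo bash -c 'echo "$ICUB_ROOT"'

If the variables don't reach the remote side, this is the mechanism to check first.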

We had to create a /tmp/run directory on each of the Cortex machines (since /tmp is local storage, every machine in the cluster needs its own), owned by user icub (it should be writable by anyone, but it probably isn't at the moment), so that "yarp run" could run there. We have no clue why this is needed: on pc104, no such directory seems to exist. WARNING: the contents of this directory may disappear when the cortex computers are restarted, possibly causing problems.
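Since /tmp may be wiped on reboot, recreating the directory is worth scripting; a sketch to run as user icub on each Cortex machine (mode 1777, i.e. world-writable with the sticky bit, matches /tmp itself and the "writable by anyone" requirement above):

 # Create the per-machine work directory that "yarp run" expects.
 # /tmp is local to each machine, so repeat on every Cortex node,
 # and again after any reboot that wipes /tmp.
 mkdir -p /tmp/run
 chmod 1777 /tmp/run   # writable by anyone, sticky bit as in /tmp
 ls -ld /tmp/run

Putting these lines in a boot script (or the yarprun wrapper itself) would remove the reboot problem.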

== yarprun.sh script ==

The script $ICUB_DIR/scripts/yarprun.sh assumes that every machine has a unique name, obtainable with the command: "uname -n".

On the Cortex machines this is not the case: "uname -n" returns "source" on all 5 of them. So we copied the script to yarprunVislab.sh and changed one line from:

 ID=/`uname -n`

to:

 ID=/`uname -n`
 # On the Cortex machines uname -n returns "source" on every node, so
 # disambiguate with the last octet of eth0's IPv4 address instead.
 if [ "$ID" == "/source" ]; then
     ID=/cortex`ifconfig eth0 | grep "inet addr" | awk '{print $2}' | awk -F: '{print $2}' | awk -F. '{print $4}'`
 fi
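The ifconfig pipeline simply extracts the last octet of eth0's IPv4 address. Here is the same pipeline run on a canned line in the old net-tools output format (the 10.10.1.53 address is made up for illustration):

 # A sample "inet addr" line as old net-tools ifconfig prints it:
 line='          inet addr:10.10.1.53  Bcast:10.10.1.255  Mask:255.255.255.0'
 # grep keeps the IPv4 line; the first awk isolates "addr:10.10.1.53",
 # the second splits on ":" to get the address, the third takes octet 4.
 octet=$(echo "$line" | grep "inet addr" | awk '{print $2}' \
         | awk -F: '{print $2}' | awk -F. '{print $4}')
 echo "/cortex$octet"   # -> /cortex53

Note this depends on the old net-tools output format; a newer ifconfig (or "ip addr") prints "inet 10.10.1.53" instead, which would break the grep and awk stages.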

Of course, we had to make a copy of $ICUB_DIR/scripts/icub-cluster.py to $ICUB_DIR/scripts/icub-clusterVislab.py and change all invocations of yarprun.sh to yarprunVislab.sh in the latter.
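That substitution can be done mechanically instead of by hand; a sketch using a stand-in file (the real icub-cluster.py invokes yarprun.sh in several places, which is what the /g flag handles):

 # Stand-in for icub-cluster.py: two lines that invoke yarprun.sh.
 printf 'cmd = "yarprun.sh start"\nstop = "yarprun.sh stop"\n' > /tmp/icub-cluster-demo.py
 # Rewrite every invocation, producing the VisLab variant.
 sed 's/yarprun\.sh/yarprunVislab.sh/g' /tmp/icub-cluster-demo.py > /tmp/icub-clusterVislab-demo.py
 cat /tmp/icub-clusterVislab-demo.py

Doing it with sed also makes it easy to regenerate the VisLab variant whenever the upstream icub-cluster.py changes.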

We also had to copy yarprunVislab.sh to all of the machines (Chico2, pc104). To copy it to pc104, we actually had to copy it to icubsrv, in the correct location (see pc104):

 scp yarprunVislab.sh 10.10.1.51:/exports/code-pc104/iCub/scripts/

It would be easier if the Cortex machines returned "cortex1", etc., instead of "source", when "uname -n" is run.

STATUS: it is working: from Chico2 we can control "yarp run" on Chico2, pc104, and Cortex1..5. We didn't bother setting it up on Cortex6 or icubsrv, as those computers are very rarely used.