User's guide pleiades (whep users)
First Login and annual password renewal
Before logging in to any machine, you need to change the initial password:
With a web browser (e.g. Firefox, from inside the university) open the web site
Log in with your username and password. Then click "Actions" → "Reset Password" and fill in the pop-up form to change your password.
There are two login machines running CentOS 7 from which the cluster can be operated. These are:
And there are still three login machines left running CentOS 6:
These nodes can be used to develop and test code; once this is finished, jobs can be submitted to the pleiades cluster. You can log in to them using your username, which will be provided by us. The whole cluster runs CentOS 7.
SSH access is open from any IP address. However, a protection system blocks IP addresses from which several unsuccessful login attempts have been made. So if you mistype your credentials too often, you will be locked out for a while.
If you regularly use ssh from the same machine, it is good practice to set up ssh keys on your local machine and use
ssh-copy-id USERNAME@higgs.pleiades.uni-wuppertal.de to enable a key-based login on the frontend.
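The key setup described above can be sketched as a small helper for your local machine. This is only an illustration: the function name setup_key_login, the KEY_FILE and DRY_RUN variables, and the default key location are assumptions and not part of the cluster setup.

```shell
# Run on your LOCAL machine, not on the cluster.
# Usage: setup_key_login USERNAME [HOST]
# With DRY_RUN=1 the commands are only printed, not executed.
setup_key_login() {
    user="$1"
    host="${2:-higgs.pleiades.uni-wuppertal.de}"
    key="${KEY_FILE:-$HOME/.ssh/id_rsa}"    # assumed key location

    # Print the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; else run=""; fi

    # Create a key pair only if none exists yet (asks for a passphrase).
    [ -f "$key" ] || $run ssh-keygen -t rsa -b 4096 -f "$key" || return 1

    # Install the public key on the frontend (asks for your password once).
    $run ssh-copy-id -i "$key.pub" "$user@$host"
}
```

Calling it with DRY_RUN=1 first shows what would be executed without contacting the frontend.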
If you leave your shell open for more than 24 hours, your Kerberos token will expire and you will lose write access to your files. To renew your token without logging out, type "kinit -f" and enter your password at the prompt.
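For long-running sessions, the renewal can be wrapped in a small check. The sketch below assumes the MIT Kerberos tools, where "klist -s" prints nothing and returns a non-zero status if the ticket cache is missing or expired; the function name is made up for this example.

```shell
# Renew the Kerberos ticket only when the current one is missing or expired.
renew_ticket_if_needed() {
    if klist -s 2>/dev/null; then
        echo "Kerberos ticket still valid"
    else
        kinit -f    # prompts for your password
    fi
}
```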
A Lustre cluster file system is installed on the login machines and on all cluster worker nodes, i.e. the file system is shared among all nodes and can be used to develop code and to save output files from cluster jobs. In general, no „copy constructions“ are needed. A group quota has been applied according to the share of each participating group. If needed, additional user quotas can also be applied. You will find your home directory at:
If you need truly local space on the worker nodes, use „/tmp“, but please clean up inside your job scripts; otherwise you will overload the nodes.
There is no backup for the /lustre file system. However, this file system runs on RAID systems.
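One way to make sure the local space is cleaned up is to create a private directory under /tmp and remove it through an exit trap. This is a minimal sketch; the names SCRATCH, RESULT_DIR, work.tmp and result.txt are placeholders chosen for the example.

```shell
# Create a private scratch directory under /tmp and remove it when the
# script exits, whether it finishes normally or stops with an error.
SCRATCH=$(mktemp -d /tmp/"${USER:-job}".XXXXXX)
trap 'rm -rf "$SCRATCH"' EXIT

# ... run the job here, writing temporary files to "$SCRATCH" ...
echo "intermediate data" > "$SCRATCH/work.tmp"

# Copy anything worth keeping back to the shared file system before exit.
# RESULT_DIR is a hypothetical location; adjust it to your own layout.
RESULT_DIR="${RESULT_DIR:-$PWD}"
cp "$SCRATCH/work.tmp" "$RESULT_DIR/result.txt"
```

Note that the trap cannot run if the job is killed outright, so very large temporary files should still be removed as early as possible.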
The batch system needs passwordless ssh access from the worker nodes to the head nodes and vice versa. Please generate a passwordless ssh key pair after your first login (this has to be done only once!) with the following commands:
(in your /lustre/USERNAME home directory)
ssh-keygen -t rsa -N ''
Press ENTER at the following questions to accept the defaults. Then, inside the .ssh directory that now holds the key pair:
cat id_rsa.pub >> authorized_keys
There are two different NIS servers for the home directories and the cluster. Hence, you need to have the same files in
/common/home/USERNAME/.ssh and /lustre/USERNAME/.ssh
This can easily be achieved by copying the whole directory.
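The copy could look like the following sketch. The function name sync_ssh_dir is made up for this example, and it assumes the key pair and authorized_keys were created in the /lustre home as described above.

```shell
# Copy the .ssh directory from one home to the other and tighten permissions.
# Usage: sync_ssh_dir SOURCE_HOME TARGET_HOME
sync_ssh_dir() {
    src="$1/.ssh"
    dst="$2/.ssh"
    mkdir -p "$dst"
    cp -Rp "$src"/. "$dst"/          # copy all files, keeping timestamps and modes
    chmod 700 "$dst"                 # ssh refuses keys in group/world-accessible dirs
    chmod 600 "$dst"/id_rsa "$dst"/authorized_keys
}

# Example call (USERNAME is a placeholder):
# sync_ssh_dir /lustre/USERNAME /common/home/USERNAME
```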
Batch system usage
The cluster runs the Grid Engine batch system. A fair share has been assigned according to the share of your group.
Because only the home directory in
/lustre/USERNAME is shared between frontend and worker nodes, make sure the program you wish to call is located there; otherwise the worker nodes might not be able to find it.
To submit jobs to the cluster, you need to prepare shell scripts. A script could look like this, for example:
./program -option xyz
In this shell script you should:
- cd into the program directory,
- initialize your environment (load required libraries, modify $PATH, etc),
- execute the program
- clean up (delete temp files etc).
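The four steps above can be sketched as follows. The commands write a job script called myjob.sh and make it executable; the directory /lustre/USERNAME/myproject, the bin directory and the "*.tmp" pattern are hypothetical and must be adapted to your own program.

```shell
# Write the job script to myjob.sh (quoted 'EOF' keeps $-variables unexpanded).
cat > myjob.sh <<'EOF'
#!/bin/bash
# 1. cd into the program directory (hypothetical path on the shared file system)
cd /lustre/USERNAME/myproject || exit 1

# 2. initialize the environment (example: add a private bin directory to PATH)
export PATH="/lustre/USERNAME/bin:$PATH"

# 3. execute the program and remember its exit status
./program -option xyz
status=$?

# 4. clean up temporary files the program left behind (hypothetical pattern)
rm -f ./*.tmp
exit "$status"
EOF

# The script must be executable before it is submitted.
chmod +x myjob.sh
```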
The shell script needs to be executable.
The final submission to the cluster then is done by:
qsub -q all.q myjob.sh
You can check the status of your jobs with
The .out and the .err files of the submitted jobs will be written to your home directory:
They are called <scriptname.sh>.e<jobnumber> and <scriptname.sh>.o<jobnumber> for stderr and stdout respectively.
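The naming scheme can be turned into a small helper for locating the files of a finished job. The function name and the OUTPUT_DIR variable are made up for this sketch, and it assumes that the files end up in $HOME unless OUTPUT_DIR says otherwise.

```shell
# Print the stderr and stdout file names the batch system uses for a job.
# Usage: job_output_files SCRIPT_NAME JOB_NUMBER
job_output_files() {
    script=$(basename "$1")
    dir="${OUTPUT_DIR:-$HOME}"     # assumption: output lands in $HOME
    echo "$dir/$script.e$2"        # stderr
    echo "$dir/$script.o$2"        # stdout
}

# Example: show the end of the error output of job 4711 (made-up job number).
# tail "$(job_output_files myjob.sh 4711 | head -n 1)"
```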
There is an installation of openmpi on the system available to all users. The program mpicc can be invoked directly from the terminal. The program mpirun should be executed within the submit script. In order for the batch system to assign your job to more than one node, you need to add
#PBS -l nodes=<nodecount>:ppn=<procs-per-node>
to the submit script.
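As an illustration, a submit script for an MPI program might be created like this. The node and process counts, the project directory and the program name my_mpi_program are example values only, and the program is assumed to have been compiled beforehand with mpicc.

```shell
# Write an MPI submit script to mpi_job.sh and make it executable.
cat > mpi_job.sh <<'EOF'
#!/bin/bash
#PBS -l nodes=2:ppn=4
# 2 nodes x 4 processes per node = 8 MPI processes in total.

# Hypothetical project directory on the shared file system.
cd /lustre/USERNAME/myproject || exit 1

# Start one MPI process per requested slot.
mpirun -np 8 ./my_mpi_program
EOF
chmod +x mpi_job.sh
```

The value passed to "mpirun -np" should match nodes times ppn, so that every requested slot runs exactly one process.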
At the moment, MPI complains at the beginning of its output that it cannot find a relevant network interface; as a result, communication is slower than usual.
Different compiler versions using singularity
The container management program singularity is installed on the system. If you need a different compiler version to compile your program, you can download it as an image from Docker Hub with the command
singularity pull docker://gcc:<version>
This will create a .sif file in your current directory. You can then use the command
singularity shell <your-.sif-file>
to get an interactive shell using the specified compiler version. Compile your program in the way you need to and log out of the container in the usual way.
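For non-interactive builds, "singularity exec" can run the compiler directly instead of opening a shell. The helper below is a sketch under some assumptions: the function name is made up, the image file is expected under the name gcc_<version>.sif that "singularity pull" gives it, and the source file must live in a directory that is visible inside the container (by default usually your home and the current directory).

```shell
# Compile a C source file with the gcc from a pulled container image.
# Usage: compile_in_container GCC_VERSION SOURCE OUTPUT
compile_in_container() {
    image="gcc_$1.sif"    # file name created by: singularity pull docker://gcc:$1
    # Download the image only if it is not already in the current directory.
    [ -f "$image" ] || singularity pull "docker://gcc:$1" || return 1
    # Run gcc inside the container on the given source file.
    singularity exec "$image" gcc -O2 -o "$3" "$2"
}

# Example call (version and file names are placeholders):
# compile_in_container 9.3.0 hello.c hello
```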