Technical Documentation

  1. Getting Started
  2. Software Modules
  3. Job Management Basics (building a job script, submitting a job, interactive jobs, monitoring jobs)
  4. Compiling Programs
  5. Lustre Usage and Benefits

 

  1. Getting started
    • Basic Linux
      The user should be comfortable working in the Linux operating system environment. You should know how to list files and directories, copy and move files, display and search the contents of files, set environment variables, manage access rights, and so on.
      Some examples (a short session combining them is sketched after this list):

      • cd – cd directory_name means changing the current working directory to directory_name
      • cat – cat text.txt displays the contents of the file text.txt
      • grep – grep ‘keyword’ filename searches filename for keyword
      • chmod – chmod go-wx filename removes write and execute permissions on filename for group and others
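      As a quick illustration, the short session below combines these commands with setting an environment variable and checking access rights (the directory and file names are placeholders):

        $ cd my_project                       # change into the my_project directory
        $ cat notes.txt                       # display the contents of notes.txt
        $ grep 'error' notes.txt              # search notes.txt for the word error
        $ chmod go-wx notes.txt               # remove write/execute permission for group and others
        $ export RESULTS_DIR=$HOME/results    # set an environment variable
        $ ls -l notes.txt                     # list the file along with its access rights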

      Refer to this tutorial for better understanding.

    • Requesting account
      To use our services you must first create an account with us. Please submit a formal request through the New user creation page.
    • Connection to cluster
      1. Use an SSH client to connect to the service. For Unix, Linux, and OS X systems, run "ssh login-1.extreme.uic.edu -l <netid>" (where <netid> is your netID and extreme is the cluster name). Use SCP to transfer files securely between two remote servers. If you run Windows and do not have SSH and SCP clients, you may download PuTTY and PSCP for free here. (See the example session after this list.)
      2. Access from Windows-based systems (using X11 forwarding):
        “X forwarding” is a feature of X where a graphical program runs on one computer, but the user interacts with it on another computer. All you need is an X server that runs on Windows, and an SSH client, both of which are freely available.
        Download Xming and Xming fonts to get started with installing an X server.
        Install PuTTY.
        Follow these steps to configure PuTTY,
        1. Enter the hostname you want to connect to: login-1.extreme.uic.edu on port 22. Make sure the connection type is SSH. (extreme is the cluster name here.)
        2. Scroll to Connection > SSH > X11. Check the box next to Enable X11 Forwarding. By default the X Display location is empty. You can enter localhost:0. The remote authentication should be set to MIT-Magic-Cookie-1.
        3. Finally go back to Session. You can save your session too, and load it each time you want to connect.
        4. Click Open to bring up the terminal and log in using your netID and password.
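      On Unix-like systems, a typical connection and file transfer looks like the sketch below (the file name is a placeholder; replace <netid> with your own netID):

        $ ssh login-1.extreme.uic.edu -l <netid>                   # log in to the cluster
        $ ssh -X login-1.extreme.uic.edu -l <netid>                # log in with X11 forwarding enabled
        $ scp results.tar.gz <netid>@login-1.extreme.uic.edu:~/    # copy a local file to your cluster home directory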
  2. Software modules
    • Available software
      The list of all available software can be seen here. Type module avail in your bash terminal to see the list.
    • How to load or unload a software/module
      Use module load <module_name> in your bash terminal.
      Useful naming conventions are appended to the name of each software package.
      If intel appears in the name, load the compilers/intel module along with the software package; if a Python version appears, load that version from the compilers group. These suffixes indicate the dependencies required for the package to function properly.
      To unload a package, use module unload <module_name> in your bash terminal.
      To list the modules you currently have loaded, use module list. A short example session is sketched below.
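      A typical module workflow might look like the following (the Intel compiler module name is taken from the compiling section of this document; other module names will vary):

        $ module avail                      # list all available modules
        $ module load compilers/intel       # load the Intel compiler suite
        $ module list                       # show the modules currently loaded
        $ module unload compilers/intel     # unload the module when finished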
    • New Software installation request
      Please check the list of available software before sending a request. To request an installation, submit a formal request through this page, and provide instructions and download links for the package.
  3. Job management basics
    We employ Moab and Torque to manage and control jobs on the cluster: Cluster Resources' Moab Workload Manager handles job scheduling, while Torque is used as the backend resource manager.

      • Submission
        Building a job script

        • Just like other batch systems, users submit jobs to Moab for scheduling by means of a job script
        • A job script is a plain text file that you create with your favorite editor.
        • Job scripts can include any/all of the following:
          • Commands, directives and syntax specific to a given batch system
          • Shell scripting
          • References to environment variables
          • Names of executable(s) to run
          • Comment lines and white space
        • Moab supports the following types of batch job scripts:
          • LSF (Platform Computing)
          • PBS (Altair)
          • LoadLeveler (IBM)
          • Torque (Moab – very similar to PBS)
          • SSS XML (Scalable Systems Software Job Object Specification)
        • Recommendation: Always include your preferred shell as the first line in your batch script. Otherwise, you may inherit the default /bin/sh shell. For example:
          #!/bin/csh
          #!/bin/tcsh
          #!/bin/ksh
        • Batch scheduler syntax is parsed upon job submission. Shell scripting is parsed at runtime. Therefore, it is entirely possible to successfully submit a job to Moab that has shell script errors which won’t cause problems until later when the job actually runs.
        • Submitting binary executables directly (without a script) is not advised.
        • A basic Moab job control script appears below.

          #!/bin/csh
          ##### These lines are for Moab
          #PBS -l nodes=16
          #PBS -l partition=cab
          #PBS -l walltime=2:00:00
          #PBS -q pbatch
          #PBS -m be
          #PBS -V
          #PBS -o /p/lscratchb/joeuser/par_solve/myjob.out
          ##### These are shell commands
          date
          cd /p/lscratchb/joeuser/par_solve
          srun -n128 a.out
          echo 'Done'

        • NOTE: All #PBS lines must come before shell script commands.

        Submitting a job

          • The qsub command is used to submit your job script to Moab. Upon successful submission, Moab returns the job’s ID and spools it for execution. For example:

            % qsub myjobscript

            226783

            % qsub -q pdebug myjobscript

            227243

          • The qsub command has a number of options that can be used in either your job script or from the command line. Some of the more common/useful options are shown below – see the Moab documentation for a full discussion.
          • PBS options
            • #PBS -a (command line: -a)
              Declares the time after which the job is eligible for execution. Syntax (brackets delimit optional items, with the default being the current date/time): [[[[CC]YY]MM]DD]hhmm[.SS]
            • #PBS -A account (command line: -A account)
              Defines the account associated with the job.
            • #PBS -d path (command line: -d path)
              Specifies the directory in which the job should begin executing.
            • #PBS -e filename (command line: -e filename)
              Defines the file name to be used for stderr.
            • #PBS -h (command line: -h)
              Puts a user hold on the job at submission time.
            • #PBS -j oe (command line: -j oe)
              Combines stdout and stderr into the same output file. This is the default. If you want to give the combined stdout/stderr file a specific name, include the -o path flag as well.
            • #PBS -l string (command line: -l string)
              Defines the resources that are required by the job. See the discussion below for this important flag.
            • #PBS -m option(s) (command line: -m option(s))
              Defines the set of conditions (a=abort, b=begin, e=end) under which the server will send a mail message about the job to the user.
            • #PBS -N name (command line: -N name)
              Gives a user-specified name to the job. Note that job names do not appear in all Moab job info displays, and do not determine how your job's stdout/stderr files are named.
            • #PBS -o filename (command line: -o filename)
              Defines the file name to be used for stdout.
            • #PBS -p priority (command line: -p priority)
              Assigns a user priority value to a job. See the discussion under Setting Job Priority.
            • #PBS -q queue or #PBS -q queue@host (command line: -q queue)
              Runs the job in the specified queue (pdebug, pbatch, etc.). A host may also be specified if it is not the local host.
            • #PBS -r y (command line: -r y)
              Automatically reruns the job if there is a system failure. The default behavior at LC is to NOT automatically rerun a job in such cases.
            • #PBS -v list (command line: -v list)
              Specifically adds a list (comma separated) of environment variables that are exported to the job.
            • #PBS -V (command line: -V)
              Declares that all environment variables in the qsub environment are exported to the batch job.
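            As an illustration, several of these options can be combined on the qsub command line (the job name, output file name, and script name below are placeholders):

              % qsub -N mysim -l nodes=2 -l walltime=4:00:00 -m abe -j oe -o mysim.out myjobscript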

        Usage Notes:

         

          • You can submit jobs for any machine within your Moab grid, but not machines outside your grid. See Moab Grid Configurations for details.
          • After you submit your job script, changes to the contents of the script file will have no effect on your job because Moab has already spooled it to system file space.
          • Users may submit and queue as many jobs as they like, up to a reasonable, configuration-defined limit. The actual number of running jobs per user is usually a lower limit, however. These limits may vary between machines.
          • If your command line qsub option conflicts with the same option in your script file, the script option will override what is specified on the command line (in most cases).
          • In your job script, the #PBS token is case insensitive; however, the parameters it specifies are case sensitive.
          • The default working directory is the directory you submit your job from. If you need to be in another directory, your job script will need to cd to that directory, or you can use the -d flag.

         

        Discussion on the -l Option:

         

          • The -l option is one of the most important qsub options. It is used to specify a number of resource requirements for your job.
          • Syntax for the -l option is very specific, and the Moab documentation can be difficult to find. Currently, the “Resource Manager Extensions” section of the Moab Administrator’s Guide is the best place to look (see References).
          • Examples:
            -l depend=jobid
              Dependency upon completion of another job. jobid is the Moab jobid for the job that must complete first. See Setting Up Dependent Jobs for details.
            -l feature=lustre
            -l feature=32GB
              Requirement for a specific node feature. Use the mdiag -t command to see what features are available on a node.
            -l gres=filesystem
            -l gres=filesystem, filesystem
            -l gres=ignore
              Job requires the specified parallel Lustre file system(s). Valid labels are the names of LC Lustre parallel file systems, such as lscratchrza, lscratchb, lscratch1 .... The purpose of this option is to prevent jobs from being scheduled if the specified file system is unavailable. The default is to require all mounted lscratch file systems. The ignore descriptor can be used for jobs that don't require a parallel file system, enabling them to be scheduled even if there are parallel file system problems. More information is available HERE.
            -l nodes=256
              Number of nodes. The default is one.
            -l partition=cab
              Run the job on a specific cluster in a Moab grid.
            -l partition=zin|juno
            -l partition=hera:cab
              Run the job on either of the listed clusters in a Moab grid.
            -l partition=ALL
              Run the job on any cluster in a Moab grid.
            -l procs=256
              Number of processes. This option can be used instead of the nodes= option, and Moab will automatically figure out how many nodes to use.
            -l qos=standby
              Quality of service (standby, expedite).
            -l resfailpolicy=ignore
              Try to keep the job running if a node fails.
            -l resfailpolicy=requeue
              Requeue the job automatically if it fails.
            -l signal=14@120
            -l signal=SIGHUP@2:00
              Signaling: specifies the pre-termination signal to be sent to a job at the desired time before expiration of the job's wall clock limit. The default time is 60 seconds.
            -l ttc=8
              Stands for "total task count". Discussed in the Running on the Aztec and Inca Clusters section of this tutorial.
            -l walltime=600
            -l walltime=12:00:00
              Wall clock time. Default units are seconds. HH:MM:SS format is also accepted.
          • If more than one resource needs to be specified, the best thing to do is to use a separate #PBS -l line for each resource. For example:
            #PBS -l nodes=64
            #PBS -l qos=standby
            #PBS -l walltime=2:00:00
            #PBS -l partition=cab
          • Alternately, you can include all resources on a single #PBS -l line, separated by commas with no white space. For example:
            #PBS -l nodes=64,qos=standby,walltime=2:00:00,partition=cab

         

        Node configuration

        • For nodes, request no more nodes than your queue has permission to access. For example, nodes=10 reserves 10 nodes for your job; even if the job ends up using fewer resources, the full reservation is held for it.
        • Specify the number of cores needed per node using ppn (for example, nodes=2:ppn=16). Assigning ppn is not strictly necessary; the scheduler can decide it for you.
        • Keep in mind that we have different types of nodes in our cluster: G1 nodes have 16 cores, G2 nodes have 20 cores, and Highmem nodes have 32 cores. So if your queue has only G1 nodes, you cannot request ppn>16.
        • After you submit your job script, changes to the contents of the script file will have no effect on your job, as Torque has already spooled a copy to a separate file system.
        • If your job requests too many resources, showq will classify it as idle until resources become available.
        • We recommend always including your email address in your scripts so you are alerted to any change in job status. A sample script combining these settings is sketched below.
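        For instance, a minimal script built around this node configuration might look like the following sketch. The queue name, email address, and program name are placeholders, and ppn=16 assumes G1 nodes:

          #!/bin/bash
          #PBS -N my_analysis
          #PBS -l nodes=1:ppn=16
          #PBS -l walltime=12:00:00
          #PBS -q batch                      # replace with a queue you have access to
          #PBS -m abe                        # mail on abort, begin, and end
          #PBS -M your_netid@uic.edu         # placeholder email address
          #PBS -j oe
          #PBS -o my_analysis.out

          cd $PBS_O_WORKDIR                  # start in the directory the job was submitted from
          module load compilers/intel        # module name taken from the compiling section
          ./my_program > my_program.log      # placeholder executable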

        Submit an Interactive job
        HPCC staff recommends that jobs normally be submitted using a job script and qsub. However, qsub also allows interactive jobs, which are useful when debugging scripts and applications.

        To run an interactive job, you must include the -I (capital i) flag to qsub. Additionally, any job submission parameters in your script file with #PBS prefixes should be included at the command line.

        Syntax:

        [login1]$ qsub -I -q batch

        qsub: waiting for job 123.admin.extreme.uic.edu to start

        qsub: job 123.admin.extreme.uic.edu ready

        [compute-0-1]$

        This command assigns a compute node to the user to run their jobs. Please note that if you log out or exit your interactive session, your job will be marked as completed by the scheduler. (Here, extreme is the cluster name.)
        To pass multiple options with your interactive job, use the -l (lowercase L) option. Example:

        [login1]$ qsub -I -l nodes=1:ppn=16 -q queue_name -N job_name
        Flags used at the command line follow the same syntax as those flags listed in the table above.

     

    • Job monitoring
      There are several job monitoring commands; they are described below.
      showq

      • Jobs for all partitions in the Moab grid will be displayed – not just for the machine you issue the command from. See Moab Grid Configurations for details.
      • The showq command has several options. A few that may prove useful include:
        • -r shows only running jobs plus additional information such as partition, qos, account and start time.
        • -i shows only idle jobs plus additional information such as priority, qos, account and class.
        • -b shows only blocked jobs
        • -p partition shows only those jobs on a specified partition. Can be combined with -r, -i and -b to further narrow the scope of the display.
        • -c shows recently completed jobs.
      • Possible output fields:

        JOBID = unique job identifier
        S = state, where R means Running, etc.
        PAR = partition
        EFFIC = cpu efficiency of the job. Not applicable at LC.
        XFACTOR = expansion factor (QueueTime + WallClockLimit) / WallClockLimit
        Q = quality of service, abbreviated to 2 characters; no=normal, st=standby, etc.
        USERNAME = owner of job
        ACCNT = account
        MHOST = master host running primary task for job
        NODES = number of nodes used
        REMAINING = time left to run
        STARTTIME = when job started
        CCODE = user defined exit/completion code.
        WALLTIME = wall clock time used by job
        COMPLETIONTIME = when the job finished

      • See the Moab documentation for other options and further explanation of the output fields.
      • Examples below.
        % showq -r
        active jobs ------------------------

        % showq -c
        completed jobs ---------------------

      checkjob:

      • Displays detailed job state information and diagnostic output for a selected job.
      • The checkjob command is probably the most useful user command for troubleshooting your job, especially if used with the -v flag. Sometimes, additional diagnostic information can be viewed by using multiple “v”s: -vv or -v -v.
      • This command can also be used for completed jobs.
      • See the Moab documentation for other options and an explanation of the output fields.
      • Examples below.

        % checkjob 258506 (shows a completed job)
        job 258506

        AName: mxterm
        State: Completed
        Completion Code: 0 Time: Wed Sep 19 11:35:18
        Creds: user:d33f group:d33f account:lc class:pdebug qos:normal
        WallTime: 00:00:07 of 00:05:00
        SubmitTime: Wed Sep 19 11:34:26
        (Time Queued Total: 00:00:45 Eligible: 00:00:00)

        Job Templates: fs
        Total Requested Tasks: 1

        Req[0] TaskCount: 1 Partition: aztec
        Dedicated Resources Per Task: PROCS: 1 lscratchb: 1 lscratchc: 1 lscratchd: 1

        Allocated Nodes:[aztec7:1]

        SystemID: omoab
        SystemJID: 258506

        IWD: /g/g0/d33f
        Executable: /var/opt/moab/spool/moab.job.bE1Clh

        Execution Partition: aztec
        Flags: PREEMPTOR,GLOBALQUEUE,PROCSPECIFIED
        Variables: RHOME=/g/g0/donf
        StartPriority: 0

        % checkjob 303 (shows a job with a problem)
        job 303

        State: Idle
        Creds: user:blaise group:blaise account:cs class:pbatch qos:standby
        WallTime: 00:00:00 of 00:30:00
        SubmitTime: Tue Mar 13 09:57:55
        (Time Queued Total: 00:01:02 Eligible: 00:00:00)

        Total Requested Tasks: 2

        Req[0] TaskCount: 2 Partition: ALL
        Memory >= 1M Disk >= 1M Swap >= 0
        Opsys: — Arch: — Features: —
        NodesRequested: 2

        IWD: $HOME/moab
        Executable: /var/opt/moab/spool/moab.job.cUOwWF
        Partition Mask: [hera] Flags: PREEMPTEE,IGNIDLEJOBRSV,GLOBALQUEUE
        Attr: PREEMPTEE
        StartPriority: 2188189
        Holds: Batch:NoResources
        NOTE: job cannot run (job has hold in place)
        BLOCK MSG: job hold active – Batch (recorded at last scheduling iteration)

      showstart
      showstart is used to estimate when a job will start.

      • Displays the estimated start time for a selected job.
      • For jobs that are on hold or have specified a future start date, this command provides an estimate of the earliest time the job could possibly run if it were not on hold or did not have a future start time specified.
      • This command can also be used with an alternative syntax to get a “what if” type of estimate. For example, the command syntax showstart 256@1200 asks what time an imaginary job (no jobid) would start if it needed 256 processors for 1200 seconds.
      • See the Moab documentation for other options and an explanation of the output fields.
      • Examples below.

        % showstart 212
        job 212 requires 220 procs for 00:30:00

        Estimated Rsv based start in 23:59:54 on Thu Jun 19 13:25:25
        Estimated Rsv based completion in 1:00:29:54 on Thu Jun 19 13:55:25

        Best Partition: cab

      mdiag -j

      • Provides a one line display of job information per job.
      • Jobs for all partitions in the Moab grid will be displayed – not just for the machine you issue the command from. See Moab Grid Configurations for details.
      • The mdiag -j -v syntax provides additional information.
      • Examples below.

        % mdiag -j 75025
        JobID      State     Proc     WCLimit     User     Opsys Class Features

        75025      Running     32     365days     st78itz        – pdebug –

    • Example scripts: https://wiki.scinet.utoronto.ca/wiki/images/5/54/SciNet_Tutorial.pdf
  4. Compiling programs
      • Compiling with Intel compiler
        The cluster uses GNU’s gcc by default at login. The Intel Fortran and C++ Composer XE 2013 suite is provided to maximize performance from the Intel architecture. HPCC staff recommends using the Intel compilers whenever possible.
        To load the Intel compiler module,
        module load compilers/intel
        You can invoke the Intel compilers on the command line to compile C source files, C++ source files, or Fortran source files.

        • When you invoke the compiler with icc, the compiler builds C source files using C libraries and C include files. If you use icc with a C++ source file, it is compiled as a C++ file. Use icc to link C object files.
        • When you invoke the compiler with icpc the compiler builds C++ source files using C++ libraries and C++ include files. If you use icpc with a C source file, it is compiled as a C++ file. Use icpc to link C++ object files.

        The icc or icpc command does the following:

        • Compiles and links the input source file(s).
        • Produces one executable file, a.out, in the current directory.

        Syntax: {icc|icpc} [options] file1 [file2 ...]
        where file is any of the following:

        • C or C++ source file (.c, .cc, .cpp, .cxx, .i)
        • assembly file (.s)
        • object file (.o)
        • static library (.a)

        Appropriate file name extensions are required for each compiler. By default, the executable filename is “a.out”, but it may be renamed with the “-o” option. The compiler command performs two operations: it makes a compiled object file (having a .o suffix) for each file listed on the command-line, and then combines them with system library files in a link step to create an executable. To compile without the link step, use the “-c” option.
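        For example, the two-step compile-and-link workflow described above might look like this (the file names are placeholders):

        $ icc -c solver.c io.c              # compile only; produces solver.o and io.o
        $ icc -o solver.exe solver.o io.o   # link the object files into an executable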

        C example: $ icc -xhost -O2 -o flamec.exe prog.c

        Fortran example: $ ifort -xhost -O2 -o flamef.exe prog.f90

        For more information on each of the compiler flags, use

        • $ icc -help
        • $ icpc -help
        • $ ifort -help
      • Compiling with MPICH2
        MPICH is an open source implementation of MPI (Message Passing Interface) and an alternative to Intel's MPI.
        To get started, load an MPICH module before working with it.
        We have two versions installed on the system for your use, MPICH2 and MPICH3. Load them using:

        $ module load tools/mpich2-1.5-gcc (MPICH2)
        $ module load tools/mpich-3.0.4-icc (MPICH3)

        Once either of the above commands is executed, it will automatically set the environment variables required to use MPICH, i.e. PATH, MPICH2_HOME (MPICH3_HOME in the case of MPICH3), and LD_LIBRARY_PATH.
        To compile programs with MPICH, use the wrapper scripts listed below; to run them, use mpiexec.

        The following scripts are available to compile and link your mpi programs:

        Script     Language
        mpicc      GNU C
        mpiCC      GNU C++
        mpif77     GNU Fortran 77

        Each script will invoke the appropriate compiler.
        Make a job script to reserve nodes for your job to run on.
        To run a program compiled with MPICH:
        $ mpiexec ./mpi-program-name.out > output.log
        A machinefile (passed to mpiexec with the -f option) can be used to specify which hosts to run on; it is of the form:

        host1
        host2:2
        host3:4
        host4:1

        'host1', 'host2', 'host3' and 'host4' are the hostnames of the machines you want to run the job on.
        Example: $ mpiexec -f machinefile ./mpi-program-name.out > output.log
        For more information about MPICH– See the MPICH Manual
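        Putting this together, a job script that compiles and runs an MPI program might look like the sketch below. The source file and executable names are placeholders, the module is the MPICH2 module listed above, and using $PBS_NODEFILE as the machinefile is one common convention under Torque:

        #!/bin/bash
        #PBS -N mpi_test
        #PBS -l nodes=2:ppn=16
        #PBS -l walltime=1:00:00

        cd $PBS_O_WORKDIR
        module load tools/mpich2-1.5-gcc                         # MPICH2 module from this document

        mpicc -o hello.out hello.c                               # compile with the MPICH wrapper
        mpiexec -f $PBS_NODEFILE -n 32 ./hello.out > output.log  # run 32 tasks (2 nodes x 16 cores)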

     

      • Compiling with gcc
        This write-up is not yet finalized.

     

    • Compiling with OpenMPI
      OpenMPI is another open source MPI implementation, similar to MPICH.
      Its usage is similar to MPICH and Intel MPI.
      To get started with OpenMPI, you do not have to load the module. It is the default MPI implementation for Rocks OS.
      The following scripts are available to compile and link your mpi programs:

      Script     Language
      mpicc      GNU C
      mpiCC      GNU C++
      mpif77     GNU Fortran 77

      Each script will invoke the appropriate compiler.
      Syntax

      mpicc
      mpiCC
      mpif77
      mpif90
      To get more information on specific compiler wrappers in OpenMPI, use -help with each wrapper.

      Example

      login1$ mpicc -help
      login1$ mpiCC -help
      login1$ mpif90 -help
      login1$ mpirun -help
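      As with MPICH, the compile and run steps inside a job script under OpenMPI might look roughly like this (file names are placeholders; -hostfile $PBS_NODEFILE explicitly hands mpirun the nodes Torque allocated to the job):

      mpicc -o hello.out hello.c                                       # compile with the OpenMPI wrapper
      mpirun -np 32 -hostfile $PBS_NODEFILE ./hello.out > output.log   # run 32 tasks on the allocated nodes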

  5. Lustre usage and benefits: ACER's HPC resources rely on Intel's Lustre platform to provide our users with a high performance parallel file system that delivers much faster writes during computation than NFS. We highly recommend staging your data out to the Lustre environment to achieve better performance at run time.
    1. The main advantage of Lustre, a global parallel file system, over NFS and SAN file systems is that it provides wide scalability in both performance and storage capacity, a global name space, and the ability to distribute very large files across many nodes. Because large files are shared across many nodes in the typical cluster environment, a parallel file system such as Lustre is ideal for high-end HPC cluster I/O systems.
    2. There are two main server components of a Lustre file system: Object Storage Server (OSS) nodes and Metadata Server (MDS) nodes. File system metadata is stored on the Lustre MDS nodes, and file data is stored on the Object Storage Servers. The data for the MDS server is stored on a Metadata Target (MDT), which essentially corresponds to the LUN being used to store the actual metadata. The data for the OSS servers is stored on hardware LUNs called Object Storage Targets (OSTs). OST targets can currently be a maximum size of 16TB. Since we usually configure OSS/OST data LUNs in an 8+2 RAID6 configuration, the common LUN consists of ten 2TB SATA drives (or ten 1TB SATA drives in the case of 1TB drives). These are the most common configurations, but some large sites do use SAS or Fibre Channel drives for their OSTs.
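    As a practical illustration of staging data to Lustre, a job script might copy its input to a Lustre scratch directory and run from there. The /lustre/scratch path below is purely a placeholder; substitute the actual Lustre mount point on this cluster:

      SCRATCH=/lustre/scratch/$USER/myjob    # hypothetical Lustre scratch path
      mkdir -p $SCRATCH
      cp $HOME/input.dat $SCRATCH/           # stage input data out to Lustre
      cd $SCRATCH
      ./my_program input.dat > output.log    # run against the Lustre copy
      cp output.log $HOME/results/           # copy results back when finished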