HOME PAGE FOR VMI21

[NCSA]
Virtual Machine Interface 2.1

VMI2/MPICH-VMI Users Manual
Avneesh Pant (apant@ncsa.uiuc.edu)

Table of Contents

  1. VMI2/MPICH-VMI Installation Instructions
    1. Prerequisites
    2. Binary install via RPMs
    3. Obtaining the source code
    4. Preparing to build
    5. Building VMI2
    6. Building MPICH-VMI
  2. VMI2/MPICH-VMI Configuration Instructions
    1. Configuring MPICH-VMI
    2. Running VMI Daemons
  3. VMI2/MPICH-VMI Usage Instructions
    1. Running codes with VMI2
    2. Running codes with MPICH-VMI

VMI2/MPICH-VMI Installation Instructions

Prerequisites:

VMI2 and MPICH-VMI depend on several other packages to build and function properly. If these are not present on your system, install them before proceeding.
Required packages:

Binary install via RPMs:

If you use the binary RPMs provided on the main VMI website, you can skip compiling the VMI2, MPICH-VMI and CRM source code. Note, however, that the RPMs are non-relocatable and will always install under /opt.
VMI2 and MPICH-VMI currently consist of the following RPMs:
vmi-base-2.1.0-1..rpm The VMI runtime libraries and applications. This is needed on ALL systems.
vmi-devel-2.1.0-1..rpm The VMI development headers and libraries. This is needed only on systems where VMI and MPI codes are compiled.
mpich-vmi-{gcc|intel}-2.1.0-1..rpm The MPICH-VMI runtime libraries and support scripts. This is needed on systems that run or compile MPI codes.
mpich-vmi-{gcc|intel}-devel-2.1.0-1..rpm The MPICH-VMI development headers and libraries. This is needed only on systems where MPI codes are compiled.
Keep the following in mind when installing the RPMs. Several newer versions of the curl RPMs reportedly do not create the symlink "/usr/lib/libcurl.so.1" pointing to "/usr/lib/libcurl.so". VMI binaries installed from the RPMs may then fail to run. If this happens, try adding the symlink manually.
Once you have installed the RPMs, continue with the "Running VMI Daemons" section of this document.

Obtaining the source code:

There are two ways to obtain the VMI2 source code: download it from the web site, or check it out via anonymous CVS.
Downloading via Web:
Go to the download page to get the source tarballs.
Obtaining via CVS:
  1. Set your CVSROOT environment variable to ":pserver:anonymous@vmi.ncsa.uiuc.edu:/space/cvs/cvsrep"
  2. Type "cvs login"
  3. When prompted for a password, just hit enter.
  4. Type "cvs co -N VMI21" to check out VMI2.
  5. Type "cvs co -N MPI126" to check out MPICH-VMI.
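The CVS steps above could be scripted roughly as follows. This is only a sketch: the function name checkout_vmi_sources is made up for illustration, and the password prompt from "cvs login" still has to be answered by pressing Enter.

```shell
# Anonymous CVS root given in step 1.
export CVSROOT=":pserver:anonymous@vmi.ncsa.uiuc.edu:/space/cvs/cvsrep"

# Hypothetical helper wrapping steps 2 to 5.
checkout_vmi_sources() {
    cvs login &&          # press Enter when asked for a password
    cvs co -N VMI21 &&    # VMI2 sources
    cvs co -N MPI126      # MPICH-VMI sources
}

# Run when ready:
# checkout_vmi_sources
```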

Preparing to build:

If you downloaded the source code from the web page, untar the archives now.
If you have installed the latest autoconf tools into a location that is not in your path, put that location at the head of your PATH environment variable now.

Building VMI2:

  1. Go to the VMI21 directory.

  2. Run the freshbuild script. Options can be passed to the freshbuild script to specify the VMI devices to be built and to override configure defaults. The following options are currently supported:

    --prefix=DIR Install VMI2 under DIR.
    --enable-dlmalloc Enable use of Doug Lea's memory allocator. The default is ptmalloc, which is better optimized for threaded applications.
    --enable-ptmalloc Ignore auto detect of glibc and force ptmalloc to be used as the memory allocator. (Use with glibc-2.2.x)
    --enable-ptmalloc2 Ignore auto detect of glibc and force ptmalloc2 to be used as the memory allocator. (Use with glibc-2.3.x)
    --with-efence link against Electric Fence if available
    --disable-cache Disable the use of pin down cache within VMI for memory registration. This entails a severe performance penalty. Use at your own risk!
    --with-gm=DIR Build GM device. (Specify location of GM installation directory)
    --with-vapi=DIR Build VAPI/Infiniband (MST) device. (Specify location of VAPI. Default is /usr/mellanox).
    --with-openib=DIR Build OpenIB device. (Specify location of OpenIB installation. Currently tested with Mellanox HPC Gold distribution of OpenIB. Default is /usr/local/ib_hpc).
    --with-ibal=DIR Build IBAL/Infiniband device. (Specify location of IBAL installation. Note: This is currently experimental and not tested thoroughly!)
    --with-ccil=DIR Build Ammasso CCIL Device (iWarp) (Specify location of starcore software installation)
    For example, to build VMI2 with OpenIB support, using the OpenIB installation under /opt/ib_hpc, and install it in /opt/vmi-2.1.0-1-gcc, run "./freshbuild --prefix=/opt/vmi-2.1.0-1-gcc --with-openib=/opt/ib_hpc".

    If the freshbuild script fails with "autoconf: configure.in: No such file or directory", you probably have not installed the required version of autoconf. Make sure that your version of autoconf is up to date and in your PATH before rerunning the freshbuild script.

  3. Type 'make' to build VMI2.

  4. As root, type 'make install' to install VMI2 (non-root users: 'make -i install').

Congratulations, you now have VMI2 installed on your system.
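Put together, the build steps above might look like the sketch below. The paths are the ones from the OpenIB example; the function name build_vmi2 and the variable names are only illustrative, and VMI21 is assumed to be the directory created by "cvs co -N VMI21".

```shell
VMI_SRC_DIR=VMI21                    # assumed source directory
VMI_PREFIX=/opt/vmi-2.1.0-1-gcc      # install location from the example
OPENIB_DIR=/opt/ib_hpc               # OpenIB installation from the example

# Hypothetical helper; runs in a subshell so the caller's
# working directory is left unchanged.
build_vmi2() (
    cd "$VMI_SRC_DIR" &&
    ./freshbuild --prefix="$VMI_PREFIX" --with-openib="$OPENIB_DIR" &&
    make &&
    make install    # as root; non-root users: make -i install
)

# Run when ready:
# build_vmi2
```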

Building MPICH-VMI:

This assumes that you have already installed VMI2 on your system. If not, do that now.
MPICH-VMI can be installed in two ways: automatically, through a script called "chvmi.make.{gcc|ecc|icc}", or manually. The manual installation uses several environment variables, detailed below, which the user must set in the shell. The automatic script-based installation uses the default values shown below; they can be overridden as explained in the Automatic Installation section.

Environmental Variables Setup

Set the VMI_INSTALL_PATH environmental variable to the path where VMI2 has been installed. $VMI_INSTALL_PATH/lib will be added to $LD_LIBRARY_PATH.

Optional Environmental Variables
Set the MPICH_VMI_FLAGS environment variable to one or more of the following options:
enable-crc Enables message transfer consistency check. (This lowers performance significantly)
non-root-mpd Build mpd to be run by a non-root user.
enable-access-check Use PAM for access control on job launch for MPDs.
Multiple options must be separated by spaces and enclosed in quotes. If the Myrinet device is used, also set the GM_INSTALL_PATH environment variable. $GM_INSTALL_PATH/lib will be added to $LD_LIBRARY_PATH.
The default values for the environment variables are listed below.
CC System C compiler - Default - gcc
CXX System C++ compiler - Default - g++
F77 System Fortran compiler - Default - g77
PREFIX MPICH-VMI installation directory
CFLAGS Compilation flags - Default - -O3 -g -DNO_LOCK
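As an illustration, the variables described above could be set in a Bourne-style shell as sketched here. The VMI and MPICH-VMI paths are the example locations used elsewhere in this manual, /opt/gm is only a placeholder, and the MPICH_VMI_FLAGS value merely demonstrates the quoting of multiple options.

```shell
# Mandatory: where VMI2 was installed (example path from this manual).
export VMI_INSTALL_PATH=/opt/vmi-2.1.0-1-gcc

# Optional: space-delimited build options, quoted as one string.
export MPICH_VMI_FLAGS="enable-crc non-root-mpd"

# Optional: only when the Myrinet (GM) device is used.
# /opt/gm is a placeholder, not a documented default.
# export GM_INSTALL_PATH=/opt/gm

# Compiler settings, shown with their documented defaults.
export CC=gcc
export CXX=g++
export F77=g77
export CFLAGS="-O3 -g -DNO_LOCK"
export PREFIX=/opt/mpich-vmi-2.1.0-1-gcc
```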

Automatic Installation

  1. Go to the MPI126 directory.

  2. The "./chvmi.make.{gcc|ecc|icc}" script configures and builds MPICH-VMI automatically. Pass the argument "inherit" (for example, "./chvmi.make.gcc inherit") if you want the script to inherit the values of the environment variables set in your shell. The $PREFIX variable has a default value of /opt/mpich-vmi-2.1.0-1-{gcc|intel}. You can also edit the script to change the default settings. This completes the building and installation of MPICH-VMI.

Manual Installation

  1. Make sure that the mandatory environment variables are set correctly. Go to the MPI126 directory.

  2. To configure MPICH-VMI manually, use the "./configure" script. MPICH-VMI currently honors all of the standard MPICH configure options, plus the options below. The --with-device=ch_vmi option is mandatory if you want to use MPICH-VMI.

    --prefix=DIR Install MPICH-VMI under DIR.
    --with-device=ch_vmi Select the ch_vmi device. (The option -with-vmi applies to the ch_vmi device.)

    For example, to install MPICH-VMI in "/opt/mpich-vmi-2.1.0-1-gcc", linking against the VMI install in "/opt/vmi-2.1.0-1-gcc", you would set VMI_INSTALL_PATH to "/opt/vmi-2.1.0-1-gcc", and type "./configure --prefix=/opt/mpich-vmi-2.1.0-1-gcc --with-device=ch_vmi" to configure MPICH-VMI.

  3. Type 'make' to build MPICH-VMI.

  4. As root, type 'make install' to install MPICH-VMI under $PREFIX.
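The manual installation steps above can be sketched as a single helper. The function name build_mpich_vmi and the variable MPICH_VMI_PREFIX are made up for illustration, the paths come from the example in step 2, and MPI126 is assumed to be the directory created by "cvs co -N MPI126".

```shell
# VMI2 install that MPICH-VMI links against (example path).
export VMI_INSTALL_PATH=/opt/vmi-2.1.0-1-gcc
MPICH_VMI_PREFIX=/opt/mpich-vmi-2.1.0-1-gcc

# Hypothetical helper; runs in a subshell so the caller's
# working directory is left unchanged.
build_mpich_vmi() (
    cd MPI126 &&
    ./configure --prefix="$MPICH_VMI_PREFIX" --with-device=ch_vmi &&
    make &&
    make install    # run this step as root
)

# Run when ready:
# build_mpich_vmi
```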


VMI2/MPICH-VMI Configuration Instructions

Configuring MPICH-VMI:

MPD ring setup:
There are two files which need to be propagated to every node in a given mpd ring.
/etc/mpd.conf
Required configuration: copy to all nodes
Upon installation, /etc/mpd.conf is populated with a random password string. This string must be identical on all the nodes in an mpd ring. Take any node's version of this file and copy it to all the others. It does not matter which one you pick, only that all copies are the same.
/etc/init.d/mpd
Required configuration: set MPD_LISTENER_HOST (default not set)
Optional configuration: set EXEC_ON_LISTENER (default = 0)
Optional configuration: set MPD_PORT (default = 666)
Required configuration: copy to all nodes
This is the rc init script for the mpd daemon. In an mpd ring, all mpds are launched to connect to a "listener host". The mpd on whichever node you specify as the listener host is automatically launched in listener mode. By default, execution on the listener host is *not* allowed, which suits setups where, for example, your head node operates as the listener. Make any configuration changes before you propagate the init script.
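A rough sketch of the mpd ring setup follows. It assumes the three settings are plain shell assignments inside /etc/init.d/mpd (check the script itself for the exact form). The host names head01 and node01 to node03, the NODES variable and the push_mpd_config helper are all placeholders, and root ssh access between nodes is assumed.

```shell
# Settings to edit inside /etc/init.d/mpd before copying it out.
MPD_LISTENER_HOST=head01   # placeholder listener host name
EXEC_ON_LISTENER=0         # default: no job execution on the listener
MPD_PORT=666               # default port

# Placeholder list of the other nodes in the ring.
NODES="node01 node02 node03"

# Hypothetical helper: copy this node's versions of both files
# to every other node so that all copies are identical.
push_mpd_config() {
    for node in $NODES; do
        scp -p /etc/mpd.conf "root@$node:/etc/mpd.conf" || return 1
        scp -p /etc/init.d/mpd "root@$node:/etc/init.d/mpd" || return 1
    done
}

# Run as root when ready:
# push_mpd_config
```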

Default machines file: mpich_vmi2_dir/share/machines.list
Optional configuration:  list hostnames
The machines.list file may be populated with a list of hosts to use when the -machinefile option is not used on the mpirun command line.  Format of each line is one of the following:
HOSTNAME
-or-
HOSTNAME:[num_procs]
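For illustration, the snippet below writes a small sample file in both line formats. The host names are placeholders, and the reading of "node02:2" as two processes on node02 is an assumption based on the num_procs field name.

```shell
# Write a sample machines file into the current directory.
cat > machines.list.sample <<'EOF'
node01
node02:2
node03
EOF
```

Once the host names are adjusted, the file can be copied to mpich_vmi2_dir/share/machines.list.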

Defaults in mpirun:
  mpich_vmi2_dir/bin/mpirun.ch_vmi
Optional configuration: set values for custom settings
At the top of the file, the following set of variables is defined. Those prefixed by "DF_" are defaults, used only when the corresponding value is not set explicitly. For example, if -specfile is not specified on the mpirun command line, the value in $DF_VMI_SPECFILE is used.

LD_LIBRARY_PATH=$VMI_INSTALL_PATH/lib:$LD_LIBRARY_PATH
DF_rshcmd=ssh
DF_LOG_ENABLE=0
DF_LOGFILE=$MPIRUN_HOME/../log/mpirun.ch_vmi.log
DF_LOG_DIRECTIVES=$MPIRUN_HOME/mpirun.ch_vmi.logger
DF_machineFile=$MPIRUN_HOME/../share/machines.list
DF_VMI_SPECFILE=$VMI_INSTALL_PATH/specfiles/tcp.xml
DF_VMI_SPECFILE_PATH=$VMI_INSTALL_PATH/specfiles
DF_VMI_LAUNCHER=1          # Detected launcher overrides this value
DF_VMI_VERBOSE=0
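The way a "DF_" default comes into play can be pictured with the sketch below. It is not taken from mpirun.ch_vmi; the function pick_specfile and its argument are hypothetical, and only DF_VMI_SPECFILE is a name from the listing above.

```shell
VMI_INSTALL_PATH=${VMI_INSTALL_PATH:-/opt/vmi-2.1.0-1-gcc}
DF_VMI_SPECFILE=$VMI_INSTALL_PATH/specfiles/tcp.xml

# Hypothetical: $1 is the value given with -specfile, or empty
# when the option was not used on the command line.
pick_specfile() {
    specfile="$1"
    # Fall back to the default only when nothing was given.
    printf '%s\n' "${specfile:-$DF_VMI_SPECFILE}"
}
```

Calling pick_specfile with an empty argument prints the default; any non-empty argument is printed unchanged.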

Logging mpirun:
By default, mpirun logging is disabled.  This can be changed globally in mpirun.ch_vmi (above), but can still be force-disabled on the mpirun command line with the -nolog option.  The default logfile is also set at the top of mpirun.ch_vmi (above).  Logging is mainly useful to sysadmins for debugging and tracking mpirun use.  It is quite verbose, and if enabled, the size of the logfile should be monitored.

Running VMI Daemons:

You need to start a copy of the "vmieyes" daemon on every node that can run an MPI process. You can do this either via the init script bundled with the VMI runtime package or by hand. The vmieyes daemon is installed under the $VMI_INSTALL_PATH/sbin directory.

It is recommended, though not necessary, to run the vmieyes daemon as root. The daemon is used only during job startup and hence does not consume any significant compute resources while running. It is also recommended to use chkconfig so that the vmieyes daemon starts automatically on reboot. MPICH-VMI jobs will fail to run if a node does not have the vmieyes daemon running on it.
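The two ways of starting the daemon might be sketched as follows. Both helper names are made up. The service name "vmieyes" passed to chkconfig is an assumption about how the bundled init script is named, and this manual does not say whether the daemon needs arguments or puts itself into the background.

```shell
VMI_INSTALL_PATH=${VMI_INSTALL_PATH:-/opt/vmi-2.1.0-1-gcc}

# Start the daemon by hand on the local node.
start_vmieyes() {
    "$VMI_INSTALL_PATH/sbin/vmieyes"
}

# Assumption: the bundled init script is installed as
# /etc/init.d/vmieyes; check /etc/init.d for the actual name.
enable_vmieyes_at_boot() {
    chkconfig --add vmieyes &&
    chkconfig vmieyes on
}

# Run as root on every compute node when ready:
# enable_vmieyes_at_boot
```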

VMI2/MPICH-VMI Usage Instructions

Running codes with VMI2:

If you get this error while running a VMI2 sample program:
      "error while loading shared libraries: libcurl.so.1: cannot open shared object file: No such file or directory"
You need to create a symbolic link named libcurl.so.1 that points to libcurl.so or libcurl.so.2.
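A small helper for creating that link could look like this. The name link_libcurl and the default directory /usr/lib are assumptions; on some systems the library lives elsewhere.

```shell
# Hypothetical helper: create libcurl.so.1 in the given library
# directory, pointing at libcurl.so or libcurl.so.2.
link_libcurl() {
    libdir="${1:-/usr/lib}"
    # Nothing to do when the name already resolves.
    if [ -e "$libdir/libcurl.so.1" ]; then
        return 0
    fi
    for target in libcurl.so libcurl.so.2; do
        if [ -e "$libdir/$target" ]; then
            ln -s "$target" "$libdir/libcurl.so.1"
            return $?
        fi
    done
    echo "no libcurl.so or libcurl.so.2 found in $libdir" >&2
    return 1
}
```

Run "link_libcurl /usr/lib" as root, or pass the directory that actually holds the curl library.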

Running codes with MPICH-VMI:

To compile a code with MPICH-VMI, use the standard "mpicc", "mpiCC" and "mpif77" scripts; use "mpirun" to launch jobs.
The mpirun which ships with MPICH-VMI supports launching jobs via MPD/ssh.
In addition to the standard "mpirun" options, MPICH-VMI provides the following options to better accommodate VMI2.
-nolog
Disables admin logging.
-specfile
Specifies the XML specfile (used to switch the underlying transport).
-key [string]
Character string address for VMI process. Automatically allocated by mpirun.ch_vmi, except within grid jobs
-job-sync-timeout
Maximum number of seconds allowed for all processes to start (defaults to 300 seconds).
-max-multicast-msglen
Maximum message size for which optimized MPI collectives will use multicast. Default is 16K.
-disable-topology-colls
Disables use of topology aware collectives
-enable-multicast-collectives
Enables multicast implementation of MPI collectives. Warning! This is EXPERIMENTAL.
-disable-rdma-barrier
Disables use of RDMA based optimized barrier implementation
-disable-rndvz-get
Disables use of RDMA Get Protocol when using rendezvous for large asynchronous sends.
-disable-regcache
Disable the use of VMI registration cache for buffers. Can lead to a decrease in performance. Advised to use as a debug option for memory related errors.
-enable-active-connections
Enables use of active connections on startup. Processes connect only to peers they communicate with. Faster startup and minimizes resource usage.
-force-shell
Overrides the detected launch mechanism with shell (ssh)
-grid-procs [#]
Overrides detected VMI_PROCS for separated cluster run time environments (grid environments)
-debugger [gdb] [totalview]
Select debugger to run with
-mmapthreshold
Allocation at which mmap is used. Default is 4 MB.
-rdmachunk
Base chunk for large RDMA transfers used for Rendezvous protocol. Default is 256k
-rdmapipeline
Max number of RDMA chunks in flight. Default is 3.
-eagerlen
Message size at which to switch from short to rendezvous protocol. Default is 16k.
-eagerrunexcount
Maximum number of unexpected short messages before allocating memory for subsequent receives. Default is 16
-eagerisendcopy
Size of the largest message that can be copied in asynchronous eager send to finish the send immediately.
-grid-procs
Total number of procs in a grid job.
-grid-crm
Specify grid CRM host.
-disable-rdma-short
Disable the use of RDMA protocol for short messages.
-disable-shmem-comm
Disable the use of Shared Memory for intra node communication
-disable-profiling
Disables collection of communication statistics.
-profile-server [HOST]
Host running the profile daemon.
-v
Verbose level 1 - MPIRUN verbose & VMI startup
-vv
Verbose level 2 - Warning messages
-vvv
Verbose level 3 - Error messages
-vvvv
Verbose level 10 - Excess debug (Everything)
If mpirun is configured fully for the local setup, NONE of these options are mandatory.
Note: The "-specfile" option takes full paths (and URLs) or simple file prefixes (gm, tcp, etc.). When you specify a prefix, mpirun assumes that a corresponding XML file exists in $VMI_INSTALL_PATH/specfiles, containing information about the transport(s) you wish to use. New XML files added to that directory are supported automatically. For example, if the site administrator adds an XML specfile called "newtransport.xml", mpirun will support "-specfile newtransport".
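The prefix rule in the note can be pictured with the sketch below. It is not mpirun's own code; resolve_specfile is a hypothetical function that treats any argument containing a "/" as a full path or URL and anything else as a prefix under $VMI_INSTALL_PATH/specfiles.

```shell
VMI_INSTALL_PATH=${VMI_INSTALL_PATH:-/opt/vmi-2.1.0-1-gcc}

# Hypothetical illustration of how a -specfile argument maps to a file.
resolve_specfile() {
    spec="$1"
    case "$spec" in
        */*) printf '%s\n' "$spec" ;;    # full path or URL: used as given
        *)   printf '%s\n' "$VMI_INSTALL_PATH/specfiles/$spec.xml" ;;
    esac
}
```

With this sketch, "resolve_specfile newtransport" prints the path of newtransport.xml inside the specfiles directory.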

By setting up the machine file in $MPI_INSTALL_PATH/share/machines.list you can avoid specifying the machine file option in mpirun.
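A typical compile-and-run cycle might then look like the following sketch. The source file hello.c, the process count and the helper name run_hello are placeholders; "-np" is the standard mpirun option for the number of processes.

```shell
NPROCS=4    # placeholder process count

# Hypothetical helper. No -machinefile is passed, so mpirun
# falls back to the default machines.list described above.
run_hello() {
    mpicc -o hello hello.c &&
    mpirun -np "$NPROCS" -specfile tcp ./hello
}

# Run when ready:
# run_hello
```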

Profiling data is used to self-tune subsequent runs of the application. The data collected includes:
- job duration
- job size (MPI world size)
- executable name
- transport used
- one-way hash of the userid (i.e., NOT the userid itself)
- communications stats (send/receive counters per rank)

 

