Original article (https://wiki.duke.edu/display/SCSC/MPICH+Parallel+Library)
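Introduction:
MPI (Message Passing Interface) is one of the standards that define how processes distributed across nodes exchange information by passing messages in a parallel program.
MPICH is a free, open-source implementation of the MPI standard. It was written at Argonne National Lab and Mississippi
State University. MPICH follows MPI-1.1, and MPICH2 follows MPI-2.
Besides MPICH, various other MPI packages exist, such as LAM/MPI and MPI/Pro.
The last MPICH release was version 1.2.7, distributed on November 4, 2005. Distribution has since stopped, and users are advised to use MPICH2 instead.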
Compiling with MPICH
To compile an MPI code for MPICH, use:
mpicc -O -o runme runme.c
where -O is the usual compiler optimization switch, and runme is the
program to be compiled. Since mpicc is just a wrapper to a sequential
compiler (maybe gcc or Intel's icc), it accepts all the usual switches
and options. It will add the necessary search paths for the MPI
libraries and headers, so you don't need to worry about where they are.
If you are doing it "by hand," then you may need something like the following:
gcc -I/opt/mpich/include -o runme runme.c -L/opt/mpich/lib -lmpich
Of course, you'll need to add any other compiler optimizations or libraries that your program may need.
To use MPICH with the Intel compilers you can do one of two things:
At the compile line:
mpicc -cc=/opt/intel/bin/icc ...args...
mpif77 -fc=/opt/intel/bin/ifc ...args...
In your .cshrc file:
setenv MPICH_CC /opt/intel/bin/icc
setenv MPICH_CLINKER /opt/intel/bin/icc
setenv MPICH_F77 /opt/intel/bin/ifc
setenv MPICH_FLINKER /opt/intel/bin/ifc
To force the use of the GNU compilers, use:
mpicc -cc=/usr/bin/gcc ...args...
mpif77 -fc=/usr/bin/g77 ...args...
Or similar changes to your .cshrc file.
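For users whose login shell is bash rather than csh, the same compiler selection could be expressed with export lines. This is only a sketch: it assumes the lines go in ~/.bashrc and reuses the Intel compiler paths and variable names from the .cshrc example above.

```shell
# Sketch for ~/.bashrc: bash equivalent of the setenv lines above.
# The /opt/intel paths are the ones used on this page; adjust if yours differ.
export MPICH_CC=/opt/intel/bin/icc
export MPICH_CLINKER=/opt/intel/bin/icc
export MPICH_F77=/opt/intel/bin/ifc
export MPICH_FLINKER=/opt/intel/bin/ifc
```

To select the GNU compilers instead, the same four variables would point at /usr/bin/gcc and /usr/bin/g77.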
Running
To run an MPI code under SGE, you'll need to make an SGE submission
script first. See the SGE Queuing System web page for details on how to
create such a script. You will use the following to launch your
parallel job:
mpirun -np $NSLOTS -machinefile $TMPDIR/machines runme -arg1 -arg2
where -arg1 and -arg2 are arguments to your code (if needed). The
-np and -machinefile arguments are for the mpirun program itself. The
variable $NSLOTS is set automatically by SGE to the number of "slots"
or CPUs that you were actually given (recall that you can request a
range of CPUs from SGE; $NSLOTS says how many you were actually given).
A list of the actual machines you were given is in the file
$TMPDIR/machines. This is a simple text file with one hostname per
line; MPICH uses it to spawn your jobs on the proper machines
(otherwise, our automated job-killer program might remove your remote
processes).
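The launch line above can be assembled and inspected before it goes into a real job script. The sketch below is written in sh syntax. The helper name build_mpirun_cmd is hypothetical, and the fallback values for NSLOTS and TMPDIR are placeholders so the script can be tried outside SGE. It only prints the command and does not run it.

```shell
#!/bin/sh
# Hypothetical helper: print the mpirun command line for a given program.
# $1 = slot count, $2 = machine file, remaining args = program and its arguments.
build_mpirun_cmd() {
    nslots=$1
    machinefile=$2
    shift 2
    printf '%s\n' "mpirun -np $nslots -machinefile $machinefile $*"
}

# Inside an SGE job these are expected to be set already; the defaults
# below are placeholders for a dry run on a login node.
NSLOTS=${NSLOTS:-4}
TMPDIR=${TMPDIR:-/tmp}

# Dry run: show the command that the job script would execute.
build_mpirun_cmd "$NSLOTS" "$TMPDIR/machines" runme -arg1 -arg2
```

In an actual submission script the printed line would be replaced by the mpirun call itself, with the SGE directives described on the SGE Queuing System page placed at the top.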
MPICH Version 2
The default MPICH library, v.1.2, is in /usr/bin, /usr/lib, and /usr/include.
This should be adequate for most users. It is in the "normal" execution
path, so it should be visible without modifications to your .cshrc or .bashrc file. We have also installed MPICH Version 2 in /opt/mpich2. To use it, you must explicitly add the following directory to your path, ahead of /usr/bin:
set path = ( /opt/mpich2/bin $path )
When you type 'which mpicc' you should see /opt/mpich2/bin/mpicc (if you see /usr/bin/mpicc, then you are still using MPICH v.1.2). You use 'mpicc' and 'mpif77' as with MPICH v.1.2, but to run your code, you must use 'mpiexec' instead of 'mpirun' in your job scripts:
mpiexec -rsh -nopm -n $NSLOTS -machinefile $TMPDIR/machines mpiprog
The arguments are needed by MPICH-2 in order to properly interact with SGE.
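For bash or sh users, the path change and the 'which mpicc' check described above might look like the following sketch. The function name mpich_flavor is hypothetical; it classifies an mpicc path using only the two install locations named on this page.

```shell
#!/bin/sh
# sh/bash equivalent of the csh "set path" line above.
PATH=/opt/mpich2/bin:$PATH
export PATH

# Hypothetical helper: report which MPICH a given mpicc path belongs to,
# based only on the install locations described on this page.
mpich_flavor() {
    case "$1" in
        /opt/mpich2/bin/*) echo "MPICH2" ;;
        /usr/bin/*)        echo "MPICH v.1.2" ;;
        "")                echo "mpicc not found" ;;
        *)                 echo "unknown install: $1" ;;
    esac
}

# command -v prints the path of the mpicc that would run (empty if none).
mpich_flavor "$(command -v mpicc)"
```

If the last line reports MPICH v.1.2, the path change has not taken effect in the current shell.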
P4 Memory Errors
If you get errors from MPICH saying something like:
p4_shmalloc returned NULL
or:
p4_error: alloc_p4_msg: Message size exceeds P4s maximum message size: 344530944
then you may need to alter the amount of memory used by MPICH. The P4_GLOBMEMSIZE variable can be made larger by changing your .cshrc file and adding:
setenv P4_GLOBMEMSIZE 1073741824
or some other amount of memory that you need. Increasing this amount
reduces the amount of memory left for your program to use, so do not
raise P4_GLOBMEMSIZE unless you need to. Note that you may
need to set it to 2x or 3x the max message size that your program sends
– experiment with the value and use the minimum P4_GLOBMEMSIZE value that works for you.
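The 2x to 3x rule of thumb above is simple arithmetic. The sketch below, in sh syntax, uses a hypothetical helper p4_globmemsize that multiplies a maximum message size in bytes by a factor. The message size is the one from the error message above, and the factor of 3 is only a starting point to experiment from.

```shell
#!/bin/sh
# Hypothetical helper: suggest a P4_GLOBMEMSIZE value in bytes.
# $1 = largest message size in bytes, $2 = factor (2 or 3, per the text above).
p4_globmemsize() {
    echo $(( $1 * $2 ))
}

# Message size taken from the p4_error example above.
max_msg=344530944
P4_GLOBMEMSIZE=$(p4_globmemsize "$max_msg" 3)
export P4_GLOBMEMSIZE
echo "P4_GLOBMEMSIZE=$P4_GLOBMEMSIZE"
```

With these inputs the script prints P4_GLOBMEMSIZE=1033592832. A csh user would put the resulting number into the setenv line shown earlier, and could then try the 2x value (689061888) to see whether the smaller setting is enough.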
Note also that some of the DSCR machines have only 1GB of memory, shared between 2 CPUs. If you set P4_GLOBMEMSIZE to 512MB or more, you should also add a memory request to SGE (see SGE Queueing System):
where '2G' would represent your P4_GLOBMEMSIZE amount PLUS the main memory needed by your program (so it may need to be much more than 2G).