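MPI (Message Passing Interface) is a standard that defines how processes distributed across the nodes of a parallel program exchange information by passing messages.

MPICH is a free, open-source implementation of the MPI standard. It was written at Argonne National Lab and Mississippi State University, and comes in two lines: MPICH, which follows MPI-1.1, and MPICH2, which follows MPI-2.

Other MPI packages exist besides MPICH, such as LAM/MPI and MPI/Pro.

The last MPICH release is version 1.2.7, distributed on November 4, 2005. Distribution has since been discontinued, and users are advised to move to MPICH2.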

Compiling with MPICH

To compile an MPI code for MPICH, use:

mpicc -O -o runme runme.c

where -O is the usual compiler optimization switch, and runme is the program to be compiled. Since mpicc is just a wrapper to a sequential compiler (maybe gcc or Intel's icc), it accepts all the usual switches and options. It will add the necessary search paths for the MPI libraries and headers, so you don't need to worry about where they are.

If you are doing it "by hand," then you may need something like the following:

gcc -I/opt/mpich/include -o runme runme.c -L/opt/mpich/lib -lmpich

Of course, you'll need to add any other compiler optimizations or libraries that your program may need.
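
One way to find out what the "by hand" line should contain on a given system is to ask the wrapper itself. The sketch below assumes that mpicc is on your PATH and that it accepts the -show option, which MPICH's wrapper uses to print the underlying command without running it. The helper name show_wrapper is made up for illustration.

```shell
#!/bin/sh
# Print the real compiler command that the mpicc wrapper would run.
# show_wrapper is a hypothetical helper name, not part of MPICH.
show_wrapper() {
    if command -v mpicc >/dev/null 2>&1; then
        # -show prints the full compile/link line without executing it
        mpicc -show
    else
        echo "mpicc not found in PATH"
    fi
}

show_wrapper
```

The include and library paths in the printed line can then be copied into a hand-written gcc command.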

To use MPICH with the Intel compilers you can do one of two things:

At the compile line:

mpicc -cc=/opt/intel/bin/icc ...args...
mpif77 -fc=/opt/intel/bin/ifc ...args...

In your .cshrc file:

setenv MPICH_CC /opt/intel/bin/icc
setenv MPICH_CLINKER /opt/intel/bin/icc
setenv MPICH_F77 /opt/intel/bin/ifc
setenv MPICH_FLINKER /opt/intel/bin/ifc

To force the use of the GNU compilers, use:

mpicc -cc=/usr/bin/gcc ...args...
mpif77 -fc=/usr/bin/g77 ...args...

Alternatively, make the corresponding changes in your .cshrc file.
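
If your login shell is sh or bash rather than csh, the same variables can be set with export instead of setenv. This is a minimal sketch that assumes the same Intel compiler paths as the .cshrc example; the file it would go in (for example ~/.bashrc) depends on your setup.

```shell
# sh/bash counterpart of the .cshrc setenv lines.
# Paths are the ones used in the csh example; adjust them for your site.
MPICH_CC=/opt/intel/bin/icc
MPICH_CLINKER=/opt/intel/bin/icc
MPICH_F77=/opt/intel/bin/ifc
MPICH_FLINKER=/opt/intel/bin/ifc
export MPICH_CC MPICH_CLINKER MPICH_F77 MPICH_FLINKER

# For the GNU compilers, the values would instead be
# /usr/bin/gcc (C) and /usr/bin/g77 (Fortran 77).
```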


To run an MPI code under SGE, you'll need to make an SGE submission script first. See the SGE Queuing System web page for details on how to create such a script. Use the following to launch your parallel job:

mpirun -np $NSLOTS -machinefile $TMPDIR/machines runme -arg1 -arg2

where -arg1 and -arg2 are arguments to your code (if needed). The -np and -machinefile arguments are for the mpirun program itself. SGE automatically sets the variable $NSLOTS to the number of "slots," or CPUs, that you were actually given (recall that you can request a range of CPUs from SGE; $NSLOTS says how many you received).

The list of machines you were given is in the file $TMPDIR/machines. This is a simple text file with one hostname per line. MPICH uses it to spawn your processes on the proper machines (otherwise, our automated job-killer program might remove your remote processes).
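
Before launching, it can be useful to confirm that the machine file is really there and to see what it lists. The following is a sketch of such a pre-flight check, not something SGE or MPICH provides; summarize_machines is a hypothetical helper name, and the two commented lines at the end show how it might be combined with the mpirun line above inside a job script.

```shell
#!/bin/sh
# Print how many entries and how many distinct hosts a machine file lists.
# Returns non-zero if the file is missing or empty.
# summarize_machines is a hypothetical helper, not an SGE or MPICH command.
summarize_machines() {
    file=$1
    if [ ! -s "$file" ]; then
        echo "machine file missing or empty: $file" >&2
        return 1
    fi
    entries=$(grep -c . "$file")            # non-blank lines
    hosts=$(sort -u "$file" | grep -c .)    # distinct hostnames
    echo "entries=$entries hosts=$hosts"
}

# Inside an SGE job script one might then write:
#   summarize_machines "$TMPDIR/machines" || exit 1
#   mpirun -np $NSLOTS -machinefile $TMPDIR/machines runme -arg1 -arg2
```

The summary line ends up in the job's output file, which helps when checking later which nodes a job actually ran on.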

MPICH Version 2

The default MPICH library, v.1.2, is in /usr/bin, /usr/lib, and /usr/include. This should be adequate for most users, and because it is in the "normal" execution path it should be visible without modifications to your .cshrc or .bashrc file. We have also installed MPICH Version 2 in /opt/mpich2. To use it, you must explicitly add the following directory to your path (ahead of /usr/bin):

set path = ( /opt/mpich2/bin $path )

When you type 'which mpicc' you should see /opt/mpich2/bin/mpicc (if you see /usr/bin/mpicc, then you are still using MPICH v.1.2). You use 'mpicc' and 'mpif77' as with MPICH v.1.2, but to run your code, you must use 'mpiexec' instead of 'mpirun' in your job scripts:

mpiexec -rsh -nopm -n $NSLOTS -machinefile $TMPDIR/machines mpiprog

These arguments are needed for MPICH2 to interact properly with SGE.
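
For sh or bash users, the path change and the 'which mpicc' check described above could look roughly like this. It is a sketch that assumes the /opt/mpich2 install location given in this section; the messages it prints are illustrative only.

```shell
#!/bin/sh
# sh/bash counterpart of the csh "set path" line:
# put the MPICH2 bin directory ahead of /usr/bin.
PATH=/opt/mpich2/bin:$PATH
export PATH

# Report which mpicc the shell now resolves first.
case "$(command -v mpicc)" in
    /opt/mpich2/bin/mpicc) echo "using MPICH2 mpicc" ;;
    "")                    echo "mpicc not found in PATH" ;;
    *)                     echo "still using: $(command -v mpicc)" ;;
esac
```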

P4 Memory Errors

If you get errors from MPICH saying something like:

p4_shmalloc returned NULL

or

p4_error: alloc_p4_msg: Message size exceeds P4s maximum message size: 344530944

then you may need to alter the amount of memory used by MPICH. The P4_GLOBMEMSIZE variable can be made larger by changing your .cshrc file and adding:

setenv P4_GLOBMEMSIZE 1073741824

or some other amount of memory that you need. Increasing this amount reduces the amount of memory left for your program to use, so do not raise P4_GLOBMEMSIZE unless you need to. Note that you may need to set it to 2x or 3x the max message size that your program sends – experiment with the value and use the minimum P4_GLOBMEMSIZE value that works for you.
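
As a starting point for that experiment, the value can be derived from the message size reported in the error. The sketch below assumes the 344530944-byte size from the example error above and a factor of 2; suggest_globmemsize is a hypothetical helper, and the right factor for a real program still has to be found by trial.

```shell
#!/bin/sh
# Suggest a P4_GLOBMEMSIZE value as a multiple of the largest message size.
# Arguments: max message size in bytes, multiplier (defaults to 2).
# suggest_globmemsize is a hypothetical helper name.
suggest_globmemsize() {
    max_msg=$1
    factor=${2:-2}
    echo $(( max_msg * factor ))
}

# sh/bash form of the setenv line, using the size from the error message.
P4_GLOBMEMSIZE=$(suggest_globmemsize 344530944 2)
export P4_GLOBMEMSIZE
echo "P4_GLOBMEMSIZE=$P4_GLOBMEMSIZE"
```

If the error persists with a factor of 2, try 3 before going higher, so the value stays as small as possible.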

Note also that some of the DSCR machines have only 1GB of memory, shared between 2 CPUs. If you set P4_GLOBMEMSIZE to 512MB or more, you should also add a memory request to SGE (see the SGE Queuing System page):

#$ -l mem_free=2G

where '2G' would represent your P4_GLOBMEMSIZE amount PLUS the main memory needed by your program (so it may need to be much more than 2G).
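
The arithmetic behind that request can be sketched as follows. The helper mem_free_request is hypothetical, and the 1.5 GB program footprint in the example call is a made-up figure for illustration; the result is rounded up to whole gigabytes.

```shell
#!/bin/sh
# Compute a mem_free request: P4_GLOBMEMSIZE plus the program's own memory,
# rounded up to a whole number of gigabytes.
# Arguments: P4_GLOBMEMSIZE in bytes, program memory in bytes.
# mem_free_request is a hypothetical helper name.
mem_free_request() {
    gib=1073741824
    total=$(( $1 + $2 ))
    echo "$(( (total + gib - 1) / gib ))G"
}

# Example: 1 GB of P4 memory plus an assumed 1.5 GB for the program itself.
echo "#\$ -l mem_free=$(mem_free_request 1073741824 1610612736)"
```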

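When using mpich (whatever the compiler, whether lam or pgi), a p4error sometimes shows up for no apparent reason.

It is a really hard error to track down, since it is not a bug in your program.

It turns up fairly often after a program has been force-killed several times.

When that happens, you need to delete the semaphores on every node (I'll skip the details).

In your mpich installation path there is a script called cleanipcs.

Running it takes care of the problem:

$ pexec /usr/local/mpich-intel/sbin/cleanipcs

The path above reflects my setup, where mpich was built with the Intel compiler. In any installation, cleanipcs is in the sbin directory under the mpich directory.

Since mpich runs in a parallel environment, the command naturally has to be sent to all nodes with pexec.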
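
For anyone curious about the skipped details, the cleanup amounts to finding leftover System V semaphores owned by your user and removing them. The sketch below is only an illustration of that idea and not the actual cleanipcs script. It assumes the Linux ipcs -s column layout (key, semid, owner, perms, nsems), uses made-up sample data with users alice and bob, and only prints the ipcrm commands instead of running them.

```shell
#!/bin/sh
# List IDs of semaphore arrays owned by a user, reading `ipcs -s` output
# on stdin. Assumes the Linux column layout: key semid owner perms nsems.
# list_user_sems is a hypothetical helper name.
list_user_sems() {
    awk -v user="$1" '$1 ~ /^0x/ && $3 == user { print $2 }'
}

# Made-up sample of `ipcs -s` output, for a dry run.
sample='
------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x00000000 98304      alice      600        1
0x00000000 131073     bob        600        1
'

# Dry run: print the removal commands rather than executing them.
# On a real node the input would come from:  ipcs -s | list_user_sems "$USER"
printf '%s\n' "$sample" | list_user_sems alice | while read -r id; do
    echo "ipcrm -s $id"
done
```

In practice the cleanipcs script shipped with mpich is the safer choice, since it is the tool this note actually relies on.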
