I am having a problem running mpiexec or mpiexec.hydra on a cluster that uses Sun Grid Engine to schedule jobs.
Running mpiexec.hydra produces the following errors:
Traceback (most recent call last):
File "<stdin>", line 973, in ?
File "<stdin>", line 465, in mpdboot
ValueError: need more than 1 value to unpack
error: commlib error: access denied (client IP resolved to host name "localhost.localdomain". This is not identical to clients host name "node038.cm.cluster")
error: executing task of job 1046000 failed: failed sending task to execd@localhost: can't find connection
This is the job script I am using to run the MPI job:
#!/bin/sh
#
# Your job name
#$ -N My_Job
#
# Use current working directory
#$ -cwd
#
# pe (Parallel environment) request. Set your number of processors here.
#$ -pe impi 24
#
# Run job through bash shell
#$ -S /bin/bash
# If modules are needed, source modules environment:
. /etc/profile.d/modules.sh
# Add any modules you might require:
module add shared
module load intel/compiler/64/11.1/046
module load intel/mpi/4.0.0.028
# The following output will show in the output file
echo "Got $NSLOTS processors."
cat $PE_HOSTFILE | awk '{print $1}' | sort -u | head -n 2 > hostfile.txt
# Run your application
env
export I_MPI_MPD_RSH=ssh
export I_MPI_HYDRA_DEBUG=on
export I_MPI_HYDRA_BOOTSTRAP=ssh
mpdboot -n 2 --verbose -r /usr/bin/ssh -f $PE_HOSTFILE
mpiexec.hydra -np 2 pingpong2 > output_file.txt
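To show what the hostfile.txt line in the script produces, here is a minimal sketch run against a made-up $PE_HOSTFILE. The second node name and the queue name are hypothetical. I am assuming the usual SGE layout of four columns per line: host, slots, queue, processor range. Note that the script writes hostfile.txt but then passes $PE_HOSTFILE itself to mpdboot.

```shell
#!/bin/sh
# Hypothetical stand-in for the file SGE provides via $PE_HOSTFILE.
# Assumed columns: host name, slots, queue, processor range.
PE_HOSTFILE=$(mktemp)
cat > "$PE_HOSTFILE" <<'EOF'
node038.cm.cluster 12 all.q@node038.cm.cluster UNDEFINED
node039.cm.cluster 12 all.q@node039.cm.cluster UNDEFINED
EOF

# Same extraction as in the job script, minus the extra cat:
# keep the first column, drop duplicates, keep at most two hosts.
awk '{print $1}' "$PE_HOSTFILE" | sort -u | head -n 2 > hostfile.txt

# Show the resulting one-host-per-line file.
cat hostfile.txt

rm -f "$PE_HOSTFILE"
```

With this sample input, hostfile.txt holds one bare host name per line, whereas $PE_HOSTFILE keeps the extra slot and queue columns.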
Also, I did not set up a .mpd.conf or an mpd.hosts file in my home directory. I am not sure whether these files are generated automatically when Intel MPI is invoked.
When I run ps aux | grep mpd, no mpd process is listed as running.
Thanks,
Kris