Wednesday: migrating code to MPI

Parallelization/Scaling of code

MPI on farm cluster

Got my MPI code running.
This is a much better way to get jobs into the queue: asking for 16 threads that don’t have to be on the same node starts much faster. I can also ask for 161 threads, but then the job waits longer in the queue.
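On an SGE-style scheduler, that distinction would typically be expressed through the parallel environment. A hypothetical sketch only, since the actual environment names depend on how the farm cluster is configured:

```shell
# Hypothetical SGE directives; "mpi" and "smp" are placeholder
# parallel-environment names, not necessarily what farm defines.
#$ -pe mpi 16   # 16 slots that may be scattered across nodes (schedules fast)
#$ -pe smp 16   # 16 slots on a single node (can wait much longer)
```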

The trick to getting this to run was mostly getting the library path set correctly; setting it at the top of the script did the job:


RLIBS <- "~/R/x86_64-redhat-linux-gnu-library/2.13"
.libPaths(c(RLIBS, .libPaths()))

From there I can initialize an MPI instance, set up exit conditions, load variables and functions onto the slave nodes, and then have them all execute the desired function:



N <- 10 # number of slaves to spawn: one less than the threads allocated(?)

require(Rmpi)
mpi.spawn.Rslaves(nslaves=N)

## Clean-up 
.Last <- function(){
    if (is.loaded("mpi_initialize")){
        if (mpi.comm.size(1) > 0){
            print("Please use mpi.close.Rslaves() to close slaves.")
            mpi.close.Rslaves()
        }
        print("Please use mpi.quit() to quit R")
        .Call("mpi_finalize")
    }
}

## ship the objects that chains() needs to each slave
mpi.bcast.Robj2slave(monkey)
mpi.bcast.Robj2slave(new_world)
mpi.bcast.Robj2slave(chains)

mcmc_out <- mpi.remote.exec(chains())


# close slaves and exit
mpi.close.Rslaves()
mpi.quit(save = "no")

Here “chains” is the function defined earlier that runs on each node.
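For illustration only, a minimal stand-in for what a “chains”-style function might look like: a toy Metropolis sampler targeting a standard normal density. The real chains is defined earlier in the actual script and runs the model of interest.

```r
## Toy stand-in for the real chains() function: a short Metropolis
## sampler targeting a standard normal density.
chains <- function(n_steps = 1000, sd_prop = 1) {
  log_target <- function(x) dnorm(x, log = TRUE)
  out <- numeric(n_steps)
  x <- 0
  for (i in seq_len(n_steps)) {
    prop <- x + rnorm(1, sd = sd_prop)       # propose a jump
    if (log(runif(1)) < log_target(prop) - log_target(x))
      x <- prop                              # accept
    out[i] <- x                              # record current state
  }
  out
}

samples <- chains(500)
```

Since each slave evaluates the call independently, mpi.remote.exec(chains()) returns one such chain per slave.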

Hmm… I seem to run into trouble scaling this out to 161 nodes. Meanwhile it’s better to run the MCMCs as separate jobs, write the data to a tmp folder, and then read it in with a different script for the analysis, à la Eastman’s approach.
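A sketch of that write-then-gather pattern (the file names and the task-id source are illustrative, not from the original scripts):

```r
## Each MCMC job writes its chain to a shared tmp folder, tagged by a
## task id (in practice taken from the scheduler, e.g. an array-job index).
tmp_dir <- tempdir()   # on the cluster, a shared scratch directory
task_id <- 1           # placeholder for the scheduler-supplied id
chain <- rnorm(100)    # stand-in for the MCMC output
saveRDS(chain, file.path(tmp_dir, sprintf("chain_%03d.rds", task_id)))

## The separate analysis script then gathers everything back:
files <- list.files(tmp_dir, pattern = "^chain_.*\\.rds$", full.names = TRUE)
all_chains <- lapply(files, readRDS)
```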

NERSC machine

  • set up for GitHub: copy .ssh/rda and .ssh/rda.pub

  • installing R packages: load the R module and swap in the Intel compilers first


module load R/2.12.1
module swap pgi intel
module swap openmpi openmpi-intel 

Note that the format for batch jobs differs substantially between architectures. Hopper is a Cray and launches commands with aprun; see the example.

Carver is an IBM system and seems to use mpirun instead, which appears to be the more general command. See the Carver example program.
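The difference shows up in the batch scripts themselves. A rough sketch, where queue names, process counts, and the executable are all placeholders; the NERSC example programs are the authoritative versions:

```shell
## Hopper (Cray): request width with mppwidth, launch with aprun
#PBS -q regular
#PBS -l mppwidth=24
cd $PBS_O_WORKDIR
aprun -n 24 ./my_program

## Carver (IBM): request nodes/ppn, launch with mpirun
#PBS -q regular
#PBS -l nodes=3:ppn=8
cd $PBS_O_WORKDIR
mpirun -np 24 ./my_program
```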

Testing:

Run the brute_force example script on Carver and Hopper. Can’t launch R on the compute nodes, perhaps a module-loading issue; will need to explore further. Otherwise it might be better to move all the code down into C and write the MPI there….

Misc