The information on this page was last updated in March 2018 and is valid for release version 0.7.0.
Building for the Intel Knights Landing
The following configuration is recommended for the Intel Knights Landing platform:
../configure --enable-precision=double \
--enable-simd=KNL \
--enable-comms=mpi-auto \
--with-gmp=<path> \
--with-mpfr=<path> \
--enable-mkl \
CXX=icpc MPICXX=mpiicpc
where <path> is the UNIX prefix where GMP and MPFR are installed.
If you are working on a Cray machine that does not use the mpiicpc wrapper, please use:
../configure --enable-precision=double \
--enable-simd=KNL \
--enable-comms=mpi \
--with-gmp=<path> \
--with-mpfr=<path> \
--enable-mkl \
CXX=CC CC=cc
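The choice between the two configurations above can be sketched as a small shell check. This is only an illustration: the helper name `pick_toolchain` is hypothetical, and the absence of `mpiicpc` on the PATH is a rough heuristic, not proof that you are on a Cray system.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid): print the comms and compiler
# settings from the two configurations above, depending on whether the
# mpiicpc wrapper can be found on the PATH.
pick_toolchain() {
    if command -v mpiicpc >/dev/null 2>&1; then
        # Intel MPI wrapper available: use the mpi-auto configuration.
        echo "--enable-comms=mpi-auto CXX=icpc MPICXX=mpiicpc"
    else
        # No mpiicpc: fall back to the Cray compiler wrappers.
        echo "--enable-comms=mpi CXX=CC CC=cc"
    fi
}

pick_toolchain
```

The printed settings can then be appended to the remaining `../configure` flags by hand.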
Building for the Intel Haswell
The following configuration is recommended for the Intel Haswell platform:
../configure --enable-precision=double \
--enable-simd=AVX2 \
--enable-comms=mpi-auto \
--enable-mkl \
CXX=icpc MPICXX=mpiicpc
The MKL flag enables use of BLAS and FFTW from the Intel Math Kernel Library.
If gmp and mpfr are NOT in standard places (/usr/) these flags may be needed:
--with-gmp=<path> \
--with-mpfr=<path>
where <path> is the UNIX prefix where GMP and MPFR are installed.
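As a rough sketch of how the prefix relates to the flags, the helper below emits `--with-gmp` and `--with-mpfr` only when the corresponding header is found under `<path>/include`. The function name `gmp_mpfr_flags` and the example prefix `$HOME/local` are hypothetical; the check assumes the usual layout where `gmp.h` and `mpfr.h` are installed in the `include` directory of the prefix.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid): print the --with-gmp / --with-mpfr
# flags for a prefix, but only for libraries whose header is actually there.
gmp_mpfr_flags() {
    prefix="$1"
    flags=""
    if [ -f "$prefix/include/gmp.h" ]; then
        flags="$flags --with-gmp=$prefix"
    fi
    if [ -f "$prefix/include/mpfr.h" ]; then
        flags="$flags --with-mpfr=$prefix"
    fi
    # Strip the single leading space left by the concatenation above.
    echo "${flags# }"
}

# Example prefix; replace with your own installation prefix.
gmp_mpfr_flags "$HOME/local"
```

An empty line of output means neither header was found under that prefix.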
If you are working on a Cray machine that does not use the mpiicpc wrapper, please use:
../configure --enable-precision=double \
--enable-simd=AVX2 \
--enable-comms=mpi \
--enable-mkl \
CXX=CC CC=cc
If using the Intel MPI library, threads should be pinned to NUMA domains using:
export I_MPI_PIN=1
This is the default.
Building for the Intel Skylake
The following configuration is recommended for the Intel Skylake platform:
../configure --enable-precision=double \
--enable-simd=AVX512 \
--enable-comms=mpi-auto \
--enable-mkl \
CXX=mpiicpc
The MKL flag enables use of BLAS and FFTW from the Intel Math Kernel Library.
If gmp and mpfr are NOT in standard places (/usr/) these flags may be needed:
--with-gmp=<path> \
--with-mpfr=<path>
where <path> is the UNIX prefix where GMP and MPFR are installed.
If you are working on a Cray machine that does not use the mpiicpc wrapper, please use:
../configure --enable-precision=double \
--enable-simd=AVX512 \
--enable-comms=mpi \
--enable-mkl \
CXX=CC CC=cc
If using the Intel MPI library, threads should be pinned to NUMA domains using:
export I_MPI_PIN=1
This is the default.
Building for the AMD EPYC
The AMD EPYC is a multi-chip module comprising 32 cores spread over four distinct chips, each with 8 cores. Even a single-socket node therefore contains a quad-chip module. Dual-socket nodes with 64 cores in total are common. Each chip within the module exposes a separate NUMA domain, so there are four NUMA domains per socket, and we recommend one MPI rank per NUMA domain. MPI-3 is recommended, using four ranks per socket and 8 threads per rank.
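The rank and thread counts follow directly from the topology described above. The arithmetic can be sketched as follows for a dual-socket node; the variable names are purely illustrative.

```shell
#!/bin/sh
# Topology figures taken from the description above (dual-socket node).
sockets=2
numa_per_socket=4   # one NUMA domain per chip, four chips per socket
cores_per_numa=8    # 8 cores per chip

# One MPI rank per NUMA domain, one thread per core in that domain.
ranks=$((sockets * numa_per_socket))
threads=$cores_per_numa
total_cores=$((ranks * threads))

echo "MPI ranks per node: $ranks"
echo "threads per rank:   $threads"
echo "cores in use:       $total_cores"
```

For a single-socket node, setting `sockets=1` gives four ranks of 8 threads each.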
The following configuration is recommended for the AMD EPYC platform:
../configure --enable-precision=double \
--enable-simd=AVX2 \
--enable-comms=mpi3 \
CXX=mpicxx
If gmp and mpfr are NOT in standard places (/usr/) these flags may be needed:
--with-gmp=<path> \
--with-mpfr=<path>
where <path> is the UNIX prefix where GMP and MPFR are installed.
With MPICH and g++ v4.9.2, the best performance is obtained by setting GOMP_CPU_AFFINITY explicitly for each MPI rank. This can be done by having mpirun launch a wrapper script, omp_bind.sh, which sets the affinity and then starts the real executable.
It is recommended to run 8 MPI ranks on a single dual socket AMD EPYC, with 8 threads per rank using MPI3 and shared memory to communicate within this node:
mpirun -np 8 ./omp_bind.sh ./Benchmark_dwf --mpi 2.2.2.1 --dslash-unroll --threads 8 --grid 16.16.16.16 --cacheblocking 4.4.4.4
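In the command above, the processor decomposition `--mpi 2.2.2.1` multiplies out to 2 x 2 x 2 x 1 = 8, which matches `-np 8`. Assuming that the product of the `--mpi` extents should always equal the number of MPI ranks, a quick pre-flight check could look like the sketch below; the helper name `mpi_rank_count` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid): multiply the dot-separated
# extents of an --mpi argument to get the implied number of MPI ranks.
mpi_rank_count() {
    product=1
    old_ifs="$IFS"
    IFS=.                       # split the argument on dots
    for n in $1; do
        product=$((product * n))
    done
    IFS="$old_ifs"
    echo "$product"
}

np=8
if [ "$(mpi_rank_count 2.2.2.1)" -eq "$np" ]; then
    echo "decomposition matches -np $np"
else
    echo "decomposition does not match -np $np" >&2
fi
```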
The omp_bind.sh wrapper script does the following:
#!/bin/bash
numanode=` expr $PMI_RANK % 8 `
basecore=`expr $numanode \* 16`
core0=`expr $basecore + 0 `
core1=`expr $basecore + 2 `
core2=`expr $basecore + 4 `
core3=`expr $basecore + 6 `
core4=`expr $basecore + 8 `
core5=`expr $basecore + 10 `
core6=`expr $basecore + 12 `
core7=`expr $basecore + 14 `
export GOMP_CPU_AFFINITY="$core0 $core1 $core2 $core3 $core4 $core5 $core6 $core7"
echo GOMP_CPU_AFFINITY $GOMP_CPU_AFFINITY
"$@"
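To preview which cores each rank would be bound to without launching MPI, the same arithmetic can be reproduced in a standalone sketch. The function name `affinity_for_rank` is hypothetical; the stride of 2 and the 16 logical CPUs per rank are copied from the script above and depend on how your system numbers its CPUs.

```shell
#!/bin/sh
# Hypothetical helper: reproduce the core list that omp_bind.sh computes
# for a given rank, using the same "rank % 8", base of 16 and stride of 2.
affinity_for_rank() {
    rank="$1"
    numanode=$((rank % 8))
    basecore=$((numanode * 16))
    list=""
    i=0
    while [ "$i" -lt 8 ]; do
        list="$list $((basecore + 2 * i))"
        i=$((i + 1))
    done
    # Strip the single leading space left by the concatenation above.
    echo "${list# }"
}

for r in 0 1 7; do
    echo "rank $r -> $(affinity_for_rank "$r")"
done
```

Comparing this listing against the CPU numbering reported by your system (for example with `lscpu`) is a quick way to confirm that each rank stays within one NUMA domain.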
Build setup for laptops, other compilers, non-cluster builds
Many versions of g++ and clang++ work with Grid; building with them merely involves replacing CXX (and MPICXX) and omitting the --enable-mkl flag.
Single-node, non-MPI builds are enabled with:
--enable-comms=none
FFTW support that is not in the default search path may then be enabled with:
--with-fftw=<installpath>
BLAS will not be compiled in by default, and Lanczos will default to Eigen diagonalisation.
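Putting these pieces together, a single-node configure line might be assembled as in the sketch below. The helper name `single_node_configure` is hypothetical, and both the default compiler (g++) and the AVX2 SIMD target are assumptions chosen for illustration; pick the SIMD target that matches your CPU.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a configure command for a
# single-node, non-MPI build without MKL.
single_node_configure() {
    cxx="${1:-g++}"     # compiler, e.g. g++ or clang++
    simd="${2:-AVX2}"   # SIMD target; AVX2 is only an example
    echo "../configure --enable-precision=double --enable-simd=${simd} --enable-comms=none CXX=${cxx}"
}

single_node_configure clang++ AVX2
```

The printed command is meant to be reviewed and then run from the build directory; add `--with-fftw=<installpath>` if FFTW is outside the default search path.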