Profiling with Vampir

Objective

Vampir is a profiling tool. It was available on Curie and has been used successfully to analyze single-processor jobs. Profiling shows where your program spent its time and which functions called which other functions during execution. This information reveals which parts of your program are slower than expected and may be worth rewriting to make the program run faster.

Vampir on Curie

Authors: D. Solyga
Last revision: D. Solyga (2013/06/18)

Global instructions

Before using Vampir, load it with the module command:

    > module load vampirtrace ; module load vampir

vampirtrace is the tracing library. vampir lets you visualize the output files produced by vampirtrace (the "traces"), which are written in OTF format.
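Once the modules are loaded, a quick check that the tools are reachable can save a failed build later. The sketch below assumes that the modules put vtf90, vtunify and vampir on the PATH; the helper name check_tools is hypothetical and not part of Vampir.

```shell
# Report which of the given commands are reachable through PATH.
# Returns 0 if all are found, 1 if at least one is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Typical use on Curie, after loading both modules:
#   module load vampirtrace ; module load vampir
#   check_tools vtf90 vtunify vampir
```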

  1. Install ORCHIDEE on $WORKDIR at Curie.
  1. If you use modipsl, you have to modify AA_make.gdef. Look for the curie entries and make the following changes (in particular, F_C and F_L call the vtf90 wrapper):
#-Q- curie  #-
#-Q- curie  #- Global definitions for Curie at TGCC
#-Q- curie LIB_MPI = MPI1
#-Q- curie LIB_MPI_BIS = MPI1
#-Q- curie PRISM_ARCH = X64
#-Q- curie PRISM_NAME = curie
#-Q- curie FCM_ARCH = X64_CURIE
#-Q- curie  M_K = gmake
#-Q- curie  P_C = cpp
#-Q- curie  P_O = -P -C $(P_P)
#-Q- curie  F_C = vtf90 -vt:mpi -vt:f90 mpif90 -c -cpp
#-Q- curie  #-D- MD    F_D = -g
#-Q- curie  #-D- MN    F_D =
#-Q- curie  #-P- I4R4  F_P = -i4
#-Q- curie  #-P- I4R8  F_P = -i4 -r8
#-Q- curie  #-P- I8R8  F_P = -i8 -r8
#-Q- curie  F_O = -DCPP_PARA -xHost -O3 $(F_D) $(F_P) -I$(MODDIR) -module $(MODDIR)
#-Q- curie  F_L = vtf90 -vt:mpi -vt:f90 mpif90
#-Q- curie  M_M = 0
#-Q- curie  L_X = 0
#-Q- curie  L_O =
#-Q- curie  A_C = ar -r
#-Q- curie  A_G = ar -x
#-Q- curie  C_C = cc -c
#-Q- curie  C_O =
#-Q- curie  C_L = cc
#-Q- curie  #-
#-Q- curie  NCDF_INC = $(NETCDF_INC_DIR)
#-Q- curie  NCDF_LIB = -L$(NETCDF_LIB_DIR) -lnetcdff -lnetcdf
#-Q- curie  #-

  1. Run the ins_make script:
    >  ./ins_make
  1. Compile with gmake.
  1. Launch the job on Curie with the following script:
#!/bin/bash

#MSUB -r TAG196_IO
#MSUB -n 48
#MSUB -T 1800
#MSUB -o orchidee.%I
#MSUB -e orchidee.%I
#MSUB -q large
#MSUB -x
#MSUB -A gen6328

set -x
cd ${BRIDGE_MSUB_PWD}

module load netcdf/3.6.3
module load vampirtrace

# 0 removes the limit on the number of trace buffer flushes
export VT_MAX_FLUSHES=0

date 
time ccc_mprun ./orchidee_ol
date 

  1. Once your job has completed, you will find several files matching "*.z" and one file matching "*.otf". Open the .otf file with vampir:

    > vampir  file_to_visualize.otf

  1. Vampir is intuitive, so experiment with the different options. If needed, see the manual on the official website: http://www.vampir.eu/tutorial/manual
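Before opening Vampir, it can help to confirm that the run actually produced a trace. This is a minimal sketch, assuming the trace files are written to the directory the job ran in; the helper name list_otf_traces is hypothetical.

```shell
# Print the path of each *.otf trace in the given directory (default: .),
# one per line. Returns 1 if no .otf file is found.
list_otf_traces() {
    local dir="${1:-.}"
    local found=1
    local f
    for f in "$dir"/*.otf; do
        # If the glob matched nothing, it is left unexpanded; skip it.
        [ -e "$f" ] || continue
        echo "$f"
        found=0
    done
    return "$found"
}

# Usage, from the directory where the job ran:
#   list_otf_traces . || echo "no .otf file found"
#   vampir "$(list_otf_traces . | head -n 1)"
```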

Isolate part of the code

If you suspect that some part of your code is time-consuming, you can isolate it with VampirTrace. In your code, add the following instructions:

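    ! vt_user.inc (VampirTrace user API) defines the VT_USER_* macros
    #include "vt_user.inc"

    VT_USER_START('Name_to_give_to_this_part_of_code')

      ! Fortran code to profile

    VT_USER_END('Name_to_give_to_this_part_of_code')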
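Every VT_USER_START needs a matching VT_USER_END, and a forgotten end marker is easy to miss in a long routine. The sketch below is a rough check only: it counts the lines mentioning each macro in one source file, so commented-out calls are counted too. The helper name check_vt_regions is hypothetical.

```shell
# Count the lines containing VT_USER_START and VT_USER_END in one Fortran
# source file and report whether the two counts agree.
# Returns 1 on a mismatch.
check_vt_regions() {
    local src="$1"
    local starts ends
    # grep -c prints 0 but exits with status 1 when nothing matches,
    # hence the "|| true".
    starts=$(grep -c "VT_USER_START" "$src" || true)
    ends=$(grep -c "VT_USER_END" "$src" || true)
    echo "$src: $starts start(s), $ends end(s)"
    [ "$starts" -eq "$ends" ]
}

# Usage:
#   check_vt_regions my_module.f90
```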

You have to give a name to each part of the code you isolate; see ParallelismPerformances for an example. Then regenerate your makefiles using the following AA_make.gdef, which adds -DVTRACE to F_O:

#-Q- curie  #-
#-Q- curie  #- Global definitions for Curie at TGCC
#-Q- curie LIB_MPI = MPI1
#-Q- curie LIB_MPI_BIS = MPI1
#-Q- curie PRISM_ARCH = X64
#-Q- curie PRISM_NAME = curie
#-Q- curie FCM_ARCH = X64_CURIE
#-Q- curie  M_K = gmake
#-Q- curie  P_C = cpp
#-Q- curie  P_O = -P -C $(P_P)
#-Q- curie  F_C = vtf90 -vt:mpi -vt:f90 mpif90 -c -cpp
#-Q- curie  #-D- MD    F_D = -g
#-Q- curie  #-D- MN    F_D =
#-Q- curie  #-P- I4R4  F_P = -i4
#-Q- curie  #-P- I4R8  F_P = -i4 -r8
#-Q- curie  #-P- I8R8  F_P = -i8 -r8
#-Q- curie  F_O = -DCPP_PARA -DVTRACE -xHost -O3 $(F_D) $(F_P) -I$(MODDIR) -module $(MODDIR)
#-Q- curie  F_L = vtf90 -vt:mpi -vt:f90 mpif90
#-Q- curie  M_M = 0
#-Q- curie  L_X = 0
#-Q- curie  L_O =
#-Q- curie  A_C = ar -r
#-Q- curie  A_G = ar -x
#-Q- curie  C_C = cc -c
#-Q- curie  C_O =
#-Q- curie  C_L = cc
#-Q- curie  #-
#-Q- curie  NCDF_INC = $(NETCDF_INC_DIR)
#-Q- curie  NCDF_LIB = -L$(NETCDF_LIB_DIR) -lnetcdff -lnetcdf
#-Q- curie  #-
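The settings above are easy to lose when AA_make.gdef is updated, so a small check before rebuilding may be useful. This sketch assumes the curie lines keep the "#-Q- curie  KEY = value" layout shown above; the helper name check_gdef_tracing is hypothetical.

```shell
# Check that the curie entries of an AA_make.gdef file are set up for tracing:
#   - F_C and F_L call the vtf90 wrapper
#   - F_O defines VTRACE (needed for VT_USER_START / VT_USER_END)
# Prints one line per check and returns 1 if any check fails.
check_gdef_tracing() {
    local gdef="$1"
    local rc=0
    local key pattern
    for key in F_C F_L F_O; do
        case "$key" in
            F_O) pattern="-DVTRACE" ;;
            *)   pattern="vtf90" ;;
        esac
        # Keep only the curie line that sets this key, then look for the pattern.
        if grep "^#-Q- curie  *$key = " "$gdef" | grep -q -e "$pattern"; then
            echo "ok:      $key contains $pattern"
        else
            echo "missing: $key lacks $pattern"
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   check_gdef_tracing AA_make.gdef
```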

Then follow steps 3 to 7 described above (from running ins_make to opening the trace in Vampir).

NB (03/12/2012): This test was done on Curie before the last two maintenance operations, performed in October/November 2012.
The large nodes have now been replaced by xlarge nodes (I have not tried them yet). If you download modipsl now, the -xHost option is gone because it could give wrong results if the code is launched on thin nodes. The -xHost compilation option was used because the code was both compiled and launched on the large nodes (the code is better optimized that way; TGCC, personal communication). Try the following script (large is replaced by xlarge):

#!/bin/bash

#MSUB -r TAG196_IO
#MSUB -n 48
#MSUB -T 1800
#MSUB -o orchidee.%I
#MSUB -e orchidee.%I
#MSUB -q xlarge
#MSUB -x
#MSUB -A gen6328

set -x
cd ${BRIDGE_MSUB_PWD}

module load netcdf/3.6.3
module load vampirtrace

# 0 removes the limit on the number of trace buffer flushes
export VT_MAX_FLUSHES=0

date 
time ccc_mprun ./orchidee_ol
date 

A trick for profiling LMDZ

  • You need to use vampirtrace version 5.14.3:
    module load vampirtrace/5.14.3
    
  • You need to replace MPI_THREAD_SERIALIZED with MPI_THREAD_SINGLE in LMDZ (if you are running with MPI but without OpenMP).
  • If all the <tracename>.*.events.z files exist but the .otf file does not, you can recreate it with the command:
     vtunify tracename
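The last point can be wrapped in a small guard so that vtunify only runs when it is actually needed. This is a sketch under the file naming given above (<tracename>.otf and <tracename>.*.events.z); the helper name unify_if_needed and the VTUNIFY override variable are hypothetical conveniences, not part of VampirTrace.

```shell
# Recreate <tracename>.otf with vtunify when it is missing but the
# per-process event files are present.
# VTUNIFY can be overridden (for example VTUNIFY=echo) for a dry run.
unify_if_needed() {
    local trace="$1"
    local f
    if [ -e "$trace.otf" ]; then
        echo "$trace.otf already exists"
        return 0
    fi
    for f in "$trace".*.events.z; do
        if [ -e "$f" ]; then
            # At least one event file exists: run the unifier once and stop.
            "${VTUNIFY:-vtunify}" "$trace"
            return $?
        fi
    done
    echo "no $trace.*.events.z files found" >&2
    return 1
}

# Usage:
#   unify_if_needed tracename
```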
    
Last modified on 2020-03-19T16:25:39+01:00