Form 60 (in 2017WP/HPC-03_Silvia Mocavero_globcomm)

Saved Values

in subcontext 'abstract'

implementation: 'Step 1
Analysis and classification have been completed. For each global communication, the file name, the routine, some notes on its function, the impact of the global communication on the routine execution time and the routine's share of the total runtime have been listed (see the attached file). Notes:
1. SETTE configurations were used for the investigation.
2. The domain decomposition was chosen to give subdomains of about 40x40 grid points.
3. Tests were run on more than one node, so that inter-node communications were also checked.
4. The only global communication that appears to affect the total execution time is the one in trc_rad_sms, which spends about 40% of the routine execution time (the routine itself takes ~4% of the total execution time, so roughly 1.6% of the total) executing 4 glob_sum calls.
Step 2
• Regroup the three global communications of stp_ctl into a single one (see dev_r7832_HPC08_lbclnk_3rd_dim).
• The solver.stat file becomes the run.stat file.
• Instead of the global sum of ssh2 written to solver.stat, the maximum of |ssh| over the global domain is now written.
• The abort tests become: |ssh|_max > 10 m; |U|_max > 10 m/s; SSS_min < 0.
• The mpi_allreduce is now performed on a vector of size 3 containing the local maxima of |U|, -SSS and ssh2. Therefore only one phase of global communication is performed in stpctl.F90 instead of three.' by gm 2018-01-06T10:51:59+01:00
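The single-reduction scheme of Step 2 can be sketched as follows. This is a Python simulation under stated assumptions, not the stpctl.F90 code: MPI is replaced by an in-process element-wise maximum over a list of per-rank vectors, the helper names (`local_maxima`, `allreduce_max`, `abort_reasons`) are hypothetical, and the first vector entry is taken to be max |ssh| to match the abort test (the form lists "ssh2" for that entry). The key trick is storing -SSS, so that one MAX reduction also yields the global SSS minimum.

```python
from typing import List, Sequence

# Sketch only: MPI is simulated in-process. Each entry of `per_rank`
# stands for one MPI rank; in the Fortran code the reduction would be a
# single MPI_ALLREDUCE with MPI_MAX on a length-3 array.


def local_maxima(ssh: Sequence[float], u: Sequence[float],
                 sss: Sequence[float]) -> List[float]:
    """Pack the three per-subdomain diagnostics into one vector.

    Storing -SSS lets a MAX reduction also deliver the global SSS minimum,
    because max(-SSS) == -min(SSS).
    """
    return [max(abs(x) for x in ssh),   # local max |ssh| (assumed entry)
            max(abs(x) for x in u),     # local max |U|
            max(-x for x in sss)]       # local max -SSS


def allreduce_max(per_rank: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over all ranks: one reduction instead of three."""
    return [max(column) for column in zip(*per_rank)]


def abort_reasons(global_max: Sequence[float]) -> List[str]:
    """Apply the abort tests listed in the form to the reduced vector."""
    ssh_max, u_max, neg_sss_max = global_max
    reasons = []
    if ssh_max > 10.0:
        reasons.append("|ssh|_max > 10 m")
    if u_max > 10.0:
        reasons.append("|U|_max > 10 m/s")
    if neg_sss_max > 0.0:               # equivalent to SSS_min < 0
        reasons.append("SSS_min < 0")
    return reasons


if __name__ == "__main__":
    rank0 = local_maxima(ssh=[0.5, -1.2], u=[0.3, -0.8], sss=[35.0, 34.5])
    rank1 = local_maxima(ssh=[2.0, 0.1], u=[1.5, 0.2], sss=[36.0, 33.9])
    glob = allreduce_max([rank0, rank1])
    print(glob, abort_reasons(glob))    # [2.0, 1.5, -33.9] []
```

Packing the values into one array is what turns three collective phases into one; for such short messages the cost of each phase is dominated by latency, so the saving scales with the number of phases removed. Because an allreduce returns the same vector on every rank, all ranks reach the same abort decision.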
manual: 'Using Steps 1 and 2, summarise the changes to be made to the NEMO reference manual (tex files) and to the content of the web pages.' by mocavero 2017-06-19T17:10:31+02:00
description: 'The goal of this action is to remove some global communications (e.g. mpp_sum, mpp_max, etc.) in order to speed up NEMO execution.
Step 1 (done in 2016): list and classify the global communications, then analyse which of them could be safely removed (with the help of Tim on stp_ctl). The listing can easily be produced by running grep over the whole code. The classification identifies the function of each global communication (diagnostics, allocation error control, ...) and supports the discussion of which ones could be removed or made conditional.
Step 2: discuss with the previewer and, if needed, with the NEMO System Team which communications can be safely removed.
• In stp_ctl, the 3 existing mpp_max calls have been reduced to a single one (see dev_r7832_HPC08_lbclnk_3rd_dim).
• An isolated mpp_max applied to nstop in the memo_gcm routine has been merged into the mpp_max called in stp_ctl (see commit #9210 in dev_merge_2017).' by gm 2018-01-11T16:42:13+01:00
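The Step 1 listing is described as a grep over the whole code; a plain `grep -rniE 'mpp_(sum|max)|glob_sum' --include='*.F90' <src>` gives the raw hits. The Python sketch below is a slightly richer alternative that also records the enclosing routine, which the classification needs. It is not the script used for the analysis: the function names are hypothetical, the routine names are limited to the three mentioned in this form, and the Fortran parsing is naive (comments are cut at the first `!`, and continuation lines or `!` inside strings are not handled).

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# Collective routines named in this form; extend the tuple as needed.
GLOBAL_COMMS = ("mpp_sum", "mpp_max", "glob_sum")

# A call such as "CALL mpp_max( zmax )" or "z = glob_sum( ptab )".
CALL_RE = re.compile(r"\b(" + "|".join(GLOBAL_COMMS) + r")\s*\(", re.IGNORECASE)

# A routine header such as "SUBROUTINE stp_ctl( kt )" or
# "REAL(wp) FUNCTION glob_sum( ptab )"; END statements are excluded.
ROUTINE_RE = re.compile(
    r"^\s*(?!END\b)(?:[\w(),*]+\s+)*?(?:SUBROUTINE|FUNCTION)\s+(\w+)",
    re.IGNORECASE)


def find_global_comms(lines: Iterable[str]) -> List[Tuple[int, str, str]]:
    """Return (line number, enclosing routine, collective name) for each call."""
    hits: List[Tuple[int, str, str]] = []
    routine = "?"                        # unknown until the first header
    for lineno, line in enumerate(lines, start=1):
        code = line.split("!", 1)[0]     # naive comment stripping
        header = ROUTINE_RE.match(code)
        if header:                       # entering a new routine
            routine = header.group(1).lower()
            continue                     # a header line is not a call
        for call in CALL_RE.finditer(code):
            hits.append((lineno, routine, call.group(1).lower()))
    return hits


def scan_tree(root: str) -> List[Tuple[str, int, str, str]]:
    """Scan every *.F90 file under `root` and tag each hit with its file."""
    rows: List[Tuple[str, int, str, str]] = []
    for path in sorted(Path(root).rglob("*.F90")):
        lines = path.read_text(errors="replace").splitlines()
        for lineno, routine, name in find_global_comms(lines):
            rows.append((str(path), lineno, routine, name))
    return rows
```

Calling `scan_tree("path/to/src")` (placeholder path) returns one row per call, which can then be annotated by hand with its function (diagnostic, allocation error control, ...) as in the Step 1 classification.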

Change History

Changed on 2018-01-11T16:42:13+01:00 by gm:

  • description changed from
    The goal of this action is to remove some global communications (e.g. mpp_sum, mpp_max etc) in order to speedup the NEMO execution time. Step 1 (done in 2016): The plan is to list and classify the global communications, then to analyse which of them could be safely removed (with the help of Tim on stp_ctl). The listing can be easily performed by using a grep on the whole code. The classification allows understanding the function of each global communication (diagnostic, allocation error control, ....) and discussing which ones could be removed or conditioned. Step 2: The second step is to discuss with the previewer and (if needed) with the NEMO System Team which communications can be safely removed. • In the case of stp_ctl, the current 3 mpp_max calls have been reduced to one single.
    to
    The goal of this action is to remove some global communications (e.g. mpp_sum, mpp_max etc) in order to speedup the NEMO execution time. Step 1 (done in 2016): The plan is to list and classify the global communications, then to analyse which of them could be safely removed (with the help of Tim on stp_ctl). The listing can be easily performed by using a grep on the whole code. The classification allows understanding the function of each global communication (diagnostic, allocation error control, ....) and discussing which ones could be removed or conditioned. Step 2: The second step is to discuss with the previewer and (if needed) with the NEMO System Team which communications can be safely removed. • In the case of stp_ctl, the current 3 mpp_max calls have been reduced to one single. (see dev_r7832_HPC08_lbclnk_3rd_dim) • a isolated mpp_max applied on nstop in memo_gcm routine has been combined to the mpp_max called in stp_ctl (see commit #9210 in dev_merge_2017)

Changed on 2018-01-06T10:51:59+01:00 by gm:

  • implementation changed from
    Step 1 Analysis and classification have been completed. For each global communication, filename, routine, some notes about its function and the impact of the global communication on the routine execution time as well as the routine runtime percentage have been listed (see the attached file). Some notes: 1. SETTE configs have been used for the investigation 2. the domain decomposition has been set in order to have subdomains ~ 40x40 3. tests have been executed on more than one node to also check internode communications 4. the only global communication which seems to affect the total execution time is the trc_rad_sms, which spends about 40% of the routine execution time (which is ~4% of the total execution time) to execute 4 glob_sum. Step 2 Discussion with reviewer is needed.
    to
    Step 1 Analysis and classification have been completed. For each global communication, filename, routine, some notes about its function and the impact of the global communication on the routine execution time as well as the routine runtime percentage have been listed (see the attached file). Some notes: 1. SETTE configs have been used for the investigation 2. the domain decomposition has been set in order to have subdomains ~ 40x40 3. tests have been executed on more than one node to also check internode communications 4. the only global communication which seems to affect the total execution time is the trc_rad_sms, which spends about 40% of the routine execution time (which is ~4% of the total execution time) to execute 4 glob_sum. Step 2 • Regroup the three global communications of stp_ctl in a single one. ===>>> dev_r7832_HPC08_lbclnk_3rd_dim) • solver.stat file becomes run.stat file • instead of global sum of ssh2 put in solver.stat, we now write the maximum of |ssh| over the global domain • abort tests become: |ssh|_max >10m ; |U|_max > 10 m/s ; SSS_min < 0 • the mpi_allreduce is now performed on a vector of size=3 containing the local max of |U|, -SSS, and ssh2. Therefore only 1 phase of global communication is performed in stpctl.F90 instead of 3.
  • description changed from
    The goal of this action is to remove some global communications (e.g. mpp_sum, mpp_max etc) in order to speedup the NEMO execution time. Step 1 (done in 2016): The plan is to list and classify the global communications, then to analyse which of them could be safely removed (with the help of Tim on stp_ctl). The listing can be easily performed by using a grep on the whole code. The classification allows understanding the function of each global communication (diagnostic, allocation error control, ....) and discussing which ones could be removed or conditioned. Step 2: The second step is to discuss with the previewer and (if needed) with the NEMO System Team which communications can be safely removed. In the case of stp_ctl it is possible to reduce the existing 4 or 5 mpp_max calls to one single call by restructuring the code slightly.
    to
    The goal of this action is to remove some global communications (e.g. mpp_sum, mpp_max etc) in order to speedup the NEMO execution time. Step 1 (done in 2016): The plan is to list and classify the global communications, then to analyse which of them could be safely removed (with the help of Tim on stp_ctl). The listing can be easily performed by using a grep on the whole code. The classification allows understanding the function of each global communication (diagnostic, allocation error control, ....) and discussing which ones could be removed or conditioned. Step 2: The second step is to discuss with the previewer and (if needed) with the NEMO System Team which communications can be safely removed. • In the case of stp_ctl, the current 3 mpp_max calls have been reduced to one single.

Changed on 2017-11-14T12:24:58+01:00 by mocavero:

  • implementation changed from
    Analysis and classification have been completed. For each global communication, filename, routine, some notes about its function and the impact of the global communication on the routine execution time as well as the routine runtime percentage have been listed (see the attached file). Some notes: 1. SETTE configs have been used for the investigation 2. the domain decomposition has been set in order to have subdomains ~ 40x40 3. tests have been executed on more than one node to also check internode communications 4. the only global communication which seems to affect the total execution time is the trc_rad_sms, which spends about 40% of the routine execution time (which is ~4% of the total execution time) to execute 4 glob_sum.
    to
    Step 1 Analysis and classification have been completed. For each global communication, filename, routine, some notes about its function and the impact of the global communication on the routine execution time as well as the routine runtime percentage have been listed (see the attached file). Some notes: 1. SETTE configs have been used for the investigation 2. the domain decomposition has been set in order to have subdomains ~ 40x40 3. tests have been executed on more than one node to also check internode communications 4. the only global communication which seems to affect the total execution time is the trc_rad_sms, which spends about 40% of the routine execution time (which is ~4% of the total execution time) to execute 4 glob_sum. Step 2 Discussion with reviewer is needed.
  • description changed from
    The goal of this action is to remove some global communications (e.g. mpp_sum, mpp_max etc) in order to increase the speed of NEMO. Step 1 (done in 2016): The plan is to list and classify the global communications, then to analyse which of them could be safely removed (with the help of Tim on stp_ctl). The listing can be easily performed by using a grep on the whole code. The classification allows understanding the function of each global communication (diagnostic, allocation error control, ....) and discussing which ones could be removed or conditioned. Step 2: The second step is to discuss with the previewer and (if needed) with the NEMO System Team which communications can be safely removed. In the case of stp_ctl it is possible to reduce the existing 4 or 5 mpp_max calls to one single call by restructuring the code slightly.
    to
    The goal of this action is to remove some global communications (e.g. mpp_sum, mpp_max etc) in order to speedup the NEMO execution time. Step 1 (done in 2016): The plan is to list and classify the global communications, then to analyse which of them could be safely removed (with the help of Tim on stp_ctl). The listing can be easily performed by using a grep on the whole code. The classification allows understanding the function of each global communication (diagnostic, allocation error control, ....) and discussing which ones could be removed or conditioned. Step 2: The second step is to discuss with the previewer and (if needed) with the NEMO System Team which communications can be safely removed. In the case of stp_ctl it is possible to reduce the existing 4 or 5 mpp_max calls to one single call by restructuring the code slightly.

Changed on 2017-11-14T12:10:16+01:00 by mocavero:

  • implementation changed from
    Analysis and classification have been completed. For each global communication, filename, routine, some notes about its function and the impact of the global communication on the routine execution time as well as the routine runtime percentage have been listed. Some notes: 1. SETTE configs have been used for the investigation 2. the domain decomposition has been set in order to have subdomains ~ 40x40 3. tests have been executed on more than one node to also check internode communications 4. the only global communication which seems to affect the total execution time is the trc_rad_sms, which spends about 40% of the routine execution time (which is ~4% of the total execution time) to execute 4 glob_sum.
    to
    Analysis and classification have been completed. For each global communication, filename, routine, some notes about its function and the impact of the global communication on the routine execution time as well as the routine runtime percentage have been listed (see the attached file). Some notes: 1. SETTE configs have been used for the investigation 2. the domain decomposition has been set in order to have subdomains ~ 40x40 3. tests have been executed on more than one node to also check internode communications 4. the only global communication which seems to affect the total execution time is the trc_rad_sms, which spends about 40% of the routine execution time (which is ~4% of the total execution time) to execute 4 glob_sum.

Changed on 2017-06-22T17:56:41+02:00 by mocavero:

  • description changed from
    The goal of this action is to remove some global communications (e.g. mpp_sum, mpp_max etc) in order to increase the speed of NEMO. Part 1 (done in 2016): The plan is to list and classify the global communications, then to analyse which of them could be safely removed (with the help of Tim on stp_ctl). The listing can be easily performed by using a grep on the whole code. The classification allows understanding the function of each global communication (diagnostic, allocation error control, ....) and discussing which ones could be removed or conditioned. Part 2: The second step is to discuss with the previewer and (if needed) with the NEMO System Team which communications can be safely removed. In the case of stp_ctl it is possible to reduce the existing 4 or 5 mpp_max calls to one single call by restructuring the code slightly.
    to
    The goal of this action is to remove some global communications (e.g. mpp_sum, mpp_max etc) in order to increase the speed of NEMO. Step 1 (done in 2016): The plan is to list and classify the global communications, then to analyse which of them could be safely removed (with the help of Tim on stp_ctl). The listing can be easily performed by using a grep on the whole code. The classification allows understanding the function of each global communication (diagnostic, allocation error control, ....) and discussing which ones could be removed or conditioned. Step 2: The second step is to discuss with the previewer and (if needed) with the NEMO System Team which communications can be safely removed. In the case of stp_ctl it is possible to reduce the existing 4 or 5 mpp_max calls to one single call by restructuring the code slightly.

Changed on 2017-06-19T19:24:03+02:00 by mocavero:

  • implementation changed from
    Describe flow chart of the changes in the code. List the .F90 files and modules to be changed. Analysis and classification have been completed The list will be provided after the discussion with the previwer. Detailed list of new variables (including namelists) to be defined. Give for each the chosen name (following coding rules) and definition. There should be no new variables
    to
    Analysis and classification have been completed. For each global communication, filename, routine, some notes about its function and the impact of the global communication on the routine execution time as well as the routine runtime percentage have been listed. Some notes: 1. SETTE configs have been used for the investigation 2. the domain decomposition has been set in order to have subdomains ~ 40x40 3. tests have been executed on more than one node to also check internode communications 4. the only global communication which seems to affect the total execution time is the trc_rad_sms, which spends about 40% of the routine execution time (which is ~4% of the total execution time) to execute 4 glob_sum.
  • description changed from
    The goal of this action is to remove some global communications (e.g. mpp_sum, mpp_max etc) in order to increase the speed of NEMO. Part 1 (done in 2016): The plan is to list and classify the global communications, then to analyse which of them could be safely removed (with the help of Tim on stp_ctl). The listing can be easily performed by using a grep on the whole code. The classification allows understanding the function of each global communication (diagnostic, allocation error control, ....) and discussing which ones could be removed or conditioned. Part 2: The second step is to discuss with the previwer and (if needed) with the NEMO System Team which communications can be safely removed. In the case of stp_ctl it is possible to reduce the existing 4 or 5 mpp_max calls to one single call by restructuring the code slightly.
    to
    The goal of this action is to remove some global communications (e.g. mpp_sum, mpp_max etc) in order to increase the speed of NEMO. Part 1 (done in 2016): The plan is to list and classify the global communications, then to analyse which of them could be safely removed (with the help of Tim on stp_ctl). The listing can be easily performed by using a grep on the whole code. The classification allows understanding the function of each global communication (diagnostic, allocation error control, ....) and discussing which ones could be removed or conditioned. Part 2: The second step is to discuss with the previewer and (if needed) with the NEMO System Team which communications can be safely removed. In the case of stp_ctl it is possible to reduce the existing 4 or 5 mpp_max calls to one single call by restructuring the code slightly.

Changed on 2017-06-19T17:10:31+02:00 by mocavero: