New URL for NEMO forge!   http://forge.nemo-ocean.eu

Since March 2022 along with NEMO 4.2 release, the code development moved to a self-hosted GitLab.
This present forge is now archived and remained online for history.
#1726 (HPC-4-Scalability with MPI-OPENMP) – NEMO

Opened 8 years ago

Closed 7 years ago

Last modified 6 years ago

#1726 closed Task (fixed)

HPC-4-Scalability with MPI-OPENMP

Reported by: mocavero Owned by: mocavero
Priority: low Milestone:
Component: OCE Version: trunk
Severity: Keywords:
Cc: timgraham

Description

Development branch related to HPC-4 shared action of 2016 work plan.

Starting from trunk at revision r6519.

Commit History (0)

(No commits)

Change History (7)

comment:1 Changed 7 years ago by clem

Dear all,

I had a look at the changes in the trunk resulting from changeset #7698, and I am very much concerned by the increased complexity and the lack of readability it generates (at least in LIM3).
I understand we need to perform better in terms of CPU but then the code has become much longer and much more complex to develop. I am not sure it is the good strategy. Do we really have to stick on that or could we find a better way because it will not make my life easier.
For example, all the WHERE statements have disappeared for a DO loop with a IF, all the calculations on whole sub-domains have been erased etc. (see below)

A simple line like that:

asum(:,:) = ato_i(:,:) + SUM( a_i, dim=3 ) 

has been replaced by:

!$OMP PARALLEL
!$OMP DO schedule(static) private(jj,ji)
         DO jj = 1, jpj
            DO ji = 1, jpi
               asum(ji,jj) = 0._wp
               z_ai(ji,jj) = 0._wp
            END DO
         END DO
         DO jl = 1, jpl
!$OMP DO schedule(static) private(jj,ji)
            DO jj = 1, jpj
               DO ji = 1, jpi
                  z_ai(ji,jj) = z_ai(ji,jj) + a_i(ji,jj,jl)
               END DO
            END DO
         END DO
!$OMP DO schedule(static) private(jj,ji)
         DO jj = 1, jpj
            DO ji = 1, jpi
               asum(ji,jj) = ato_i(ji,jj) + z_ai(ji,jj)
            END DO
         END DO

Another example:

      zalb(:,:) = 0._wp 
      WHERE     ( at_i_b <= epsi06 )  ;  zalb(:,:) = 0.066_wp 
      ELSEWHERE                       ;  zalb(:,:) = SUM( alb_ice * a_i_b, dim=3 ) / at_i_b 
      END WHERE 

has been replaced by

!$OMP PARALLEL
!$OMP DO schedule(static) private(jj,ji)
      DO jj = 1, jpj
         DO ji = 1, jpi
            zalb(ji,jj) = 0._wp
         END DO
      END DO
!$OMP DO schedule(static) private(jj,ji,jl)
      DO jj = 1, jpj
         DO ji = 1, jpi
            IF ( at_i_b(ji,jj) <= epsi06 ) THEN
               zalb(ji,jj) = 0.066_wp
            ELSE
               DO jl = 1, jpl
                  zalb(ji,jj) = zalb(ji,jj) + ( alb_ice(ji,jj,jl) * a_i_b(ji,jj,jl) ) / at_i_b(ji,jj)
               END DO
            END IF
          END DO
      END DO
!$OMP END PARALLEL

comment:2 Changed 7 years ago by mocavero

Dear Clement,

the increase of complexity and the lack of readability is mainly due to the explicitation of loop indexes. Indeed, if you look at the code examples without considering 'do loop' and OMP directives, the code is almost the same. Explicit loops are needed to introduce the hybrid parallelization since the OpenMP workshare construct is not supported by all the compilers.

I understand that this requires an extra effort for the developers and we discussed about this during the last Merge Party.

We could discuss again during the next ST videoconference.

comment:3 Changed 7 years ago by aumont

Dear all,

Following Clement's comment, I am also concerned by the changes that were made in the code. As Clément, these changes do change the readability of the code and add some additional constraints to the way coding should be performed. Furthermore, after rapidly parsing through the changes, I am wondering whether everything has been tested out since some changes may not be fully neutral (it's an impression, I don't claim that there are potential bugs). All those changes would be perfectly justified in case of a substantial computing gain, but that does not seem to be the case. Thus the code is longer, less readable, and requires some new coding rules and the CPU gain is modest if not inexistant (so far). That's why I am wondering whether it should stay in the trunk for now.

Cheers
Olivier

comment:4 Changed 7 years ago by cetlod

Dear all, 

I had a carefully look of the changes made in the code -  especially in TOP-PISCES - to introduce the hybrid parallelization. I imagine this has required a hard work but my opinion is that it is very intrusive. Given that PISCES is a model, shared with some of others OGCM and their developers community, it becomes very hard to the "PISCES consortium" to keep in the model  the way the hybrid parallelization has been implemented for now. I also agree with Olivier's analysis about the readability and the need of some expertise in new codings rules.  I guess that the LIM team has also the same constraints and concerns ...

Cheers

Christian

comment:5 Changed 7 years ago by cbricaud

Hello

Effectively, there are a lot of small changes ( https://forge.ipsl.jussieu.fr/nemo/changeset/7698 ).
But I don't think that we have not really lost something readability, the changes in the code are clean.

At this stage of the tests, I am not sure that someone has shown that we will gain some CPU ( we conserve the performance of "MPI-only" with hybrid parallelisation )

But:

-In some case, with MPI, we know that we need to need to depopulate nodes to decrease communications in the nodes and improve perfomance ; the problem in that case is we need to increase the number of nodes and the CPU cost is higher.
Perphaps using openMP could avoid to depopulate nodes ; I can ask Mondher to make some new tests.

-A lot of work has been done by Silvia during the last years and I think we can not decide to remove openMP just for a question of readability.

So, I think we should keep openMP for the moment and continue the tests.

Clement B

comment:6 Changed 7 years ago by mocavero

  • Resolution set to fixed
  • Status changed from new to closed

The development has been completed even if it was not included in the trunk due to the increase of complexity and the lack of readability. Alternative strategies for NEMO hybridisation will be investigated.

comment:7 Changed 6 years ago by nemo

  • Type changed from Development to Task

Remove 'Development' type

Note: See TracTickets for help on using tickets.