Custom Query (2547 matches)
Ticket | Resolution | Summary | Owner | Reporter |
---|---|---|---|---|
#1057 | fixed | Bug in mppini_2.h90 which can result in communication deadlock with some partitioning (mainly evident at high processor counts) | acc | acc |
Description |
There appears to be a small error in mppini_2.h90 which results in the wrong northern neighbour being identified for the northernmost row of processors. This calculation is slightly redundant anyway, because the north-fold communications are dealt with separately and do not rely on the identified northern neighbour (nono). However, the northern neighbour is used to set the nbondj value, which determines whether a region communicates just to the north, both north and south, just to the south, or neither way. At very high processor counts it is possible to end up with regions on the jpnj-1 row which send to the north but whose northern neighbour has been assigned an nbondj value of 2 (neither way). This results in deadlock at the first lbc_lnk call (usually in iom_get called by hgr_read), with the jpnj-1 row processor waiting for a message that is never sent.

The error (TBC) appears to be in this block of code:

    ipolj(ii,ij) = 0
    IF( jperio == 3 .OR. jperio == 4 ) THEN
       ijm1 = jpni*(jpnj-1)
       imil = ijm1+(jpni+1)/2
       IF( jarea > ijm1 ) ipolj(ii,ij) = 3
       IF( MOD(jpni,2) == 1 .AND. jarea == imil ) ipolj(ii,ij) = 4
       IF( ipolj(ii,ij) == 3 ) iono(ii,ij) = jpni*jpnj-jarea+ijm1
    ENDIF

which applies a north-fold condition to identify the northern neighbour. I believe the error is that the iono values should be MPI process numbers, not the narea values as calculated. The iono array is referenced later during the elimination of land-only regions:

    DO jarea = 1, jpni*jpnj
       iproc = jarea-1
       ii = 1 + MOD(jarea-1,jpni)
       ij = 1 + (jarea-1)/jpni
       IF( ipproc(ii,ij) == -1 .AND. iono(ii,ij) >= 0   &
          .AND. iono(ii,ij) <= jpni*jpnj-1 ) THEN
          iino = 1 + MOD(iono(ii,ij),jpni)
          ijno = 1 + (iono(ii,ij))/jpni
          IF( ibondj(iino,ijno) == 1 ) ibondj(iino,ijno)=2
          IF( ibondj(iino,ijno) == 0 ) ibondj(iino,ijno) = -1
       ENDIF

and the mis-identification can lead to the problem described. The occurrence is rare (e.g. 1 process out of 9014 resulting from a 110x120 partitioning of ORCA_R12) but catastrophic and difficult to trace. Fortunately, if this diagnosis is correct, the solution is trivial. Simply replace:

    IF( ipolj(ii,ij) == 3 ) iono(ii,ij) = jpni*jpnj-jarea+ijm1

with

    IF( ipolj(ii,ij) == 3 ) iono(ii,ij) = jpni*jpnj-jarea+ijm1 - 1

Tests of this hypothesis are currently queued. |
|||
#1060 | fixed | minor alterations to iceberg trajectory component | acc | acc |
Description |
A couple of small corrections to the iceberg trajectory modules courtesy of Vladimir Ivchenko.
Code now reflects the version in active use. |
|||
#1216 | fixed | Temporary development branch for hosting surface wave components from ECMWF | acc | acc |
Description |
Working branch to import surface wave components from ECMWF. Changes will have to be ported to a proper 2014 branch of the trunk at a later date once v3.6alpha has been moved to the trunk. |