MPI_Irecv and MPI_Send use the same buffer at the same time #18

Open
mawi2017 opened this issue Apr 25, 2023 · 1 comment · May be fixed by #19

Comments

@mawi2017

Hi,

I ran the ref version of miniFE with Intel MPI under the message checker from ITAC (Intel Trace Analyzer and Collector). The message checker reported LOCAL:MEMORY:OVERLAP and LOCAL:MEMORY:ILLEGAL_MODIFICATION issues in ref/src/make_local_matrix.hpp, where the same buffers are used for sending and receiving at the same time. From what I saw, the other miniFE versions should also be affected wherever they execute the corresponding code.

The affected code in ref/src/make_local_matrix.hpp starts at line 257:

  std::vector<MPI_Request> request(num_send_neighbors);
  for(int i=0; i<num_send_neighbors; ++i) {
    MPI_Irecv(&tmp_buffer[i], 1, mpi_dtype, MPI_ANY_SOURCE, MPI_MY_TAG,
              MPI_COMM_WORLD, &request[i]);
  }

  // send messages

  for(int i=0; i<num_recv_neighbors; ++i) {
    MPI_Send(&tmp_buffer[i], 1, mpi_dtype, recv_list[i], MPI_MY_TAG,
             MPI_COMM_WORLD);
  }

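If both loops have a trip count > 0, then tmp_buffer[i] is used for receiving and sending at the same time for every i below min(num_send_neighbors, num_recv_neighbors): the MPI_Irecv posted on that element is still active when MPI_Send is called on it.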
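
The overlap can be made concrete without running MPI by looking only at which elements of tmp_buffer each loop touches. The following is a minimal sketch, not miniFE code: SlotRange, BufferLayout, shared_slots, shared_layout and disjoint_layout are hypothetical names introduced here for illustration. shared_layout mirrors the indexing of the two loops quoted above, and disjoint_layout shows one possible remedy (placing the send slots after the receive slots), which is not necessarily what the linked pull request does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Half-open range [begin, end) of element indices inside one scratch array.
struct SlotRange {
  std::size_t begin;
  std::size_t end;
};

// Number of element indices that the two ranges have in common.
std::size_t shared_slots(SlotRange a, SlotRange b) {
  assert(a.begin <= a.end && b.begin <= b.end);
  const std::size_t lo = std::max(a.begin, b.begin);
  const std::size_t hi = std::min(a.end, b.end);
  return hi > lo ? hi - lo : 0;
}

// Which elements are handed to MPI_Irecv and which to MPI_Send.
struct BufferLayout {
  SlotRange recv;  // elements passed to MPI_Irecv
  SlotRange send;  // elements passed to MPI_Send
  // Number of elements the scratch array needs for this layout.
  std::size_t total() const { return std::max(recv.end, send.end); }
};

// Layout of the quoted code: both loops index tmp_buffer starting at 0.
// The receive loop runs over num_send_neighbors, the send loop over
// num_recv_neighbors, exactly as in the snippet above.
BufferLayout shared_layout(std::size_t num_send_neighbors,
                           std::size_t num_recv_neighbors) {
  return {{0, num_send_neighbors}, {0, num_recv_neighbors}};
}

// Hypothetical alternative: send slots start where the receive slots end,
// so no element is owned by a pending receive and a send at the same time.
BufferLayout disjoint_layout(std::size_t num_send_neighbors,
                             std::size_t num_recv_neighbors) {
  return {{0, num_send_neighbors},
          {num_send_neighbors, num_send_neighbors + num_recv_neighbors}};
}
```

With 3 send neighbors and 2 receive neighbors, shared_layout yields 2 elements that are passed to both calls, while disjoint_layout yields none at the cost of a scratch array of 5 elements instead of 3. Using two separate vectors for sending and receiving would have the same effect.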

The complete output and commands for reproducing:

$ git clone https://github.com/Mantevo/miniFE.git
$ cd miniFE/ref/src
$ # loaded module for intelmpi and itac
$ make
$ mpiexec -check-mpi -n 2 ./miniFE.x
...
      creating/filling mesh...0.000828028s, total time: 0.000828981
generating matrix structure...0.00868297s, total time: 0.00951195
         assembling FE data...0.00850797s, total time: 0.0180199
      imposing Dirichlet BC...0.00221992s, total time: 0.0202398
      imposing Dirichlet BC...0.00244904s, total time: 0.0226889
making matrix indices local...
[0] WARNING: LOCAL:MEMORY:OVERLAP: warning
[0] WARNING:    New send buffer overlaps with currently active receive buffer at address 0x17f0730.
[0] WARNING:    Control over active buffer was transferred to MPI at:
[0] WARNING:       MPI_Irecv(*buf=0x17f0730, count=1, datatype=MPI_INT, source=MPI_ANY_SOURCE, tag=99, comm=MPI_COMM_WORLD, *request=0x1c04470)
[0] WARNING:       _ZN6miniFE17make_local_matrixINS_9CSRMatrixIdiiEEEEvRT_ (/home/xyz/projects/miniFE/ref/src/./make_local_matrix.hpp:259)
[0] WARNING:       _ZN6miniFE6driverIdiiEEiRK3BoxRS1_RNS_10ParametersER8YAML_Doc (/home/xyz/projects/miniFE/ref/src/./driver.hpp:228)
[0] WARNING:       main (/home/xyz/projects/miniFE/ref/src/main.cpp:154)
[0] WARNING:       __libc_start_main (/usr/lib64/libc-2.28.so)
[0] WARNING:       _start (/home/xyz/projects/miniFE/ref/src/miniFE.x)
[0] WARNING:    Control over new buffer is about to be transferred to MPI at:
[0] WARNING:       MPI_Send(*buf=0x17f0730, count=1, datatype=MPI_INT, dest=1, tag=99, comm=MPI_COMM_WORLD)
[0] WARNING:       _ZN6miniFE17make_local_matrixINS_9CSRMatrixIdiiEEEEvRT_ (/home/xyz/projects/miniFE/ref/src/./make_local_matrix.hpp:266)
[0] WARNING:       _ZN6miniFE6driverIdiiEEiRK3BoxRS1_RNS_10ParametersER8YAML_Doc (/home/xyz/projects/miniFE/ref/src/./driver.hpp:228)
[0] WARNING:       main (/home/xyz/projects/miniFE/ref/src/main.cpp:154)
[0] WARNING:       __libc_start_main (/usr/lib64/libc-2.28.so)
[0] WARNING:       _start (/home/xyz/projects/miniFE/ref/src/miniFE.x)

[1] WARNING: LOCAL:MEMORY:OVERLAP: warning
[1] WARNING:    New send buffer overlaps with currently active receive buffer at address 0x11d48a0.
[1] WARNING:    Control over active buffer was transferred to MPI at:
[1] WARNING:       MPI_Irecv(*buf=0x11d48a0, count=1, datatype=MPI_INT, source=MPI_ANY_SOURCE, tag=99, comm=MPI_COMM_WORLD, *request=0x1219dc0)
[1] WARNING:       _ZN6miniFE17make_local_matrixINS_9CSRMatrixIdiiEEEEvRT_ (/home/xyz/projects/miniFE/ref/src/./make_local_matrix.hpp:259)
[1] WARNING:       _ZN6miniFE6driverIdiiEEiRK3BoxRS1_RNS_10ParametersER8YAML_Doc (/home/xyz/projects/miniFE/ref/src/./driver.hpp:228)
[1] WARNING:       main (/home/xyz/projects/miniFE/ref/src/main.cpp:154)
[1] WARNING:       __libc_start_main (/usr/lib64/libc-2.28.so)
[1] WARNING:       _start (/home/xyz/projects/miniFE/ref/src/miniFE.x)
[1] WARNING:    Control over new buffer is about to be transferred to MPI at:
[1] WARNING:       MPI_Send(*buf=0x11d48a0, count=1, datatype=MPI_INT, dest=0, tag=99, comm=MPI_COMM_WORLD)
[1] WARNING:       _ZN6miniFE17make_local_matrixINS_9CSRMatrixIdiiEEEEvRT_ (/home/xyz/projects/miniFE/ref/src/./make_local_matrix.hpp:266)
[1] WARNING:       _ZN6miniFE6driverIdiiEEiRK3BoxRS1_RNS_10ParametersER8YAML_Doc (/home/xyz/projects/miniFE/ref/src/./driver.hpp:228)
[1] WARNING:       main (/home/xyz/projects/miniFE/ref/src/main.cpp:154)
[1] WARNING:       __libc_start_main (/usr/lib64/libc-2.28.so)
[1] WARNING:       _start (/home/xyz/projects/miniFE/ref/src/miniFE.x)
1.09176s, total time: 1.11445
Starting CG solver ...
Initial Residual = 11.0289
Iteration = 20   Residual = 1.23424e-08
Final Resid Norm: 2.06977e-16

[0] INFO: LOCAL:MEMORY:OVERLAP: found 2 times (0 errors + 2 warnings), 0 reports were suppressed
[0] INFO: Found 2 problems (0 errors + 2 warnings), 0 reports were suppressed.

If I use more than 2 processes, e.g. 72, some of the OVERLAP warnings turn into ILLEGAL_MODIFICATION errors:

[54] ERROR: LOCAL:MEMORY:ILLEGAL_MODIFICATION: error
[54] ERROR:    Read-only buffer was modified while owned by MPI.
[54] ERROR:    Control over buffer was transferred to MPI at:
[54] ERROR:       MPI_Send(*buf=0x9693c4, count=1, datatype=MPI_INT, dest=22, tag=99, comm=MPI_COMM_WORLD)
[54] ERROR:       _ZN6miniFE17make_local_matrixINS_9CSRMatrixIdiiEEEEvRT_ (/home/xyz/projects/miniFE/ref/src/./make_local_matrix.hpp:266)
[54] ERROR:       _ZN6miniFE6driverIdiiEEiRK3BoxRS1_RNS_10ParametersER8YAML_Doc (/home/xyz/projects/miniFE/ref/src/./driver.hpp:228)
[54] ERROR:       main (/home/xyz/projects/miniFE/ref/src/main.cpp:154)
[54] ERROR:       __libc_start_main (/usr/lib64/libc-2.28.so)
[54] ERROR:       _start (/home/xyz/projects/miniFE/ref/src/miniFE.x)
[54] ERROR:    Modified buffer detected at:
[54] ERROR:       MPI_Send(*buf=0x9693c4, count=1, datatype=MPI_INT, dest=22, tag=99, comm=MPI_COMM_WORLD)
[54] ERROR:       _ZN6miniFE17make_local_matrixINS_9CSRMatrixIdiiEEEEvRT_ (/home/xyz/projects/miniFE/ref/src/./make_local_matrix.hpp:266)
[54] ERROR:       _ZN6miniFE6driverIdiiEEiRK3BoxRS1_RNS_10ParametersER8YAML_Doc (/home/xyz/projects/miniFE/ref/src/./driver.hpp:228)
[54] ERROR:       main (/home/xyz/projects/miniFE/ref/src/main.cpp:154)
[54] ERROR:       __libc_start_main (/usr/lib64/libc-2.28.so)
[54] ERROR:       _start (/home/xyz/projects/miniFE/ref/src/miniFE.x)
@maherou
Member

maherou commented Apr 25, 2023

@mawi2017 Thank you for reporting this issue. If you have a proposed fix, please feel free to submit a pull request for review. We would appreciate your help with this.

Thank you.

Mike

mawi2017 linked a pull request on May 2, 2023 that will close this issue.