OpenMP (www.openmp.org) is an emerging standard syntax for writing parallel loop programs for shared-memory machines, and is supported by optimising compilers from most major vendors. It is also used as a target for compilers which attempt automatic parallelisation.
Here is an extract from a Jacobi solve which should give the idea:
    !$omp parallel
    * Copy new solution into old
    !$omp do
          do j=1,m
             do i=1,n
                uold(i,j) = u(i,j)
             enddo
          enddo
    * Compute stencil, residual, & update
    !$omp do private(resid) reduction(+:error)
          do j = 2,m-1
             do i = 2,n-1
    * Evaluate residual
                resid = (ax*(uold(i-1,j) + uold(i+1,j))
         &             + ay*(uold(i,j-1) + uold(i,j+1))
         &             + b * uold(i,j) - f(i,j))/b
    * Update solution
                u(i,j) = uold(i,j) - omega * resid
    * Accumulate residual error
                error = error + resid*resid
             end do
          enddo
    !$omp enddo nowait
    !$omp end parallel

(This is an extract from http://www.openmp.org/index.cgi?samples+samples/jacobi.html.) Explanation:
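!$omp do private(resid) reduction(+:error) tells the compiler that the loop can be executed in parallel, but that each thread needs its own private copy of resid, and that the per-thread values of error must be summed into a single result when the loop finishes.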
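The effect of the private and reduction clauses can also be sketched in C, where the same directive is written as a pragma. This is only an illustration, not part of the Jacobi sample: the function name sum_squared_residuals and its arguments are hypothetical, and the residual is simplified to a plain difference.

```c
#include <stddef.h>

/* Hypothetical helper (not from the Jacobi sample): sum of squared
   differences between two arrays. If the compiler is run without
   OpenMP support the pragma is ignored and the loop runs serially. */
double sum_squared_residuals(const double *u, const double *uold, size_t n)
{
    double error = 0.0;
    double resid;

    #pragma omp parallel for private(resid) reduction(+:error)
    for (long i = 0; i < (long)n; i++) {
        resid = u[i] - uold[i];   /* private: each thread has its own resid */
        error += resid * resid;   /* reduction: per-thread partial sums are
                                     added together when the loop ends */
    }
    return error;
}
```

Without private(resid) the threads would overwrite one another's resid, and without reduction(+:error) the updates to error would race. In a parallel run the partial sums may be added in a different order, so the result can differ from the serial one by floating-point rounding.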
The performance differences for write-invalidate and write-update schemes can arise from both bandwidth consumption and latency. Assume a memory system with 64-byte cache blocks.
Consider the following program fragment:
    /* processor 1 */
    A = 0;
    ...
    A = 1;
    if (B == 0) {
        P();
    }

    /* processor 2 */
    B = 0;
    ...
    B = 1;
    if (A == 0) {
        P();
    }
Can both processors execute P simultaneously? Explain your answer carefully.
Can both processors execute P simultaneously? Explain your answer carefully.
What precautions are needed to preserve the expected behaviour?
Can both processors execute P simultaneously? Explain your answer carefully.
What precautions are needed to preserve the expected behaviour?
Paul Kelly, Imperial College, December 2001
The performance differences for write-invalidate and write-update schemes can arise from both bandwidth consumption and latency. Assume a memory system with 64-byte cache blocks.
          do it=1,niters
    !$omp parallel
    !$omp do
            do i=1,m
              do j=1,n
                A(i,j) = A(i,j)*2
              enddo
            enddo
    !$omp do
            do i=1,m
              do j=1,n
                B(i,j) = A(j,i)+B(i,j)
              enddo
            enddo
    !$omp end parallel
          enddo

Assume that each processor executes the same subset of i iterations at each it iteration. The second loop uses A in transposed layout. In the first loop each processor generates a row A(i,1:n). In the second loop, each processor reads one word from each of these rows, and so receives data from all the processors.
To construct an example for which invalidate is the better policy, we need to modify the program above so that several assignments are made to each element of A before the data is read by another processor. How about:
          do it=1,niters
    !$omp parallel
            do it2=1,3
    !$omp do
              do i=1,m
                do j=1,n
                  A(i,j) = A(i,j)*2
                enddo
              enddo
            enddo
    !$omp do
            do i=1,m
              do j=1,n
                B(i,j) = A(j,i)+B(i,j)
              enddo
            enddo
    !$omp end parallel
          enddo

I have added a loop around the first loop nest, repeating it three times. Note that these three iterations are executed one after the other: the i loops remain the only parallel loops.
Consider the following program fragment:
    /* processor 1 */
    A = 0;
    ...
    A = 1;
    if (B == 0) {
        P();
    }

    /* processor 2 */
    B = 0;
    ...
    B = 1;
    if (A == 0) {
        P();
    }
Can both processors execute P simultaneously? Explain your answer carefully.
Answer: No. If processor 2 reaches its if first, it must already have executed B=1, so when processor 1 gets to its if it will find B!=0. The same holds, with the roles swapped, if processor 1 is first. If the two processors happen to run exactly in step with one another, it is possible that neither of them executes P.
This reasoning relies on the idea of "strong sequential" memory consistency: that execution is a serial interleaving of the two processors' operations, consistent with the program order of each.
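The interleaving argument can be checked by brute force. The sketch below is an illustration only, with a hypothetical function name: it models each processor as a write followed by a read, enumerates every serial interleaving that respects program order, and counts those in which both processors would call P.

```c
/* Count the sequentially consistent interleavings in which BOTH
   processors would call P(). Each processor performs one write and then
   one read. A 4-bit mask with exactly two bits set says which of the
   four time slots belong to processor 1; the rest go to processor 2. */
int count_both_call_p(void)
{
    int both = 0;
    for (int mask = 0; mask < 16; mask++) {
        int bits = 0;
        for (int s = 0; s < 4; s++)
            bits += (mask >> s) & 1;
        if (bits != 2)
            continue;                 /* not a valid interleaving */

        int A = 0, B = 0;             /* state after A = 0 and B = 0 */
        int step1 = 0, step2 = 0;     /* progress of each processor */
        int p1_calls = 0, p2_calls = 0;
        for (int s = 0; s < 4; s++) {
            if ((mask >> s) & 1) {    /* processor 1 runs in this slot */
                if (step1++ == 0)
                    A = 1;
                else
                    p1_calls = (B == 0);
            } else {                  /* processor 2 runs in this slot */
                if (step2++ == 0)
                    B = 1;
                else
                    p2_calls = (A == 0);
            }
        }
        if (p1_calls && p2_calls)
            both++;
    }
    return both;
}
```

The function returns 0: for both tests to succeed, each read would have to come before the other processor's write, which is impossible when each processor's write precedes its own read.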
Can both processors execute P simultaneously? Explain your answer carefully.
What precautions are needed to preserve the expected behaviour? Answer: Yes, both processors can execute P simultaneously, if you are not careful. Sending the invalidation message is not enough - you need to make sure the invalidation has been done before proceeding.
You could say that the test read must stall until all invalidates/updates on their way have arrived. This isn't very helpful. It is smarter to focus on when the instruction after the write can proceed: it must wait until the invalidate has been acknowledged. But why is this what counts? It depends on a careful examination of the logic behind the program's behaviour. If the write and the read were executed or took effect out of order, the mutual exclusion would fail.
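At the source level, a rough analogue of "the write must be complete before the following read proceeds" is to use sequentially consistent atomic operations. The following is a minimal sketch assuming C11 <stdatomic.h>; the function names are hypothetical, and it describes what the programmer asks for, not how the coherence hardware delivers it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared flags, zero as after "A = 0" and "B = 0" in the fragment. */
static atomic_int A = 0;
static atomic_int B = 0;

/* Hypothetical wrapper for processor 1: returns true if it may call P().
   atomic_store and atomic_load default to memory_order_seq_cst, so the
   load of B cannot take effect before the store to A. */
bool processor1_may_call_p(void)
{
    atomic_store(&A, 1);
    return atomic_load(&B) == 0;
}

/* Hypothetical wrapper for processor 2, with the roles of A and B swapped. */
bool processor2_may_call_p(void)
{
    atomic_store(&B, 1);
    return atomic_load(&A) == 0;
}
```

If the two functions run on different threads, at most one of them can return true. With plain int variables, or with relaxed atomic operations, the compiler or hardware would be free to let the read overtake the write, and both could return true.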
Can both processors execute P simultaneously? Explain your answer carefully.
What precautions are needed to preserve the expected behaviour? Answer: This is a bit of a trick question - the answer is the same as above.
(Paul Kelly, Imperial College, November 2001)