Updated: 2022-06-21 21:31:13
I essentially already answered this question in parallelizing-matrix-times-a-vector-by-columns-and-by-rows-with-openmp.
You have a race condition when you write to results[y]. To fix this, and still parallelize the inner loop, you have to make private versions of results[y], fill them in parallel, and then merge them in a critical section.
In the code below I assume you're using double; replace it with float, int, or whatever datatype you're using. (Note that your inner loop goes over the first index of matrix[i][y], which is cache unfriendly.)
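#pragma omp parallel num_threads(4)
{
    int y, i;
    /* Each thread gets its own zero-initialized accumulator. */
    double* results_private = (double*)calloc(matrix_size, sizeof(double));
    for(y = 0; y < matrix_size; y++) {
        /* The iterations over i are split among the threads. */
        #pragma omp for
        for(i = 0; i < matrix_size; i++) {
            results_private[y] += vector[i]*matrix[i][y];
        }
    }
    /* Merge one thread at a time; results[] must start out zeroed. */
    #pragma omp critical
    {
        for(y = 0; y < matrix_size; y++) results[y] += results_private[y];
    }
    free(results_private);
}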
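The cache-unfriendly access noted above can be avoided by interchanging the loops, so that each thread takes a chunk of rows i and walks over y contiguously. The following is only a sketch under assumptions that are not in the question: the function name vecmat_interchanged is made up, and the matrix is taken to be a flat row-major array indexed as matrix[i*n + y] rather than matrix[i][y]. It also needs just one work-sharing loop instead of one per y.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: results[y] = sum over i of vector[i] * matrix[i*n + y].
   Assumes a flat, row-major n x n matrix. Returns 0 on success, -1 if a
   thread could not allocate its private buffer. */
int vecmat_interchanged(int n, const double *matrix,
                        const double *vector, double *results)
{
    int ok = 1;
    assert(n > 0);
    for (int y = 0; y < n; y++) results[y] = 0.0;

    #pragma omp parallel
    {
        /* Private, zero-initialized accumulator for this thread. */
        double *results_private = calloc((size_t)n, sizeof *results_private);
        if (results_private == NULL) {
            #pragma omp atomic write
            ok = 0;
        }

        /* Rows are split among threads; y is the contiguous inner index.
           Every thread must reach this loop, so the NULL check is inside. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            if (results_private != NULL) {
                for (int y = 0; y < n; y++)
                    results_private[y] += vector[i] * matrix[(size_t)i * n + y];
            }
        }

        /* Merge the private sums one thread at a time. */
        if (results_private != NULL) {
            #pragma omp critical
            {
                for (int y = 0; y < n; y++) results[y] += results_private[y];
            }
        }
        free(results_private);
    }
    return ok ? 0 : -1;
}
```

Compiled without OpenMP support the pragmas are ignored and the function simply runs serially, which is handy for checking the result against the parallel build.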
If this is a homework assignment and you really want to impress your instructor, it's possible to do the merging without a critical section. See this link to get an idea of what to do: fill-histograms-array-reduction-in-parallel-with-openmp-without-using-a-critic, though I can't promise it will be faster.
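One way the merge without a critical section might look is sketched below. This is my own illustration of the general idea, not necessarily what the linked answer does: every thread fills its own slice of one shared buffer, and after a barrier the slices are summed in parallel over y. The function name vecmat_no_critical and the flat row-major layout matrix[i*n + y] are assumptions; the fallback definitions only exist so the code still builds without OpenMP.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP. */
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: results[y] = sum over i of vector[i] * matrix[i*n + y].
   Returns 0 on success, -1 if the shared buffer could not be allocated. */
int vecmat_no_critical(int n, const double *matrix,
                       const double *vector, double *results)
{
    double *partial = NULL;  /* shared: one slice of n doubles per thread */
    int ok = 1;
    assert(n > 0);

    #pragma omp parallel
    {
        const int nthreads = omp_get_num_threads();
        const int tid = omp_get_thread_num();

        /* One thread allocates; the implicit barrier after single makes
           partial and ok visible to the whole team. */
        #pragma omp single
        {
            partial = calloc((size_t)nthreads * (size_t)n, sizeof *partial);
            if (partial == NULL) ok = 0;
        }

        if (ok) {  /* same value in every thread, so all take this branch */
            double *mine = partial + (size_t)tid * (size_t)n;

            /* Each thread accumulates its rows into its own slice. */
            #pragma omp for
            for (int i = 0; i < n; i++)
                for (int y = 0; y < n; y++)
                    mine[y] += vector[i] * matrix[(size_t)i * n + y];

            /* The implicit barrier above guarantees all slices are complete.
               Each y is now summed by exactly one thread, so no lock is needed. */
            #pragma omp for
            for (int y = 0; y < n; y++) {
                double sum = 0.0;
                for (int t = 0; t < nthreads; t++)
                    sum += partial[(size_t)t * n + y];
                results[y] = sum;
            }
        }
    }
    free(partial);
    return ok ? 0 : -1;
}
```

The cost is an extra buffer of nthreads * n doubles, and neighbouring slices can share cache lines at their boundaries, so it is worth timing both versions before picking one.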