You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@arrow.apache.org by yi...@apache.org on 2023/01/21 02:34:37 UTC
[arrow] branch master updated: GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite (#33739)
This is an automated email from the ASF dual-hosted git repository.
yibocai pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new 0d9d132e91 GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite (#33739)
0d9d132e91 is described below
commit 0d9d132e9140f26578369b5ef0b44d25c501e45d
Author: Gang Wu <us...@gmail.com>
AuthorDate: Sat Jan 21 10:34:28 2023 +0800
GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite (#33739)
### Rationale for this change
This [commit](https://github.com/apache/arrow/commit/c8d6110a26c41966e539e9fa2f5cb8c31dc2f0fe) implements parallel column writing in the parquet writer. However, occasional test failure was observed from unit test `TestArrowReadWrite.MultithreadedWrite`. The root cause is an unintentional call of the copy constructor of `ArrowWriteContext` which results in the buffer sharing across all threads.
### What changes are included in this PR?
This issue is fixed by inserting each context individually to avoid sharing buffers.
### Are these changes tested?
This issue was observed by occasional `TestArrowReadWrite.MultithreadedWrite`. Make sure the test is recovered.
Authored-by: Gang Wu <us...@gmail.com>
Signed-off-by: Yibo Cai <yi...@arm.com>
---
cpp/src/parquet/arrow/writer.cc | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/cpp/src/parquet/arrow/writer.cc b/cpp/src/parquet/arrow/writer.cc
index 9b51407234..6d22f318f6 100644
--- a/cpp/src/parquet/arrow/writer.cc
+++ b/cpp/src/parquet/arrow/writer.cc
@@ -291,8 +291,13 @@ class FileWriterImpl : public FileWriter {
arrow_properties_(std::move(arrow_properties)),
closed_(false) {
if (arrow_properties_->use_threads()) {
- parallel_column_write_contexts_.resize(schema_->num_fields(),
- {pool, arrow_properties_.get()});
+ parallel_column_write_contexts_.reserve(schema_->num_fields());
+ for (int i = 0; i < schema_->num_fields(); ++i) {
+ // Explicitly create each ArrowWriteContext object to avoid unintentional
+ // call of the copy constructor. Otherwise, the buffers in the type of
+ // sharad_ptr will be shared among all contexts.
+ parallel_column_write_contexts_.emplace_back(pool, arrow_properties_.get());
+ }
}
}