You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@arrow.apache.org by yi...@apache.org on 2023/01/21 02:34:37 UTC

[arrow] branch master updated: GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite (#33739)

This is an automated email from the ASF dual-hosted git repository.

yibocai pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 0d9d132e91 GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite (#33739)
0d9d132e91 is described below

commit 0d9d132e9140f26578369b5ef0b44d25c501e45d
Author: Gang Wu <us...@gmail.com>
AuthorDate: Sat Jan 21 10:34:28 2023 +0800

    GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite (#33739)
    
    ### Rationale for this change
    
    This [commit](https://github.com/apache/arrow/commit/c8d6110a26c41966e539e9fa2f5cb8c31dc2f0fe) implements parallel column writing in the parquet writer. However, occasional test failure was observed from unit test `TestArrowReadWrite.MultithreadedWrite`. The root cause is an unintentional call of the copy constructor of `ArrowWriteContext` which results in the buffer sharing across all threads.
    
    ### What changes are included in this PR?
    
    This issue is fixed by inserting each context individually to avoid sharing buffers.
    
    ### Are these changes tested?
    
    This issue was observed by occasional `TestArrowReadWrite.MultithreadedWrite`. Make sure the test is recovered.
    
    Authored-by: Gang Wu <us...@gmail.com>
    Signed-off-by: Yibo Cai <yi...@arm.com>
---
 cpp/src/parquet/arrow/writer.cc | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/cpp/src/parquet/arrow/writer.cc b/cpp/src/parquet/arrow/writer.cc
index 9b51407234..6d22f318f6 100644
--- a/cpp/src/parquet/arrow/writer.cc
+++ b/cpp/src/parquet/arrow/writer.cc
@@ -291,8 +291,13 @@ class FileWriterImpl : public FileWriter {
         arrow_properties_(std::move(arrow_properties)),
         closed_(false) {
     if (arrow_properties_->use_threads()) {
-      parallel_column_write_contexts_.resize(schema_->num_fields(),
-                                             {pool, arrow_properties_.get()});
+      parallel_column_write_contexts_.reserve(schema_->num_fields());
+      for (int i = 0; i < schema_->num_fields(); ++i) {
+        // Explicitly create each ArrowWriteContext object to avoid unintentional
+        // call of the copy constructor. Otherwise, the buffers in the type of
+        // sharad_ptr will be shared among all contexts.
+        parallel_column_write_contexts_.emplace_back(pool, arrow_properties_.get());
+      }
     }
   }