You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/27 01:50:48 UTC

[GitHub] [arrow] westonpace opened a new pull request, #13245: ARROW-16574: [C++] TSAN failure in arrow-ipc-read-write-test

westonpace opened a new pull request, #13245:
URL: https://github.com/apache/arrow/pull/13245

   When reading multiple batches at the same time it was possible for concurrent attempts to increment `ReadStats` counters.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cyb70289 merged pull request #13245: ARROW-16574: [C++] TSAN failure in arrow-ipc-read-write-test

Posted by GitBox <gi...@apache.org>.
cyb70289 merged PR #13245:
URL: https://github.com/apache/arrow/pull/13245


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #13245: ARROW-16574: [C++] TSAN failure in arrow-ipc-read-write-test

Posted by GitBox <gi...@apache.org>.
lidavidm commented on code in PR #13245:
URL: https://github.com/apache/arrow/pull/13245#discussion_r885540943


##########
cpp/src/arrow/ipc/reader.cc:
##########
@@ -1121,6 +1121,29 @@ class ARROW_EXPORT SelectiveIpcFileRecordBatchGenerator {
   int index_;
 };
 
+struct AtomicReadStats {
+  std::atomic<int64_t> num_messages{0};
+  std::atomic<int64_t> num_record_batches{0};
+  std::atomic<int64_t> num_dictionary_batches{0};
+  std::atomic<int64_t> num_dictionary_deltas{0};
+  std::atomic<int64_t> num_replaced_dictionaries{0};
+
+  /// \brief Capture a copy of the current counters
+  ///
+  /// It's possible to get inconsistent values.  For example, if
+  /// this method is called in the middle of a read you might have
+  /// a case where num_messages != num_record_batches + num_dictionary_batches

Review Comment:
   In my limited understanding, since the values all likely reside in the same cache line, there's likely false sharing anyways right? So we may as well use the mutex.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cyb70289 commented on a diff in pull request #13245: ARROW-16574: [C++] TSAN failure in arrow-ipc-read-write-test

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on code in PR #13245:
URL: https://github.com/apache/arrow/pull/13245#discussion_r886347121


##########
cpp/src/arrow/ipc/reader.cc:
##########
@@ -1121,6 +1121,29 @@ class ARROW_EXPORT SelectiveIpcFileRecordBatchGenerator {
   int index_;
 };
 
+struct AtomicReadStats {
+  std::atomic<int64_t> num_messages{0};
+  std::atomic<int64_t> num_record_batches{0};
+  std::atomic<int64_t> num_dictionary_batches{0};
+  std::atomic<int64_t> num_dictionary_deltas{0};
+  std::atomic<int64_t> num_replaced_dictionaries{0};
+
+  /// \brief Capture a copy of the current counters
+  ///
+  /// It's possible to get inconsistent values.  For example, if
+  /// this method is called in the middle of a read you might have
+  /// a case where num_messages != num_record_batches + num_dictionary_batches

Review Comment:
   > In my limited understanding, since the values all likely reside in the same cache line, there's likely false sharing anyways right? So we may as well use the mutex.
   
   Tested the effect of incrementing two different variables in same cache line concurrently.
   No obvious penalty for normal variables, I think most of the time the two varible are in each cpu's store buffer, most load/store never reaches the L1 cache.
   Big penalty for atomic fetch add, memory access is prefixed with `lock` operand and have to reach L1, results in heavy cache line contentions. It's much better if forcing the two variables in different cache line.
   
   Benchmark result available in the source code.
   https://gist.github.com/cyb70289/c9500d7f962a6f1d9b594ace9a9a358a
   
   AFAIK, false sharing may cause trouble if global data is mutated by cpus in different numa nodes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #13245: ARROW-16574: [C++] TSAN failure in arrow-ipc-read-write-test

Posted by GitBox <gi...@apache.org>.
westonpace commented on code in PR #13245:
URL: https://github.com/apache/arrow/pull/13245#discussion_r886238647


##########
cpp/src/arrow/ipc/reader.cc:
##########
@@ -1121,6 +1121,29 @@ class ARROW_EXPORT SelectiveIpcFileRecordBatchGenerator {
   int index_;
 };
 
+struct AtomicReadStats {

Review Comment:
   Done.



##########
cpp/src/arrow/ipc/reader.cc:
##########
@@ -1121,6 +1121,29 @@ class ARROW_EXPORT SelectiveIpcFileRecordBatchGenerator {
   int index_;
 };
 
+struct AtomicReadStats {
+  std::atomic<int64_t> num_messages{0};
+  std::atomic<int64_t> num_record_batches{0};
+  std::atomic<int64_t> num_dictionary_batches{0};
+  std::atomic<int64_t> num_dictionary_deltas{0};
+  std::atomic<int64_t> num_replaced_dictionaries{0};
+
+  /// \brief Capture a copy of the current counters
+  ///
+  /// It's possible to get inconsistent values.  For example, if
+  /// this method is called in the middle of a read you might have
+  /// a case where num_messages != num_record_batches + num_dictionary_batches
+  ReadStats poll() const {
+    ReadStats stats;
+    stats.num_messages = num_messages.load();

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cyb70289 commented on a diff in pull request #13245: ARROW-16574: [C++] TSAN failure in arrow-ipc-read-write-test

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on code in PR #13245:
URL: https://github.com/apache/arrow/pull/13245#discussion_r884444944


##########
cpp/src/arrow/ipc/reader.cc:
##########
@@ -1121,6 +1121,29 @@ class ARROW_EXPORT SelectiveIpcFileRecordBatchGenerator {
   int index_;
 };
 
+struct AtomicReadStats {
+  std::atomic<int64_t> num_messages{0};
+  std::atomic<int64_t> num_record_batches{0};
+  std::atomic<int64_t> num_dictionary_batches{0};
+  std::atomic<int64_t> num_dictionary_deltas{0};
+  std::atomic<int64_t> num_replaced_dictionaries{0};
+
+  /// \brief Capture a copy of the current counters
+  ///
+  /// It's possible to get inconsistent values.  For example, if
+  /// this method is called in the middle of a read you might have
+  /// a case where num_messages != num_record_batches + num_dictionary_batches
+  ReadStats poll() const {
+    ReadStats stats;
+    stats.num_messages = num_messages.load();

Review Comment:
   Nit: also use `memory_order_relaxed`?



##########
cpp/src/arrow/ipc/reader.cc:
##########
@@ -1121,6 +1121,29 @@ class ARROW_EXPORT SelectiveIpcFileRecordBatchGenerator {
   int index_;
 };
 
+struct AtomicReadStats {

Review Comment:
   Can we move this struct inside `RecordBatchFileReaderImpl`?



##########
cpp/src/arrow/ipc/reader.cc:
##########
@@ -1121,6 +1121,29 @@ class ARROW_EXPORT SelectiveIpcFileRecordBatchGenerator {
   int index_;
 };
 
+struct AtomicReadStats {
+  std::atomic<int64_t> num_messages{0};
+  std::atomic<int64_t> num_record_batches{0};
+  std::atomic<int64_t> num_dictionary_batches{0};
+  std::atomic<int64_t> num_dictionary_deltas{0};
+  std::atomic<int64_t> num_replaced_dictionaries{0};
+
+  /// \brief Capture a copy of the current counters
+  ///
+  /// It's possible to get inconsistent values.  For example, if
+  /// this method is called in the middle of a read you might have
+  /// a case where num_messages != num_record_batches + num_dictionary_batches
+  ReadStats poll() const {
+    ReadStats stats;
+    stats.num_messages = num_messages.load();

Review Comment:
   Nit: also use `memory_order_relaxed`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #13245: ARROW-16574: [C++] TSAN failure in arrow-ipc-read-write-test

Posted by GitBox <gi...@apache.org>.
westonpace commented on code in PR #13245:
URL: https://github.com/apache/arrow/pull/13245#discussion_r885060427


##########
cpp/src/arrow/ipc/reader.cc:
##########
@@ -1121,6 +1121,29 @@ class ARROW_EXPORT SelectiveIpcFileRecordBatchGenerator {
   int index_;
 };
 
+struct AtomicReadStats {
+  std::atomic<int64_t> num_messages{0};
+  std::atomic<int64_t> num_record_batches{0};
+  std::atomic<int64_t> num_dictionary_batches{0};
+  std::atomic<int64_t> num_dictionary_deltas{0};
+  std::atomic<int64_t> num_replaced_dictionaries{0};
+
+  /// \brief Capture a copy of the current counters
+  ///
+  /// It's possible to get inconsistent values.  For example, if
+  /// this method is called in the middle of a read you might have
+  /// a case where num_messages != num_record_batches + num_dictionary_batches

Review Comment:
   Does it make a difference that this is not new behavior?  We increment `num_messages` as soon as we figure out there is a new message.  We don't increment `num_record_batches` until we have confirmed that the new message is, in fact, a record batch.  This happens in an entirely different method altogether.
   
   I can move things around to fix this but I also think this is mainly an internal counter used for unit tests and so I'm not very motivated to do so.  Mainly I made this comment because I was trying to figure out if this was an invariant we already maintained (we didn't) that I needed to keep consistent with the new approach.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13245: ARROW-16574: [C++] TSAN failure in arrow-ipc-read-write-test

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13245:
URL: https://github.com/apache/arrow/pull/13245#issuecomment-1139204191

   https://issues.apache.org/jira/browse/ARROW-16574


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #13245: ARROW-16574: [C++] TSAN failure in arrow-ipc-read-write-test

Posted by GitBox <gi...@apache.org>.
westonpace commented on code in PR #13245:
URL: https://github.com/apache/arrow/pull/13245#discussion_r886245878


##########
cpp/src/arrow/ipc/reader.cc:
##########
@@ -1121,6 +1121,29 @@ class ARROW_EXPORT SelectiveIpcFileRecordBatchGenerator {
   int index_;
 };
 
+struct AtomicReadStats {
+  std::atomic<int64_t> num_messages{0};
+  std::atomic<int64_t> num_record_batches{0};
+  std::atomic<int64_t> num_dictionary_batches{0};
+  std::atomic<int64_t> num_dictionary_deltas{0};
+  std::atomic<int64_t> num_replaced_dictionaries{0};
+
+  /// \brief Capture a copy of the current counters
+  ///
+  /// It's possible to get inconsistent values.  For example, if
+  /// this method is called in the middle of a read you might have
+  /// a case where num_messages != num_record_batches + num_dictionary_batches

Review Comment:
   Looking at this further I'm more convinced this isn't needed.  For example, when reading rows, the number of messages will be incremented but the number of record batches will not.  So this isn't a very reliable invariant anyways.
   
   I went ahead and removed the comment itself to avoid confusion in the future.
   
   I'm still open to using a mutex instead of an atomic if we want to.  I only chose atomics for simplicity and not for performance.
   
   There does appear to be a slight impact to performance for very simple IPC reads from buffers which is unfortunate:
   
   Before:
   ```
   ReadBuffer/num_cols:1/is_partial:0/real_time_mean         2372 ns         2372 ns          100 bytes_per_second=411.778G/s
   ```
   After:
   ```
   ReadBuffer/num_cols:1/is_partial:0/real_time_mean         2525 ns         2525 ns          100 bytes_per_second=386.79G/s
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #13245: ARROW-16574: [C++] TSAN failure in arrow-ipc-read-write-test

Posted by GitBox <gi...@apache.org>.
westonpace commented on code in PR #13245:
URL: https://github.com/apache/arrow/pull/13245#discussion_r887322632


##########
cpp/src/arrow/ipc/reader.cc:
##########
@@ -1121,6 +1121,29 @@ class ARROW_EXPORT SelectiveIpcFileRecordBatchGenerator {
   int index_;
 };
 
+struct AtomicReadStats {
+  std::atomic<int64_t> num_messages{0};
+  std::atomic<int64_t> num_record_batches{0};
+  std::atomic<int64_t> num_dictionary_batches{0};
+  std::atomic<int64_t> num_dictionary_deltas{0};
+  std::atomic<int64_t> num_replaced_dictionaries{0};
+
+  /// \brief Capture a copy of the current counters
+  ///
+  /// It's possible to get inconsistent values.  For example, if
+  /// this method is called in the middle of a read you might have
+  /// a case where num_messages != num_record_batches + num_dictionary_batches

Review Comment:
   Thanks for the share.  We could maybe solve this particular problem with thread local stat counters that get accumulated when the stats are asked for but I agree with your earlier comment that we don't want to obsess over this too much until it becomes a problem for someone.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cyb70289 commented on a diff in pull request #13245: ARROW-16574: [C++] TSAN failure in arrow-ipc-read-write-test

Posted by GitBox <gi...@apache.org>.
cyb70289 commented on code in PR #13245:
URL: https://github.com/apache/arrow/pull/13245#discussion_r886278338


##########
cpp/src/arrow/ipc/reader.cc:
##########
@@ -1121,6 +1121,29 @@ class ARROW_EXPORT SelectiveIpcFileRecordBatchGenerator {
   int index_;
 };
 
+struct AtomicReadStats {
+  std::atomic<int64_t> num_messages{0};
+  std::atomic<int64_t> num_record_batches{0};
+  std::atomic<int64_t> num_dictionary_batches{0};
+  std::atomic<int64_t> num_dictionary_deltas{0};
+  std::atomic<int64_t> num_replaced_dictionaries{0};
+
+  /// \brief Capture a copy of the current counters
+  ///
+  /// It's possible to get inconsistent values.  For example, if
+  /// this method is called in the middle of a read you might have
+  /// a case where num_messages != num_record_batches + num_dictionary_batches

Review Comment:
   > There does appear to be a slight impact to performance for very simple IPC reads from buffers which is unfortunate:
   
   Don't be obsessed to microbenchmark :-)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13245: ARROW-16574: [C++] TSAN failure in arrow-ipc-read-write-test

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13245:
URL: https://github.com/apache/arrow/pull/13245#discussion_r884962825


##########
cpp/src/arrow/ipc/reader.cc:
##########
@@ -1121,6 +1121,29 @@ class ARROW_EXPORT SelectiveIpcFileRecordBatchGenerator {
   int index_;
 };
 
+struct AtomicReadStats {
+  std::atomic<int64_t> num_messages{0};
+  std::atomic<int64_t> num_record_batches{0};
+  std::atomic<int64_t> num_dictionary_batches{0};
+  std::atomic<int64_t> num_dictionary_deltas{0};
+  std::atomic<int64_t> num_replaced_dictionaries{0};
+
+  /// \brief Capture a copy of the current counters
+  ///
+  /// It's possible to get inconsistent values.  For example, if
+  /// this method is called in the middle of a read you might have
+  /// a case where num_messages != num_record_batches + num_dictionary_batches

Review Comment:
   This sounds a bit undesirable. Would it be too expensive to use a mutex instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #13245: ARROW-16574: [C++] TSAN failure in arrow-ipc-read-write-test

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #13245:
URL: https://github.com/apache/arrow/pull/13245#discussion_r885296407


##########
cpp/src/arrow/ipc/reader.cc:
##########
@@ -1121,6 +1121,29 @@ class ARROW_EXPORT SelectiveIpcFileRecordBatchGenerator {
   int index_;
 };
 
+struct AtomicReadStats {
+  std::atomic<int64_t> num_messages{0};
+  std::atomic<int64_t> num_record_batches{0};
+  std::atomic<int64_t> num_dictionary_batches{0};
+  std::atomic<int64_t> num_dictionary_deltas{0};
+  std::atomic<int64_t> num_replaced_dictionaries{0};
+
+  /// \brief Capture a copy of the current counters
+  ///
+  /// It's possible to get inconsistent values.  For example, if
+  /// this method is called in the middle of a read you might have
+  /// a case where num_messages != num_record_batches + num_dictionary_batches

Review Comment:
   You're right, it is not new behaviour so we may not want to fix this anyway. OTOH, using a mutex instead of atomics should be an easy changes, a potential perf problem sounds like the only reasonable reason not to do it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on pull request #13245: ARROW-16574: [C++] TSAN failure in arrow-ipc-read-write-test

Posted by GitBox <gi...@apache.org>.
westonpace commented on PR #13245:
URL: https://github.com/apache/arrow/pull/13245#issuecomment-1143069477

   CI failures appear unrelated (although this is the first time I've seen [this Travis error](https://app.travis-ci.com/github/apache/arrow/jobs/571998348)).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org