
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/01 08:27:21 UTC

[GitHub] [arrow] westonpace commented on pull request #10568: ARROW-11889: [C++] Add parallelism to streaming CSV reader

westonpace commented on pull request #10568:
URL: https://github.com/apache/arrow/pull/10568#issuecomment-872039323


   I've rebased in the changes from #10509. The behavior is only slightly different. Opening the streaming CSV reader reads in the first record batch, so bytes_read will reflect that before any batch is read. After that, each time a batch is read, the next batch is read in, so bytes_read stays one batch ahead of the consumer. This means the last read will not increment bytes_read. If reading in parallel, bytes_read could be even further ahead of the consumer, since decoding is done in readahead. It should still match the spirit of the feature, which is to report how many bytes have been decoded.
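
   To make the accounting concrete, here is a rough toy model of the serial case. It is only a sketch and not the reader's actual code: `ToyStreamingReader`, `TraceBytesRead`, and the batch sizes are made up for illustration, and each "batch" is reduced to its size in bytes. It assumes exactly one batch of readahead; the parallel reader could run further ahead than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy model only (hypothetical names, not the Arrow API). A "batch" is
// represented by nothing more than its encoded size in bytes.
class ToyStreamingReader {
 public:
  // Opening the reader decodes the first batch right away, so bytes_read()
  // is already non-zero before the consumer has asked for anything.
  explicit ToyStreamingReader(std::vector<int64_t> batch_sizes)
      : batch_sizes_(std::move(batch_sizes)) {
    DecodeNext();
  }

  // Hands out the batch decoded earlier, then decodes the one after it.
  // Returns false once the input is exhausted.
  bool ReadNext(int64_t* out_size) {
    assert(out_size != nullptr);
    if (!has_pending_) return false;
    *out_size = pending_size_;
    DecodeNext();
    return true;
  }

  // Total bytes decoded so far, including the batch held in readahead.
  int64_t bytes_read() const { return bytes_read_; }

 private:
  // Decodes one more batch if any input remains; otherwise clears the
  // pending slot and leaves bytes_read_ untouched.
  void DecodeNext() {
    if (next_index_ < batch_sizes_.size()) {
      pending_size_ = batch_sizes_[next_index_++];
      bytes_read_ += pending_size_;
      has_pending_ = true;
    } else {
      has_pending_ = false;
    }
  }

  std::vector<int64_t> batch_sizes_;
  std::size_t next_index_ = 0;
  int64_t pending_size_ = 0;
  bool has_pending_ = false;
  int64_t bytes_read_ = 0;
};

// Records bytes_read() right after opening and then after every
// successful ReadNext().
std::vector<int64_t> TraceBytesRead(std::vector<int64_t> batch_sizes) {
  ToyStreamingReader reader(std::move(batch_sizes));
  std::vector<int64_t> trace{reader.bytes_read()};
  int64_t size = 0;
  while (reader.ReadNext(&size)) {
    trace.push_back(reader.bytes_read());
  }
  return trace;
}
```

   With made-up batch sizes of 100, 80, and 60 bytes, `TraceBytesRead` records 100 right after opening, then 180, 240, and 240 after the three reads: the counter is always one batch ahead, and the read that returns the final batch has nothing left to decode, so it leaves the counter unchanged.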
   
   @n3world @pitrou review is welcome.  The CI failure is unrelated.

