You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/27 04:50:10 UTC

[GitHub] [arrow] westonpace commented on a change in pull request #10794: ARROW-13441: [C++][CSV] Skip empty batches in column decoder

westonpace commented on a change in pull request #10794:
URL: https://github.com/apache/arrow/pull/10794#discussion_r677115339



##########
File path: cpp/src/arrow/csv/column_decoder.cc
##########
@@ -198,6 +198,11 @@ Result<std::shared_ptr<Array>> InferringColumnDecoder::RunInference(
 
 Future<std::shared_ptr<Array>> InferringColumnDecoder::Decode(
     const std::shared_ptr<BlockParser>& parser) {
+  if (parser->num_rows() == 0) {

Review comment:
       ```suggestion
     // Note: Care should be taken to discard these batches as the data type will be null
     // and won't match future decoded arrays.
     if (parser->num_rows() == 0) {
   ```
   
   Nit: Not necessary but might help explain to future readers.  Optionally you could also return `nullptr` here so it is clear it is invalid but then that will cause some confusion below between EOF and invalidly parsed batch.  I'll defer to whatever you think is best.

##########
File path: cpp/src/arrow/csv/reader.cc
##########
@@ -901,13 +901,31 @@ class StreamingReaderImpl : public ReaderMixin,
 
     auto self = shared_from_this();
     return rb_gen().Then([self, rb_gen, max_readahead](const DecodedBlock& first_block) {
-      return self->InitAfterFirstBatch(first_block, std::move(rb_gen), max_readahead);
+      return self->InitFromBlock(first_block, std::move(rb_gen), max_readahead, 0);
     });
   }
 
-  Status InitAfterFirstBatch(const DecodedBlock& first_block,
-                             AsyncGenerator<DecodedBlock> batch_gen, int max_readahead) {
-    schema_ = first_block.record_batch->schema();
+  Future<> InitFromBlock(const DecodedBlock& block,
+                         AsyncGenerator<DecodedBlock> batch_gen, int max_readahead,
+                         int64_t prev_bytes_processed) {
+    if (!block.record_batch) {
+      // End of file just return null batches
+      record_batch_gen_ = []() { return std::shared_ptr<RecordBatch>(); };

Review comment:
       ```suggestion
         record_batch_gen_ = MakeEmptyGenerator<std::shared_ptr<RecordBatch>>();
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org