
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/04 10:42:49 UTC

[GitHub] [arrow] zhixingheyi-tian commented on a change in pull request #11763: ARROW-14153: [C++][Dataset] Add support for batch_size in the ORC Scanner

zhixingheyi-tian commented on a change in pull request #11763:
URL: https://github.com/apache/arrow/pull/11763#discussion_r777983208



##########
File path: cpp/src/arrow/adapters/orc/adapter.h
##########
@@ -231,6 +231,19 @@ class ARROW_EXPORT ORCFileReader {
   Status NextStripeReader(int64_t batch_size, const std::vector<int>& include_indices,
                           std::shared_ptr<RecordBatchReader>* out);
 
+  /// \brief Get a stripe level record batch iterator with specified row count
+  ///         in each record batch. NextStripeReader serves as a fine grain
+  ///         alternative to ReadStripe which may cause OOM issue by loading
+  ///         the whole stripes into memory.
+  ///
+  /// \param[in] batch_size Get a stripe level record batch iterator with specified row
+  /// count in each record batch.
+  ///
+  /// \param[in] include_names the selected field names to read
+  /// \param[out] out the returned stripe reader
+  Status NextBatchReader(int64_t batch_size, const std::vector<std::string>& include_names,

Review comment:
       This is to show the distinction from the original NextStripeReader interface.
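
       As a rough illustration of that distinction, going only by the two signatures in the hunk above: NextStripeReader selects columns by position (include_indices), while the proposed NextBatchReader selects them by name (include_names). The sketch below is a minimal, self-contained example of how a name-based selection could be mapped onto an index-based one. ResolveFieldIndices and schema_names are hypothetical names made up for this example; they are not part of the Arrow API and not necessarily how the PR implements it.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not an Arrow function): map selected field names to
// their top-level positions in a schema, preserving the requested order.
// Throws std::invalid_argument if a requested name is not in the schema.
std::vector<int> ResolveFieldIndices(const std::vector<std::string>& schema_names,
                                     const std::vector<std::string>& include_names) {
  std::vector<int> indices;
  indices.reserve(include_names.size());
  for (const auto& name : include_names) {
    bool found = false;
    for (std::size_t i = 0; i < schema_names.size(); ++i) {
      if (schema_names[i] == name) {
        indices.push_back(static_cast<int>(i));
        found = true;
        break;  // take the first match only
      }
    }
    if (!found) {
      throw std::invalid_argument("unknown field: " + name);
    }
  }
  return indices;
}
```

       With a schema of {"id", "name", "score"}, asking for {"score", "id"} by name would correspond to the index selection {2, 0} that an index-based interface expects.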



