Posted to github@arrow.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/07/08 14:29:45 UTC

[GitHub] [arrow] wgtmac commented on a diff in pull request #36510: PARQUET-2321: [C++] allow customized buffer size when creating ArrowInputStream for a column PageReader

wgtmac commented on code in PR #36510:
URL: https://github.com/apache/arrow/pull/36510#discussion_r1257275770


##########
cpp/src/parquet/properties.h:
##########
@@ -64,7 +64,8 @@ class PARQUET_EXPORT ReaderProperties {
   MemoryPool* memory_pool() const { return pool_; }
 
   std::shared_ptr<ArrowInputStream> GetStream(std::shared_ptr<ArrowInputFile> source,
-                                              int64_t start, int64_t num_bytes);
+                                              int64_t start, int64_t num_bytes,
+                                              int64_t buffer_size = -1);

Review Comment:
   What about `std::optional<int64_t> buffer_size`?
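   A minimal sketch of the difference between the two styles, not the actual Arrow/Parquet code. The function names `ResolveWithSentinel` and `ResolveWithOptional` and the constant `kAssumedDefaultBufferSize` are hypothetical, chosen only for illustration. With `std::optional`, "no value" replaces the magic `-1`, so the signature itself documents that the argument may be omitted:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the reader's configured default buffer size.
// This value is an assumption for the sketch, not taken from the PR.
constexpr int64_t kAssumedDefaultBufferSize = 16384;

// Sentinel style, as in the diff: any negative value means "use the default".
int64_t ResolveWithSentinel(int64_t buffer_size = -1) {
  return buffer_size < 0 ? kAssumedDefaultBufferSize : buffer_size;
}

// Optional style: an empty optional means "use the default".
int64_t ResolveWithOptional(std::optional<int64_t> buffer_size = std::nullopt) {
  // An explicitly provided size is expected to be positive.
  assert(!buffer_size.has_value() || *buffer_size > 0);
  return buffer_size.value_or(kAssumedDefaultBufferSize);
}
```

   Callers that do not care keep writing `ResolveWithOptional()`, while callers that do pass a plain integer, which converts implicitly to `std::optional<int64_t>`.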



##########
cpp/src/parquet/file_reader.cc:
##########
@@ -66,7 +66,7 @@ static constexpr int64_t kMaxDictHeaderSize = 100;
 RowGroupReader::RowGroupReader(std::unique_ptr<Contents> contents)
     : contents_(std::move(contents)) {}
 
-std::shared_ptr<ColumnReader> RowGroupReader::Column(int i) {
+std::shared_ptr<ColumnReader> RowGroupReader::Column(int i, int64_t buffer_size) {

Review Comment:
   TBH, this additional parameter looks a little bit weird here.



##########
cpp/src/parquet/file_reader.h:
##########
@@ -189,6 +190,9 @@ class PARQUET_EXPORT ParquetFileReader {
   ::arrow::Future<> WhenBuffered(const std::vector<int>& row_groups,
                                  const std::vector<int>& column_indices) const;
 
+  /// Return the range of the specified column chunk.
+  ::arrow::io::ReadRange GetColumnChunkRange(int row_group_index, int column_index);

Review Comment:
   This is not used anywhere?



##########
cpp/src/parquet/file_reader.h:
##########
@@ -44,7 +44,8 @@ class PARQUET_EXPORT RowGroupReader {
   // An implementation of the Contents class is defined in the .cc file
   struct Contents {
     virtual ~Contents() {}
-    virtual std::unique_ptr<PageReader> GetColumnPageReader(int i) = 0;
+    virtual std::unique_ptr<PageReader> GetColumnPageReader(int i,

Review Comment:
   I don't see how the file reader deals with this new parameter. Is the caller expected to pass a suitable `buffer_size`?


