Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/04/22 13:32:42 UTC

[GitHub] [arrow] lidavidm commented on a change in pull request #6744: PARQUET-1820: [C++] pre-buffer specified columns of row group

lidavidm commented on a change in pull request #6744:
URL: https://github.com/apache/arrow/pull/6744#discussion_r412985177



##########
File path: cpp/src/parquet/file_reader.h
##########
@@ -117,6 +117,15 @@ class PARQUET_EXPORT ParquetFileReader {
   // Returns the file metadata. Only one instance is ever created
   std::shared_ptr<FileMetaData> metadata() const;
 
+  /// Pre-buffer the specified column indices in all row groups.
+  ///
+  /// Only has an effect if ReaderProperties.is_coalesced_stream_enabled is set;
+  /// otherwise this is a no-op. The reader internally maintains a cache which is
+  /// overwritten each time this is called. Intended to increase performance on
+  /// high-latency filesystems (e.g. Amazon S3).
+  void PreBuffer(const std::vector<int>& row_groups,
+                 const std::vector<int>& column_indices);

Review comment:
       I changed this to accept the CacheOptions struct here, which gives the caller full control each time they invoke it.
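
       A minimal sketch of the per-call options idea, not the code in the PR. It assumes a stand-in CacheOptions with two hypothetical fields (hole_size_limit, range_size_limit), a hypothetical ReadRange type, and a hypothetical SketchReader whose PreBuffer takes byte ranges directly instead of row group and column indices. It only shows that each call can choose its own coalescing behaviour and that the cache is overwritten on every call.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the CacheOptions struct named in the comment.
// Field names are assumptions made for this illustration.
struct CacheOptions {
  int64_t hole_size_limit;   // merge two ranges if the gap between them is at most this
  int64_t range_size_limit;  // never grow a merged range beyond this many bytes
};

// Hypothetical byte range of one column chunk in the file.
struct ReadRange {
  int64_t offset;
  int64_t length;
};

// Merge nearby ranges so that fewer, larger reads are issued.
std::vector<ReadRange> Coalesce(std::vector<ReadRange> ranges,
                                const CacheOptions& options) {
  std::sort(ranges.begin(), ranges.end(),
            [](const ReadRange& a, const ReadRange& b) { return a.offset < b.offset; });
  std::vector<ReadRange> out;
  for (const ReadRange& r : ranges) {
    if (!out.empty()) {
      ReadRange& last = out.back();
      const int64_t last_end = last.offset + last.length;
      const int64_t gap = r.offset - last_end;
      const int64_t merged_end = std::max(last_end, r.offset + r.length);
      if (gap <= options.hole_size_limit &&
          merged_end - last.offset <= options.range_size_limit) {
        last.length = merged_end - last.offset;  // extend the previous range
        continue;
      }
    }
    out.push_back(r);
  }
  return out;
}

// Hypothetical reader: the options are passed on every call rather than
// fixed at construction, and each call replaces the previous cache contents.
class SketchReader {
 public:
  void PreBuffer(const std::vector<ReadRange>& column_chunk_ranges,
                 const CacheOptions& options) {
    cached_ = Coalesce(column_chunk_ranges, options);
  }
  const std::vector<ReadRange>& cached() const { return cached_; }

 private:
  std::vector<ReadRange> cached_;
};
```

       With this shape, a caller could use a generous hole size for one scan and disable merging for the next, without rebuilding the reader.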

##########
File path: cpp/src/parquet/file_reader.h
##########
@@ -117,6 +117,15 @@ class PARQUET_EXPORT ParquetFileReader {
   // Returns the file metadata. Only one instance is ever created
   std::shared_ptr<FileMetaData> metadata() const;
 
+  /// Pre-buffer the specified column indices in all row groups.
+  ///
+  /// Only has an effect if ReaderProperties.is_coalesced_stream_enabled is set;

Review comment:
       Yes - that makes more sense, combined with moving the flag to ArrowReaderProperties.
