Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/03 08:14:23 UTC

[GitHub] [arrow] emkornfield opened a new pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages

emkornfield opened a new pull request #7089:
URL: https://github.com/apache/arrow/pull/7089


   - Adds a separate writer configuration option that selects which
     data page version to use.
   - Plumbs this option through to Python.
   - At the moment, the format version and the data page version are
     completely independent.
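To make "completely independent" concrete, here is a minimal sketch, assuming hypothetical names (`FormatVersion`, `DataPageVersion`, `WriterSettings`, `WriterSettingsBuilder`) and defaults that are not taken from the patch: the two options live in separate fields, and setting one never changes the other.

```cpp
// Hypothetical names; a sketch of the idea in this PR, not the merged Arrow API.

// Controls which data types may be written (the existing "version" option).
enum class FormatVersion { V1_0, V2_0 };

// Controls the serialized layout of data pages (the new, separate option).
enum class DataPageVersion { V1, V2 };

struct WriterSettings {
  // Assumed defaults for this sketch only.
  FormatVersion format_version = FormatVersion::V1_0;
  DataPageVersion data_page_version = DataPageVersion::V1;
};

class WriterSettingsBuilder {
 public:
  // Each setter touches exactly one field, so the options stay independent.
  constexpr WriterSettingsBuilder& format_version(FormatVersion v) {
    settings_.format_version = v;
    return *this;
  }
  constexpr WriterSettingsBuilder& data_page_version(DataPageVersion v) {
    settings_.data_page_version = v;
    return *this;
  }
  constexpr WriterSettings build() const { return settings_; }

 private:
  WriterSettings settings_{};
};
```

For example, `WriterSettingsBuilder().data_page_version(DataPageVersion::V2).build()` requests V2 data pages while leaving the format version at the sketch's `V1_0` default.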


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#issuecomment-624166234


   Added more verbose comments. I'll merge this pending CI.



[GitHub] [arrow] wesm commented on a change in pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#discussion_r420236495



##########
File path: cpp/src/parquet/properties.h
##########
@@ -34,10 +34,17 @@
 
 namespace parquet {
 
+/// Control for data types in parquet.
 struct ParquetVersion {
   enum type { PARQUET_1_0, PARQUET_2_0 };
 };
 
+/// Controls layout of data pages.
+/// parquet-format v2.0.0 introduced a data page metadata

Review comment:
       A "new" is missing here.





[GitHub] [arrow] wesm commented on a change in pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages

Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#discussion_r419458260



##########
File path: cpp/src/parquet/properties.h
##########
@@ -34,10 +34,14 @@
 
 namespace parquet {
 
+/// Control for data types in parquet.
 struct ParquetVersion {
   enum type { PARQUET_1_0, PARQUET_2_0 };
 };
 
+/// Controls layout of data pages.

Review comment:
       parquet-format v2.0.0 introduced a new data page metadata and serialized page structure (for example, encoded levels are no longer compressed).
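The practical effect of "encoded levels are no longer compressed" can be sketched as a byte-accounting function. This is a simplified model, assuming hypothetical names (`DataPageVersion`, `PageSections`, `CompressionPlan`, `PlanCompression`); it ignores the V1 level-length prefixes and the V2 header's `is_compressed` flag and only shows which bytes go through the codec.

```cpp
#include <cstdint>

// Hypothetical enum mirroring the option discussed in this PR.
enum class DataPageVersion { V1, V2 };

// Sizes of the three encoded sections of a data page, before compression.
struct PageSections {
  int64_t repetition_level_bytes;
  int64_t definition_level_bytes;
  int64_t value_bytes;
};

// How a writer splits a page between bytes written as-is and bytes compressed.
struct CompressionPlan {
  int64_t raw_bytes;          // written without compression
  int64_t codec_input_bytes;  // passed through the compression codec
};

constexpr CompressionPlan PlanCompression(DataPageVersion version,
                                          PageSections s) {
  // V1: levels and values form one buffer that is compressed as a whole.
  // V2: encoded levels stay uncompressed; only the values use the codec.
  return version == DataPageVersion::V1
             ? CompressionPlan{0, s.repetition_level_bytes +
                                      s.definition_level_bytes +
                                      s.value_bytes}
             : CompressionPlan{s.repetition_level_bytes +
                                   s.definition_level_bytes,
                               s.value_bytes};
}
```

With 10 bytes of repetition levels, 20 bytes of definition levels, and 100 bytes of values, a V1 page hands all 130 bytes to the codec, while a V2 page writes the 30 level bytes raw and compresses only the 100 value bytes.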





[GitHub] [arrow] fsaintjacques commented on a change in pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on a change in pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#discussion_r419460990



##########
File path: cpp/src/parquet/properties.h
##########
@@ -34,10 +34,14 @@
 
 namespace parquet {
 
+/// Control for data types in parquet.
 struct ParquetVersion {
   enum type { PARQUET_1_0, PARQUET_2_0 };
 };
 
+/// Controls layout of data pages.

Review comment:
       To clarify: I'm asking for this to be put in a doc comment for whoever reads this and doesn't know which one to use.





[GitHub] [arrow] github-actions[bot] commented on pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#issuecomment-623073094


   https://issues.apache.org/jira/browse/ARROW-8657



[GitHub] [arrow] wesm commented on pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#issuecomment-623480713


   Sorry, I fat-fingered the review request. I will take a look at this.



[GitHub] [arrow] emkornfield commented on a change in pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#discussion_r419852845



##########
File path: cpp/src/parquet/properties.h
##########
@@ -34,10 +34,14 @@
 
 namespace parquet {
 
+/// Control for data types in parquet.
 struct ParquetVersion {
   enum type { PARQUET_1_0, PARQUET_2_0 };
 };
 
+/// Controls layout of data pages.

Review comment:
       Added Wes's comment.





[GitHub] [arrow] fsaintjacques commented on a change in pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on a change in pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#discussion_r419392424



##########
File path: cpp/src/parquet/properties.h
##########
@@ -34,10 +34,14 @@
 
 namespace parquet {
 
+/// Control for data types in parquet.
 struct ParquetVersion {
   enum type { PARQUET_1_0, PARQUET_2_0 };
 };
 
+/// Controls layout of data pages.

Review comment:
       Could you add more background on the differences and why you'd want that?



