Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/03 08:14:23 UTC
[GitHub] [arrow] emkornfield opened a new pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages
emkornfield opened a new pull request #7089:
URL: https://github.com/apache/arrow/pull/7089
- Adds a separate write config to determine which version of
data page to use.
- Plumbs this through to Python.
- At the moment, the format version and the data page version are
completely independent.
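A minimal sketch of how the two settings might be used from Python, assuming a pyarrow build that includes this change and exposes it as the `data_page_version` keyword of `pyarrow.parquet.write_table`. The helper `parquet_write_options` and the tuple `DATA_PAGE_VERSIONS` are hypothetical names introduced only for illustration; they are not part of pyarrow.

```python
import os
import tempfile

# Values assumed to be accepted for the data page setting.
DATA_PAGE_VERSIONS = ("1.0", "2.0")


def parquet_write_options(version="1.0", data_page_version="1.0"):
    """Hypothetical helper: build keyword arguments for write_table.

    Only the data page version is checked here, and it is checked on its
    own, because the two settings are independent of each other.
    """
    if data_page_version not in DATA_PAGE_VERSIONS:
        raise ValueError(
            "unsupported data page version: %r" % (data_page_version,)
        )
    return {"version": version, "data_page_version": data_page_version}


# Format version 1.0 combined with V2 data pages: the settings do not
# constrain each other.
options = parquet_write_options(version="1.0", data_page_version="2.0")

try:
    import pyarrow as pa
    import pyarrow.parquet as pq
except ImportError:
    pa = pq = None  # pyarrow is not installed; skip the write below

if pq is not None:
    table = pa.table({"x": [1, 2, 3]})
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.parquet")
        # Assumes write_table accepts data_page_version as a keyword.
        pq.write_table(table, path, **options)
        # Reading back should give the same table regardless of page layout.
        assert pq.read_table(path).equals(table)
```

Because the settings are independent, choosing V2 data pages here does not change the format version that is written.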
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#issuecomment-624166234
Added more verbose comments. I'll merge this pending CI.
[GitHub] [arrow] wesm commented on a change in pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages
Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#discussion_r420236495
##########
File path: cpp/src/parquet/properties.h
##########
@@ -34,10 +34,17 @@
namespace parquet {
+/// Control for data types in parquet.
struct ParquetVersion {
enum type { PARQUET_1_0, PARQUET_2_0 };
};
+/// Controls layout of data pages.
+/// parquet-format v2.0.0 introduced a data page metadata
Review comment:
A "new" missing here
[GitHub] [arrow] wesm commented on a change in pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages
Posted by GitBox <gi...@apache.org>.
wesm commented on a change in pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#discussion_r419458260
##########
File path: cpp/src/parquet/properties.h
##########
@@ -34,10 +34,14 @@
namespace parquet {
+/// Control for data types in parquet.
struct ParquetVersion {
enum type { PARQUET_1_0, PARQUET_2_0 };
};
+/// Controls layout of data pages.
Review comment:
parquet-format v2.0.0 introduced a data page metadata and serialized page structure (for example, encoded levels are no longer compressed)
[GitHub] [arrow] fsaintjacques commented on a change in pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages
Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on a change in pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#discussion_r419460990
##########
File path: cpp/src/parquet/properties.h
##########
@@ -34,10 +34,14 @@
namespace parquet {
+/// Control for data types in parquet.
struct ParquetVersion {
enum type { PARQUET_1_0, PARQUET_2_0 };
};
+/// Controls layout of data pages.
Review comment:
To clarify: I'm asking for this to be put in a doc comment for whoever reads this and doesn't know which one to use.
[GitHub] [arrow] github-actions[bot] commented on pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#issuecomment-623073094
https://issues.apache.org/jira/browse/ARROW-8657
[GitHub] [arrow] wesm commented on pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#issuecomment-623480713
Sorry, I fat-fingered the review request. I will take a look at this.
[GitHub] [arrow] emkornfield commented on a change in pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages
Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#discussion_r419852845
##########
File path: cpp/src/parquet/properties.h
##########
@@ -34,10 +34,14 @@
namespace parquet {
+/// Control for data types in parquet.
struct ParquetVersion {
enum type { PARQUET_1_0, PARQUET_2_0 };
};
+/// Controls layout of data pages.
Review comment:
Added Wes's comment.
[GitHub] [arrow] fsaintjacques commented on a change in pull request #7089: ARROW-8657: [C++][Python] Add separate configuration for data pages
Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on a change in pull request #7089:
URL: https://github.com/apache/arrow/pull/7089#discussion_r419392424
##########
File path: cpp/src/parquet/properties.h
##########
@@ -34,10 +34,14 @@
namespace parquet {
+/// Control for data types in parquet.
struct ParquetVersion {
enum type { PARQUET_1_0, PARQUET_2_0 };
};
+/// Controls layout of data pages.
Review comment:
Could you add more background on the differences and why you'd want that?