Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/16 08:56:27 UTC

[GitHub] [arrow-rs] shanisolomon opened a new pull request #1318: Expose column index and offset index

shanisolomon opened a new pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318


   # Which issue does this PR close?
   Closes #1317.
   
   Exposes the offsets and lengths of the column index and offset index so that Parquet engines can optimize their reads.
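
As an illustration of the kind of read optimization this enables, here is a minimal sketch, not code from the PR. It assumes a hypothetical `IndexLocation` struct that carries the four values per column chunk, with the field names and `Option<i64>` / `Option<i32>` types used by the Parquet Thrift definition. The helpers `byte_range` and `covering_range` are likewise made up for this example. The idea is that an engine can turn each offset/length pair into a byte range and fetch every index with a single read.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the per-column-chunk values the PR exposes.
/// Field names follow the Parquet format; the struct itself is an assumption.
#[derive(Debug, Clone, Copy, Default)]
pub struct IndexLocation {
    pub column_index_offset: Option<i64>,
    pub column_index_length: Option<i32>,
    pub offset_index_offset: Option<i64>,
    pub offset_index_length: Option<i32>,
}

/// Convert an optional (offset, length) pair into a byte range.
/// Returns `None` if either value is missing or negative.
pub fn byte_range(offset: Option<i64>, length: Option<i32>) -> Option<Range<u64>> {
    let (offset, length) = (offset?, length?);
    if offset < 0 || length < 0 {
        return None;
    }
    let start = offset as u64;
    // Cannot overflow: i64::MAX + i32::MAX is well below u64::MAX.
    Some(start..start + length as u64)
}

/// Smallest single range that covers every index present in `chunks`,
/// so a reader could fetch all of them with one request.
pub fn covering_range(chunks: &[IndexLocation]) -> Option<Range<u64>> {
    chunks
        .iter()
        .flat_map(|c| {
            [
                byte_range(c.column_index_offset, c.column_index_length),
                byte_range(c.offset_index_offset, c.offset_index_length),
            ]
        })
        .flatten() // drop indexes that are absent or malformed
        .reduce(|a, b| (a.start.min(b.start))..(a.end.max(b.end)))
}
```

A caller would build one `IndexLocation` per column chunk from the file metadata and issue a single ranged read over `covering_range(&chunks)`. Whether one coalesced read beats several small ones depends on the storage backend, so treat this as one possible strategy rather than a recommendation.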


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #1318: Expose column index and offset index

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#issuecomment-1041423649


   https://github.com/apache/arrow-rs/pull/1320 has been merged FYI





[GitHub] [arrow-rs] sunchao merged pull request #1318: Expose column index and offset index

Posted by GitBox <gi...@apache.org>.
sunchao merged pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318


   





[GitHub] [arrow-rs] codecov-commenter commented on pull request #1318: Expose column index and offset index

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#issuecomment-1041271329


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1318?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1318](https://codecov.io/gh/apache/arrow-rs/pull/1318?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a31927e) into [master](https://codecov.io/gh/apache/arrow-rs/commit/747e72a0c3bf5771c7bfde7b09c5166c5aa51bc3?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (747e72a) will **decrease** coverage by `0.03%`.
   > The diff coverage is `50.00%`.
   
   > :exclamation: Current head a31927e differs from pull request most recent head e309b01. Consider uploading reports for the commit e309b01 to get more accurate results
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/1318/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/1318?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1318      +/-   ##
   ==========================================
   - Coverage   83.01%   82.98%   -0.04%     
   ==========================================
     Files         180      180              
     Lines       52810    52866      +56     
   ==========================================
   + Hits        43840    43870      +30     
   - Misses       8970     8996      +26     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1318?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [parquet/src/schema/printer.rs](https://codecov.io/gh/apache/arrow-rs/pull/1318/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvc2NoZW1hL3ByaW50ZXIucnM=) | `68.57% <0.00%> (-2.98%)` | :arrow_down: |
   | [parquet/src/file/metadata.rs](https://codecov.io/gh/apache/arrow-rs/pull/1318/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZmlsZS9tZXRhZGF0YS5ycw==) | `88.53% <65.71%> (-2.87%)` | :arrow_down: |
   | [parquet/src/file/serialized\_reader.rs](https://codecov.io/gh/apache/arrow-rs/pull/1318/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZmlsZS9zZXJpYWxpemVkX3JlYWRlci5ycw==) | `94.55% <100.00%> (+0.07%)` | :arrow_up: |
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1318/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `84.52% <0.00%> (ø)` | |
   | [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1318/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `66.21% <0.00%> (+0.22%)` | :arrow_up: |
   | [arrow/src/datatypes/datatype.rs](https://codecov.io/gh/apache/arrow-rs/pull/1318/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9kYXRhdHlwZS5ycw==) | `66.80% <0.00%> (+0.39%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1318?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1318?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [747e72a...e309b01](https://codecov.io/gh/apache/arrow-rs/pull/1318?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   





[GitHub] [arrow-rs] shanisolomon commented on a change in pull request #1318: Expose column index and offset index

Posted by GitBox <gi...@apache.org>.
shanisolomon commented on a change in pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#discussion_r807887417



##########
File path: parquet/src/file/metadata.rs
##########
@@ -350,6 +350,10 @@ pub struct ColumnChunkMetaData {
     dictionary_page_offset: Option<i64>,
     statistics: Option<Statistics>,
     bloom_filter_offset: Option<i64>,
+    offset_index_offset: Option<i64>,

Review comment:
       I agree, and this is also why I chose to keep it that way. :)







[GitHub] [arrow-rs] alamb commented on a change in pull request #1318: Expose column index and offset index

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#discussion_r807873209



##########
File path: parquet/src/file/metadata.rs
##########
@@ -350,6 +350,10 @@ pub struct ColumnChunkMetaData {
     dictionary_page_offset: Option<i64>,
     statistics: Option<Statistics>,
     bloom_filter_offset: Option<i64>,
+    offset_index_offset: Option<i64>,

Review comment:
       `offset_index_offset` and `offset_index_length` are unfortunate names (such `offset`!) 🤪 
   
   However, that seems to be what they are called in the Parquet format itself, so keeping the Rust implementation consistent with it is 👍 
   
   https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L798-L802










[GitHub] [arrow-rs] shanisolomon commented on pull request #1318: Expose column index and offset index

Posted by GitBox <gi...@apache.org>.
shanisolomon commented on pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#issuecomment-1041866665


   Thanks, @alamb! Could you help with merging please?





[GitHub] [arrow-rs] sunchao commented on pull request #1318: Expose column index and offset index

Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#issuecomment-1041878475


   Just merged to master!





[GitHub] [arrow-rs] shanisolomon commented on pull request #1318: Expose column index and offset index

Posted by GitBox <gi...@apache.org>.
shanisolomon commented on pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#issuecomment-1041318857


   Will sync with `https://github.com/apache/arrow-rs/pull/1320` and update the test once it is merged.

