Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/16 08:56:27 UTC
[GitHub] [arrow-rs] shanisolomon opened a new pull request #1318: Expose column index and offset index
shanisolomon opened a new pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318
# Which issue does this PR close?
Closes #1317.
This PR exposes the offsets and lengths of the column index and offset index so that Parquet engines can optimize their reads.
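As a rough illustration of how an engine might use these values, here is a minimal, self-contained sketch. It does not use the `parquet` crate: `IndexLocations`, `byte_range`, and `coalesce` are hypothetical names invented for this example, and the field types (`Option<i64>` offsets, `Option<i32>` lengths) are an assumption based on the diff in this PR and the Thrift definition linked in the review comments. The idea is to turn each (offset, length) pair into a byte range and merge adjacent ranges so that both indexes can be fetched in a single read.

```rust
use std::convert::TryFrom;
use std::ops::Range;

/// Hypothetical stand-in for the four index fields of a column chunk.
/// Not the actual `ColumnChunkMetaData` type.
#[derive(Debug, Clone, Copy, Default)]
struct IndexLocations {
    column_index_offset: Option<i64>,
    column_index_length: Option<i32>,
    offset_index_offset: Option<i64>,
    offset_index_length: Option<i32>,
}

/// Convert an (offset, length) pair into a byte range.
/// Returns `None` if either value is missing, negative, or the length is zero.
fn byte_range(offset: Option<i64>, length: Option<i32>) -> Option<Range<u64>> {
    let start = u64::try_from(offset?).ok()?;
    let len = u64::try_from(length?).ok()?;
    if len == 0 {
        return None;
    }
    Some(start..start.checked_add(len)?)
}

impl IndexLocations {
    /// Byte range of the column index, if one was written.
    fn column_index_range(&self) -> Option<Range<u64>> {
        byte_range(self.column_index_offset, self.column_index_length)
    }

    /// Byte range of the offset index, if one was written.
    fn offset_index_range(&self) -> Option<Range<u64>> {
        byte_range(self.offset_index_offset, self.offset_index_length)
    }
}

/// Merge ranges that touch or overlap, so neighbouring indexes
/// can be fetched with one read instead of several.
fn coalesce(mut ranges: Vec<Range<u64>>) -> Vec<Range<u64>> {
    ranges.sort_by_key(|r| r.start);
    let mut merged: Vec<Range<u64>> = Vec::new();
    for r in ranges {
        if let Some(last) = merged.last_mut() {
            if r.start <= last.end {
                // Extend the previous range rather than starting a new one.
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}
```

A caller would collect the ranges for the columns it needs, pass them through `coalesce`, and issue one read per merged range; a missing index simply yields `None` and is skipped.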
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb commented on pull request #1318: Expose column index and offset index
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#issuecomment-1041423649
https://github.com/apache/arrow-rs/pull/1320 has been merged FYI
[GitHub] [arrow-rs] sunchao merged pull request #1318: Expose column index and offset index
Posted by GitBox <gi...@apache.org>.
sunchao merged pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318
[GitHub] [arrow-rs] codecov-commenter commented on pull request #1318: Expose column index and offset index
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#issuecomment-1041271329
# [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1318?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#1318](https://codecov.io/gh/apache/arrow-rs/pull/1318?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a31927e) into [master](https://codecov.io/gh/apache/arrow-rs/commit/747e72a0c3bf5771c7bfde7b09c5166c5aa51bc3?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (747e72a) will **decrease** coverage by `0.03%`.
> The diff coverage is `50.00%`.
> :exclamation: Current head a31927e differs from pull request most recent head e309b01. Consider uploading reports for the commit e309b01 to get more accurate results
```diff
@@ Coverage Diff @@
## master #1318 +/- ##
==========================================
- Coverage 83.01% 82.98% -0.04%
==========================================
Files 180 180
Lines 52810 52866 +56
==========================================
+ Hits 43840 43870 +30
- Misses 8970 8996 +26
```
| [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1318?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [parquet/src/schema/printer.rs](https://codecov.io/gh/apache/arrow-rs/pull/1318/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvc2NoZW1hL3ByaW50ZXIucnM=) | `68.57% <0.00%> (-2.98%)` | :arrow_down: |
| [parquet/src/file/metadata.rs](https://codecov.io/gh/apache/arrow-rs/pull/1318/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZmlsZS9tZXRhZGF0YS5ycw==) | `88.53% <65.71%> (-2.87%)` | :arrow_down: |
| [parquet/src/file/serialized\_reader.rs](https://codecov.io/gh/apache/arrow-rs/pull/1318/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZmlsZS9zZXJpYWxpemVkX3JlYWRlci5ycw==) | `94.55% <100.00%> (+0.07%)` | :arrow_up: |
| [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1318/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `84.52% <0.00%> (ø)` | |
| [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1318/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `66.21% <0.00%> (+0.22%)` | :arrow_up: |
| [arrow/src/datatypes/datatype.rs](https://codecov.io/gh/apache/arrow-rs/pull/1318/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9kYXRhdHlwZS5ycw==) | `66.80% <0.00%> (+0.39%)` | :arrow_up: |
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
[GitHub] [arrow-rs] shanisolomon commented on a change in pull request #1318: Expose column index and offset index
Posted by GitBox <gi...@apache.org>.
shanisolomon commented on a change in pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#discussion_r807887417
##########
File path: parquet/src/file/metadata.rs
##########
@@ -350,6 +350,10 @@ pub struct ColumnChunkMetaData {
dictionary_page_offset: Option<i64>,
statistics: Option<Statistics>,
bloom_filter_offset: Option<i64>,
+ offset_index_offset: Option<i64>,
Review comment:
I agree, and this is also why I chose to keep it that way. :)
[GitHub] [arrow-rs] alamb commented on a change in pull request #1318: Expose column index and offset index
Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#discussion_r807873209
##########
File path: parquet/src/file/metadata.rs
##########
@@ -350,6 +350,10 @@ pub struct ColumnChunkMetaData {
dictionary_page_offset: Option<i64>,
statistics: Option<Statistics>,
bloom_filter_offset: Option<i64>,
+ offset_index_offset: Option<i64>,
Review comment:
`offset_index_offset` and `offset_index_length` are unfortunate names (such `offset`!) 🤪
However, it seems that is what they are called in the parquet format itself so consistency in the rust implementation is 👍
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L798-L802
[GitHub] [arrow-rs] alamb commented on a change in pull request #1318: Expose column index and offset index
Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#discussion_r807873209
##########
File path: parquet/src/file/metadata.rs
##########
@@ -350,6 +350,10 @@ pub struct ColumnChunkMetaData {
dictionary_page_offset: Option<i64>,
statistics: Option<Statistics>,
bloom_filter_offset: Option<i64>,
+ offset_index_offset: Option<i64>,
Review comment:
`offset_index_offset` is an unfortunate name (such `offset`!) 🤪
However, it seems that is what it is called in the parquet format itself, so consistency in the rust implementation is 👍
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L798-L802
[GitHub] [arrow-rs] shanisolomon commented on pull request #1318: Expose column index and offset index
Posted by GitBox <gi...@apache.org>.
shanisolomon commented on pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#issuecomment-1041866665
Thanks, @alamb! Could you help with merging please?
[GitHub] [arrow-rs] sunchao commented on pull request #1318: Expose column index and offset index
Posted by GitBox <gi...@apache.org>.
sunchao commented on pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#issuecomment-1041878475
Just merged to master!
[GitHub] [arrow-rs] shanisolomon commented on pull request #1318: Expose column index and offset index
Posted by GitBox <gi...@apache.org>.
shanisolomon commented on pull request #1318:
URL: https://github.com/apache/arrow-rs/pull/1318#issuecomment-1041318857
Will sync with `https://github.com/apache/arrow-rs/pull/1320` and update the test after its merge.