Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/12/03 21:56:14 UTC

[GitHub] [iceberg] RussellSpitzer opened a new issue #3662: Split_Offset Information is not Included when Reading Manifests for Planning

RussellSpitzer opened a new issue #3662:
URL: https://github.com/apache/iceberg/issues/3662


   When reading manifests to create scan tasks, DataTableScan passes the following columns to ManifestGroup:
   
   https://github.com/apache/iceberg/blob/97055c4114d396eeef24d8280006984b1c2cb306/core/src/main/java/org/apache/iceberg/DataTableScan.java#L31-L34
   
   Missing from this list is "split_offsets", which means ScanTasks created from these manifest entries will not have any split information. I think this worked previously only because of bugs in our Avro pruning code:
   
   https://github.com/apache/iceberg/issues/1735
   
   I think we are now having issues because the select is working as it should.
   
   I also think we lack explicit tests that verify split offset information is read and used during split planning. I discovered this issue when my test suite for #3292, which requires split offset information to function properly, started failing.
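
To make the failure mode concrete, here is a minimal sketch of column projection over a manifest entry. It is a simplified model and not Iceberg's API: the entry is a plain dict, `read_entry` and `read_entry_without_pruning` are hypothetical helpers, and the column lists and offsets are illustrative rather than the real `SCAN_COLUMNS` contents.

```python
# Simplified model: a manifest entry is a dict, and a projected read keeps
# only the requested columns. This mimics column pruning, not Iceberg's API.
ENTRY = {
    "file_path": "s3://bucket/data/file-a.parquet",  # hypothetical path
    "file_size_in_bytes": 300,
    "record_count": 30,
    "split_offsets": [4, 104, 204],  # hypothetical row-group start offsets
}


def read_entry(entry, columns):
    """Working pruning: columns that were not requested read back as None."""
    return {name: (value if name in columns else None)
            for name, value in entry.items()}


def read_entry_without_pruning(entry, columns):
    """Broken pruning: the projection is ignored and every column is returned."""
    return dict(entry)


# Illustrative projection lists, not the real SCAN_COLUMNS contents.
columns_without_offsets = ["file_path", "file_size_in_bytes", "record_count"]
columns_with_offsets = columns_without_offsets + ["split_offsets"]

masked = read_entry_without_pruning(ENTRY, columns_without_offsets)
pruned = read_entry(ENTRY, columns_without_offsets)
fixed = read_entry(ENTRY, columns_with_offsets)

print(masked["split_offsets"])  # [4, 104, 204]
print(pruned["split_offsets"])  # None
print(fixed["split_offsets"])   # [4, 104, 204]
```

In this model, the incomplete column list only shows up once pruning actually works, which matches the behavior described above.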
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on issue #3662: Split_Offset Information is not Included when Reading Manifests for Planning

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on issue #3662:
URL: https://github.com/apache/iceberg/issues/3662#issuecomment-985905376


   Thanks for investigating this, Russell. I think this is a pretty important bug that we need to fix before 0.13. Please let me know if you have time; if not, I can publish the fix. If my understanding is correct, the fix is to add `split_offsets` to `SCAN_COLUMNS` and add tests for it?



[GitHub] [iceberg] RussellSpitzer closed issue #3662: Split_Offset Information is not Included when Reading Manifests for Planning

Posted by GitBox <gi...@apache.org>.
RussellSpitzer closed issue #3662:
URL: https://github.com/apache/iceberg/issues/3662


   



[GitHub] [iceberg] rdblue commented on issue #3662: Split_Offset Information is not Included when Reading Manifests for Planning

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #3662:
URL: https://github.com/apache/iceberg/issues/3662#issuecomment-985937320


   @RussellSpitzer, I'm going to review #3292 again this weekend. Thanks for being patient with my reviews on that one!



[GitHub] [iceberg] RussellSpitzer commented on issue #3662: Split_Offset Information is not Included when Reading Manifests for Planning

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #3662:
URL: https://github.com/apache/iceberg/issues/3662#issuecomment-985920639


   I have the fix and tests, just waiting for internal approval to put it upstream.



[GitHub] [iceberg] RussellSpitzer commented on issue #3662: Split_Offset Information is not Included when Reading Manifests for Planning

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #3662:
URL: https://github.com/apache/iceberg/issues/3662#issuecomment-985921658


   Additionally, I would like us to fix the other issues that I note in #3292, but I'll leave that for another PR.
   
   1. We allow arbitrary splitting of splittable formats when split information is missing (a Parquet file with a single row group can be turned into an arbitrarily large number of tasks).
   2. We do not allow combining partial scans into a single scan task. For example, if I have 3 files, each with 2 row groups, and my read split size is 3 row groups, I will still get 3 tasks even if the file open cost is 0.
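
The two points above can be sketched with a toy planner. This is a simplified model under stated assumptions, not Iceberg's planning code: `split_by_size`, `split_by_offsets`, and `pack` are hypothetical helpers, sizes are made-up byte counts, and the packing is a plain greedy in-order pass that measures weights in row groups.

```python
def split_by_size(file_size, split_size):
    """Fixed-size splitting: cut [0, file_size) every split_size bytes."""
    return [(start, min(split_size, file_size - start))
            for start in range(0, file_size, split_size)]


def split_by_offsets(file_size, offsets):
    """Offset-aware splitting: one (start, length) range per recorded offset."""
    ends = offsets[1:] + [file_size]
    return [(start, end - start) for start, end in zip(offsets, ends)]


def pack(weights, target):
    """Greedy in-order packing: open a new task when the next item would exceed target."""
    tasks, current = [], []
    for weight in weights:
        if current and sum(current) + weight > target:
            tasks.append(current)
            current = []
        current.append(weight)
    if current:
        tasks.append(current)
    return tasks


# Point 1: a 100-byte file whose single row group starts at byte 4 (hypothetical).
size_splits = split_by_size(100, 10)
offset_splits = split_by_offsets(100, [4])
print(len(size_splits))  # 10
print(offset_splits)     # [(4, 96)]

# Point 2: three files with two row groups each, target of three row groups per task.
whole_files = pack([2, 2, 2], 3)           # each file kept whole
row_groups = pack([1, 1, 1, 1, 1, 1], 3)   # row groups packed across files
print(len(whole_files))  # 3
print(len(row_groups))   # 2
```

In this model, shrinking the split size in `split_by_size` produces ever more tasks for the same single row group, while packing at row-group granularity brings the second example down from 3 tasks to 2.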



[GitHub] [iceberg] RussellSpitzer commented on issue #3662: Split_Offset Information is not Included when Reading Manifests for Planning

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #3662:
URL: https://github.com/apache/iceberg/issues/3662#issuecomment-985934328


   @jackye1995 ^ Fix attached



[GitHub] [iceberg] RussellSpitzer commented on issue #3662: Split_Offset Information is not Included when Reading Manifests for Planning

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #3662:
URL: https://github.com/apache/iceberg/issues/3662#issuecomment-985938013


   No rush, I need to fix this first and rebase.
   
   > On Dec 3, 2021, at 7:12 PM, Ryan Blue ***@***.***> wrote:
   > 
   > 
   > @RussellSpitzer, I'm going to review #3292 again this weekend. Thanks for being patient with my reviews on that one!

