Posted to issues@iceberg.apache.org by "RussellSpitzer (via GitHub)" <gi...@apache.org> on 2023/05/08 16:46:09 UTC

[GitHub] [iceberg] RussellSpitzer opened a new issue, #7557: Support Rewrite Datafiles into a custom Partition Spec

RussellSpitzer opened a new issue, #7557:
URL: https://github.com/apache/iceberg/issues/7557

   ### Feature Request / Improvement
   
   Currently, rewrite data files always uses the table's current partition spec when rewriting, but it would be useful to be able to use any partition spec in the table. That way, a user could write with one spec, such as `hours`, and later rewrite into another, such as `days`.
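
To make the `hours` to `days` example concrete, here is a minimal Python sketch of how the two partition values relate. It does not use any Iceberg API: `hours_transform` and `days_transform` are hypothetical helpers that only mirror the definitions of the `hour` and `day` transforms (whole hours or days since the Unix epoch), and the file names and timestamps are invented for illustration. A real rewrite repartitions rows rather than whole files; one timestamp per file is a simplification.

```python
from collections import defaultdict
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hours_transform(ts):
    """Whole hours since the Unix epoch (hypothetical helper)."""
    return int((ts - EPOCH).total_seconds() // 3600)


def days_transform(ts):
    """Whole days since the Unix epoch (hypothetical helper)."""
    return (ts - EPOCH).days


def group_files(files, transform):
    """Group file names by the partition value of their timestamp."""
    groups = defaultdict(list)
    for name, ts in files:
        groups[transform(ts)].append(name)
    return dict(groups)


# Invented data files, each tagged with a single event timestamp.
files = [
    ("f1.parquet", datetime(2023, 5, 8, 1, 15, tzinfo=timezone.utc)),
    ("f2.parquet", datetime(2023, 5, 8, 1, 45, tzinfo=timezone.utc)),
    ("f3.parquet", datetime(2023, 5, 8, 16, 46, tzinfo=timezone.utc)),
    ("f4.parquet", datetime(2023, 5, 9, 0, 5, tzinfo=timezone.utc)),
]

by_hour = group_files(files, hours_transform)  # layout as written
by_day = group_files(files, days_transform)    # layout after the rewrite

print(len(by_hour), "hourly partitions ->", len(by_day), "daily partitions")
```

With this input, the three hourly partitions collapse into two daily ones, which is the kind of consolidation a rewrite into a coarser spec would produce.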
   
   This would require expanding the API and then changing the implementation a bit. We should probably do this in at least two PRs: one adding the new API, then follow-up PRs adding implementations for the various existing rewrites, such as Spark's Rewrite DataFiles.
   
   ### Query engine
   
   Spark


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] W-I-D-EE commented on issue #7557: Support Rewrite Datafiles into a custom Partition Spec

Posted by "W-I-D-EE (via GitHub)" <gi...@apache.org>.
W-I-D-EE commented on issue #7557:
URL: https://github.com/apache/iceberg/issues/7557#issuecomment-1681431973

   Do you see this as less about partition evolution and more about being able to rewrite data into a more optimal partition spec? For example, maybe your default partition spec uses days, but on one specific day your data was exceptionally chatty, resulting in suboptimal query performance. Being able to rewrite the chatty days into hours could then give a more optimal partition layout.
   
   If so, would it make sense for users to be able to add a partition spec to a table without altering or evolving the existing default? Maybe this already exists?
   




Re: [I] Support Rewrite Datafiles into a custom Partition Spec [iceberg]

Posted by "himadripal (via GitHub)" <gi...@apache.org>.
himadripal commented on issue #7557:
URL: https://github.com/apache/iceberg/issues/7557#issuecomment-1963108856

   Revised PR: #9803




[GitHub] [iceberg] RussellSpitzer commented on issue #7557: Support Rewrite Datafiles into a custom Partition Spec

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on issue #7557:
URL: https://github.com/apache/iceberg/issues/7557#issuecomment-1681505620

   Note that @dpaani has a PR for this: #7585
   




[GitHub] [iceberg] edgarRd commented on issue #7557: Support Rewrite Datafiles into a custom Partition Spec

Posted by "edgarRd (via GitHub)" <gi...@apache.org>.
edgarRd commented on issue #7557:
URL: https://github.com/apache/iceberg/issues/7557#issuecomment-1561607714

   I added this for the same use case back when the previous rewrite data files action was still in [core](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/actions/BaseRewriteDataFilesAction.java#L138) https://github.com/apache/iceberg/issues/7635, but apparently it didn't make it into the new `RewriteDataFiles`. It's still used in Flink, though.
   
   I agree, it'd be great to consolidate on a single data files rewrite API.

