Posted to issues@iceberg.apache.org by "dramaticlly (via GitHub)" <gi...@apache.org> on 2023/02/03 22:55:04 UTC

[GitHub] [iceberg] dramaticlly opened a new issue, #6741: Support write distribution mode as a Spark SqlConf option in Iceberg

dramaticlly opened a new issue, #6741:
URL: https://github.com/apache/iceberg/issues/6741

   ### Feature Request / Improvement
   
   Today, row-level deletes and updates on Iceberg tables have to be done via Spark SQL.
   
   Currently, Iceberg provides table properties to configure the write distribution mode (none, hash, and range), but we would love to be able to configure it at the level of an individual Spark job.
   
   Reasoning
   1. Iceberg now plans to change the default write distribution mode from none to range: https://github.com/apache/iceberg/issues/6679
   1. `none` as the write distribution mode is used to minimize shuffling for data that is already partition-aligned. It usually only needs to be set for GDPR-like deletion jobs, not for other jobs, so setting it in table properties for all writes, deletes, and updates does not seem like the best idea.
   
   An example SQL use case:
   ```sql
   DELETE FROM tbl1
   WHERE date <= '20230101'
   AND external_id IN (SELECT id FROM tb2) 
   ```
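
The per-job override requested above could be resolved roughly as follows. This is only a sketch, not Iceberg's implementation: the session key `spark.sql.iceberg.distribution-mode`, the helper `resolve_distribution_mode`, and the precedence order (session conf, then command-specific table property, then general table property, then a default) are all assumptions made for illustration. Only `write.update.distribution-mode` is quoted in this thread; the other property names are assumed to follow the same pattern.

```python
from typing import Mapping

VALID_MODES = {"none", "hash", "range"}

# Hypothetical session-level key; the issue does not name one.
SESSION_CONF_KEY = "spark.sql.iceberg.distribution-mode"


def resolve_distribution_mode(
    command: str,
    session_conf: Mapping[str, str],
    table_props: Mapping[str, str],
    default: str = "none",
) -> str:
    """Pick a distribution mode, letting a job-level setting win over table properties.

    The precedence order here is an assumption for illustration only.
    """
    candidates = [
        session_conf.get(SESSION_CONF_KEY),                     # per-job override
        table_props.get(f"write.{command}.distribution-mode"),  # e.g. write.update.distribution-mode
        table_props.get("write.distribution-mode"),             # general table-level setting
        default,
    ]
    # The first configured value wins; `default` guarantees there is one.
    mode = next(value for value in candidates if value is not None).lower()
    if mode not in VALID_MODES:
        raise ValueError(f"Unsupported distribution mode: {mode!r}")
    return mode


# The table is configured for range distribution, but a GDPR-style
# deletion job opts out of the shuffle for itself only.
table_props = {"write.distribution-mode": "range"}
gdpr_job_conf = {SESSION_CONF_KEY: "none"}

print(resolve_distribution_mode("delete", gdpr_job_conf, table_props))  # none
print(resolve_distribution_mode("delete", {}, table_props))             # range
```

With this shape, the table keeps one setting for regular writers, and only the job that knows its data is already partition-aligned skips the shuffle.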
   
   ### Query engine
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] aokolnychyi commented on issue #6741: Support write distribution mode as a Spark SqlConf option in Iceberg

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.

aokolnychyi commented on issue #6741:
URL: https://github.com/apache/iceberg/issues/6741#issuecomment-1428650056

   @dramaticlly, I have some time to implement this SQL conf. Have you looked into it already? I want this feature to be shipped with 1.2.0.



[GitHub] [iceberg] JunchengMa commented on issue #6741: Support write distribution mode as a Spark SqlConf option in Iceberg

Posted by "JunchengMa (via GitHub)" <gi...@apache.org>.

JunchengMa commented on issue #6741:
URL: https://github.com/apache/iceberg/issues/6741#issuecomment-1416594916

   Thanks @dramaticlly for opening this feature request.
   In addition to the GDPR deletion use case, the column redaction use case would also benefit from setting `write.update.distribution-mode`='none' at the job level to avoid shuffles.
   
   Example SQL:
   ```sql
   UPDATE db_name.tlb_name SET col_a = NULL WHERE date <= '20220801'
   ```
   
   



[GitHub] [iceberg] aokolnychyi commented on issue #6741: Support write distribution mode as a Spark SqlConf option in Iceberg

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.

aokolnychyi commented on issue #6741:
URL: https://github.com/apache/iceberg/issues/6741#issuecomment-1416541311

   We had such a config in the prototype but ended up switching to table properties. Given that we don't have a way to pass options to MERGE or DELETE, I feel it would make sense to offer a SQL config like this.
   
   I'll be happy to review a PR for this.



[GitHub] [iceberg] aokolnychyi closed issue #6741: Support write distribution mode as a Spark SqlConf option in Iceberg

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.

aokolnychyi closed issue #6741: Support write distribution mode as a Spark SqlConf option in Iceberg
URL: https://github.com/apache/iceberg/issues/6741

