Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/30 09:13:53 UTC

[GitHub] [iceberg] coolderli opened a new pull request #2897: Docs: Add UPDATE describtion for spark3.

coolderli opened a new pull request #2897:
URL: https://github.com/apache/iceberg/pull/2897


   I found that the docs lack a description of `UPDATE` queries on Spark 3, so I added one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2897: Docs: Add UPDATE describtion for spark3.

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2897:
URL: https://github.com/apache/iceberg/pull/2897#issuecomment-890039428


   Thanks, @coolderli! Good to have this added to the docs.




[GitHub] [iceberg] rdblue commented on pull request #2897: Docs: Add UPDATE describtion for spark3.

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2897:
URL: https://github.com/apache/iceberg/pull/2897#issuecomment-919564894


   Thanks for reminding me, @coolderli! I'll merge this.




[GitHub] [iceberg] coolderli commented on pull request #2897: Docs: Add UPDATE describtion for spark3.

Posted by GitBox <gi...@apache.org>.
coolderli commented on pull request #2897:
URL: https://github.com/apache/iceberg/pull/2897#issuecomment-919255465


   cc @rdblue 




[GitHub] [iceberg] coolderli commented on a change in pull request #2897: Docs: Add UPDATE describtion for spark3.

Posted by GitBox <gi...@apache.org>.
coolderli commented on a change in pull request #2897:
URL: https://github.com/apache/iceberg/pull/2897#discussion_r680476267



##########
File path: site/docs/spark-writes.md
##########
@@ -171,6 +172,18 @@ WHERE ts >= '2020-05-01 00:00:00' and ts < '2020-06-01 00:00:00'
 
 If the delte filter matches entire partitions of the table, Iceberg will perform a metadata-only delete. If the filter matches individual rows of a table, then Iceberg will rewrite only the affected data files.
 
+### `UPDATE`
+
+Spark 3 added support for `UPDATE` queries to remove data from tables.

Review comment:
       I've fixed it. I'll think about the complex updates for `MERGE INTO` and add them in another patch.






[GitHub] [iceberg] rdblue commented on a change in pull request #2897: Docs: Add UPDATE describtion for spark3.

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2897:
URL: https://github.com/apache/iceberg/pull/2897#discussion_r685340008



##########
File path: site/docs/spark-writes.md
##########
@@ -89,6 +90,16 @@ Inserts also support additional conditions:
 WHEN NOT MATCHED AND s.event_time > still_valid_threshold THEN INSERT (id, count) VALUES (s.id, 1)
 ```
 
+Source rows can also come from the target table or a subset of it.
+
+```sql
+MERGE INTO prod.db.target t
+USING (SELECT * from prod.db.target where op = 'increment') s
+ON t.id = s.id
+WHEN MATCHED AND t.count <> 0 THEN UPDATE SET t.count = 0, t.op = 'delete'
+WHEN MATCHED AND t.count = 0 THEN UPDATE SET t.count = s.count + 1
+```
+

Review comment:
       This change looks unrelated and is probably not a good example of how you would merge a table into itself because the `op` field would probably not be stored in the target. I think it's a good idea to show that you can merge a table into itself, but that's probably a different PR where we can discuss what we want in the example. I would definitely include a recommendation to add partition filters.






[GitHub] [iceberg] rdblue commented on a change in pull request #2897: Docs: Add UPDATE describtion for spark3.

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2897:
URL: https://github.com/apache/iceberg/pull/2897#discussion_r680104995



##########
File path: site/docs/spark-writes.md
##########
@@ -171,6 +172,18 @@ WHERE ts >= '2020-05-01 00:00:00' and ts < '2020-06-01 00:00:00'
 
 If the delte filter matches entire partitions of the table, Iceberg will perform a metadata-only delete. If the filter matches individual rows of a table, then Iceberg will rewrite only the affected data files.
 
+### `UPDATE`
+
+Spark 3 added support for `UPDATE` queries to remove data from tables.

Review comment:
       I think this description isn't quite right. It should be
   > Spark 3.1 added support for `UPDATE` queries that update matching rows in tables.
   
   It would also be good to add a pointer to `MERGE INTO` for more complex updates.
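   
   As a rough illustration of the corrected wording, a statement along these lines updates the matching rows in place rather than removing them. The table `prod.db.table` and the column `c1` are hypothetical names, the `WHERE` clause simply mirrors the filter visible in the diff context above, and the sketch assumes Spark 3.1 or later with the Iceberg SQL extensions enabled. It is not the text that was merged.
   
   ```sql
   -- Hypothetical table and column names, for illustration only.
   -- Assumes Spark 3.1+ with the Iceberg SQL extensions enabled.
   UPDATE prod.db.table
   SET c1 = 'update_c1'
   WHERE ts >= '2020-05-01 00:00:00' AND ts < '2020-06-01 00:00:00'
   ```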






[GitHub] [iceberg] coolderli commented on a change in pull request #2897: Docs: Add UPDATE describtion for spark3.

Posted by GitBox <gi...@apache.org>.
coolderli commented on a change in pull request #2897:
URL: https://github.com/apache/iceberg/pull/2897#discussion_r685657448



##########
File path: site/docs/spark-writes.md
##########
@@ -171,6 +172,18 @@ WHERE ts >= '2020-05-01 00:00:00' and ts < '2020-06-01 00:00:00'
 
 If the delte filter matches entire partitions of the table, Iceberg will perform a metadata-only delete. If the filter matches individual rows of a table, then Iceberg will rewrite only the affected data files.
 
+### `UPDATE`
+
+Spark 3 added support for `UPDATE` queries to remove data from tables.

Review comment:
       Sorry about that. I have updated it.






[GitHub] [iceberg] rdblue commented on a change in pull request #2897: Docs: Add UPDATE describtion for spark3.

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2897:
URL: https://github.com/apache/iceberg/pull/2897#discussion_r680540824



##########
File path: site/docs/spark-writes.md
##########
@@ -171,6 +172,18 @@ WHERE ts >= '2020-05-01 00:00:00' and ts < '2020-06-01 00:00:00'
 
 If the delte filter matches entire partitions of the table, Iceberg will perform a metadata-only delete. If the filter matches individual rows of a table, then Iceberg will rewrite only the affected data files.
 
+### `UPDATE`
+
+Spark 3 added support for `UPDATE` queries to remove data from tables.

Review comment:
       Let's do it here while we're editing the `UPDATE` section.






[GitHub] [iceberg] coolderli commented on a change in pull request #2897: Docs: Add UPDATE describtion for spark3.

Posted by GitBox <gi...@apache.org>.
coolderli commented on a change in pull request #2897:
URL: https://github.com/apache/iceberg/pull/2897#discussion_r685266911



##########
File path: site/docs/spark-writes.md
##########
@@ -171,6 +172,18 @@ WHERE ts >= '2020-05-01 00:00:00' and ts < '2020-06-01 00:00:00'
 
 If the delte filter matches entire partitions of the table, Iceberg will perform a metadata-only delete. If the filter matches individual rows of a table, then Iceberg will rewrite only the affected data files.
 
+### `UPDATE`
+
+Spark 3 added support for `UPDATE` queries to remove data from tables.

Review comment:
       @rdblue I have added some updates for `MERGE INTO`. Have I missed anything else?






[GitHub] [iceberg] rdblue commented on a change in pull request #2897: Docs: Add UPDATE describtion for spark3.

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2897:
URL: https://github.com/apache/iceberg/pull/2897#discussion_r685342734



##########
File path: site/docs/spark-writes.md
##########
@@ -171,6 +172,18 @@ WHERE ts >= '2020-05-01 00:00:00' and ts < '2020-06-01 00:00:00'
 
 If the delte filter matches entire partitions of the table, Iceberg will perform a metadata-only delete. If the filter matches individual rows of a table, then Iceberg will rewrite only the affected data files.
 
+### `UPDATE`
+
+Spark 3 added support for `UPDATE` queries to remove data from tables.

Review comment:
       I think you misunderstood what I was suggesting. Instead of changing `MERGE INTO`, I think that the `UPDATE` docs should point the reader to `MERGE INTO` for more complex cases. Something like this with a link:
   
   > For more complex row-level updates based on incoming data, see the section on `MERGE INTO`.
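   
   To make the distinction concrete, here is a minimal sketch of the kind of `MERGE INTO` statement such a pointer would lead to, where the new values come from incoming data instead of constants. The source table `prod.db.updates` and the columns `id` and `count` are hypothetical, and the sketch assumes the Iceberg SQL extensions are enabled.
   
   ```sql
   -- Hypothetical source table and columns, for illustration only.
   -- Rows that match on id are updated from the incoming data;
   -- rows with no match are inserted.
   MERGE INTO prod.db.target t
   USING (SELECT id, count FROM prod.db.updates) s
   ON t.id = s.id
   WHEN MATCHED THEN UPDATE SET t.count = t.count + s.count
   WHEN NOT MATCHED THEN INSERT (id, count) VALUES (s.id, s.count)
   ```
   
   A plain `UPDATE` cannot express the insert branch or pull per-row values from a second table this way, which is why the pointer is useful.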






[GitHub] [iceberg] coolderli commented on pull request #2897: Docs: Add UPDATE describtion for spark3.

Posted by GitBox <gi...@apache.org>.
coolderli commented on pull request #2897:
URL: https://github.com/apache/iceberg/pull/2897#issuecomment-889757984


   cc @rdblue 




[GitHub] [iceberg] rdblue merged pull request #2897: Docs: Add UPDATE describtion for spark3.

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #2897:
URL: https://github.com/apache/iceberg/pull/2897


   

