
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/02/25 05:06:19 UTC

[GitHub] [iceberg] ajantha-bhat opened a new pull request #4223: Clarify merge-on-read modes in docs for spark

ajantha-bhat opened a new pull request #4223:
URL: https://github.com/apache/iceberg/pull/4223


   Currently the iceberg-spark documentation doesn't cover merge-on-read mode, hence this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] samredai commented on pull request #4223: Docs: Clarify merge-on-read modes in docs for spark

Posted by GitBox <gi...@apache.org>.

samredai commented on pull request #4223:
URL: https://github.com/apache/iceberg/pull/4223#issuecomment-1054419258


   These properties like `write.merge.mode` also need to be added to the [table configuration](https://github.com/apache/iceberg/blob/master/docs/versioned/tables/configuration.md) page right?



[GitHub] [iceberg] samredai commented on a change in pull request #4223: Docs: Clarify merge-on-read modes in docs for spark

Posted by GitBox <gi...@apache.org>.

samredai commented on a change in pull request #4223:
URL: https://github.com/apache/iceberg/pull/4223#discussion_r816032103



##########
File path: docs/versioned/spark/spark-writes.md
##########
@@ -98,6 +98,11 @@ WHEN NOT MATCHED AND s.event_time > still_valid_threshold THEN INSERT (id, count
 
 Only one record in the source data can update any given row of the target table, or else an error will be thrown.
 
+!!! Note
+    By default Spark uses copy-on-write merge mode.
+    With spark-3.2 and onwards, iceberg supports merge-on-read mode.    
+    To use merge-on-read merge mode, need to set the table property `write.merge.mode` to "merge-on-read".

Review comment:
       grammar nit: "need to set the table property" can just say "set the table property". Same comment for the other note boxes.
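For readers landing on this thread: the note under review switches MERGE to merge-on-read by setting a table property. The sketch below is a hypothetical Python helper (`merge_on_read_ddl` and the table name `prod.db.sample` are illustrative, not from the PR) that builds the Spark SQL `ALTER TABLE ... SET TBLPROPERTIES` statement. It assumes the Iceberg property names `write.delete.mode`, `write.update.mode`, and `write.merge.mode` for DELETE, UPDATE, and MERGE respectively.

```python
from typing import Iterable

# Iceberg table properties that select the write mode for each row-level command.
MODE_PROPERTIES = {
    "DELETE": "write.delete.mode",
    "UPDATE": "write.update.mode",
    "MERGE": "write.merge.mode",
}


def merge_on_read_ddl(table: str, commands: Iterable[str]) -> str:
    """Hypothetical helper: build an ALTER TABLE statement enabling merge-on-read."""
    pairs = []
    for command in commands:
        key = MODE_PROPERTIES[command.upper()]  # raises KeyError for other commands
        pairs.append(f"'{key}'='merge-on-read'")
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({', '.join(pairs)})"


# Example with a hypothetical table name:
print(merge_on_read_ddl("prod.db.sample", ["MERGE"]))
# ALTER TABLE prod.db.sample SET TBLPROPERTIES ('write.merge.mode'='merge-on-read')
```

The mode properties are independent, so a table can use merge-on-read for MERGE while DELETE and UPDATE stay on copy-on-write.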





[GitHub] [iceberg] liuml07 commented on a change in pull request #4223: Docs: Clarify merge-on-read modes in docs for spark

Posted by GitBox <gi...@apache.org>.

liuml07 commented on a change in pull request #4223:
URL: https://github.com/apache/iceberg/pull/4223#discussion_r841008642



##########
File path: docs/spark/spark-writes.md
##########
@@ -207,6 +217,11 @@ WHERE EXISTS (SELECT oid FROM prod.db.returned_orders WHERE t1.oid = oid)
 
 For more complex row-level updates based on incoming data, see the section on `MERGE INTO`.
 
+!!! Note
+    By default Spark uses copy-on-write update mode.
+    With spark-3.2 and onwards, iceberg supports merge-on-read mode.    

Review comment:
       s/iceberg/Iceberg/
   
   s/merge-on-read mode/merge-on-read update mode/
   
   Same in the other places.





[GitHub] [iceberg] szehon-ho commented on a change in pull request #4223: Docs: Clarify merge-on-read modes in docs for spark

Posted by GitBox <gi...@apache.org>.

szehon-ho commented on a change in pull request #4223:
URL: https://github.com/apache/iceberg/pull/4223#discussion_r814996711



##########
File path: docs/versioned/spark/spark-writes.md
##########
@@ -98,6 +98,11 @@ WHEN NOT MATCHED AND s.event_time > still_valid_threshold THEN INSERT (id, count
 
 Only one record in the source data can update any given row of the target table, or else an error will be thrown.
 
+!!! Note
+    By default Spark uses copy-on-write merge mode.
+    With spark-3.2 and onwards, iceberg supports merge-on-read mode.    

Review comment:
       Not sure what everyone thinks, but it seems there's not much context. Should we explain what it means (that it writes v2 delete files)?
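The copy-on-write vs. merge-on-read distinction raised here can be illustrated with a toy model. This is a simplified sketch, not Iceberg code: a "data file" is a Python list of rows, and a position delete is a (file name, row position) pair. Real Iceberg delete files are separate files tracked in table metadata, and a rewritten data file gets a new name; none of that is modeled.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = dict
Files = Dict[str, List[Row]]
PositionDeletes = Set[Tuple[str, int]]


def copy_on_write_delete(files: Files, pred: Callable[[Row], bool]) -> Files:
    """Rewrite each data file that contains a matching row; keep the others as-is."""
    result: Files = {}
    for name, rows in files.items():
        if any(pred(r) for r in rows):
            result[name] = [r for r in rows if not pred(r)]  # "rewritten" file
        else:
            result[name] = rows  # untouched file
    return result


def merge_on_read_delete(files: Files, pred: Callable[[Row], bool]) -> PositionDeletes:
    """Leave data files untouched and record position deletes instead."""
    return {
        (name, pos)
        for name, rows in files.items()
        for pos, r in enumerate(rows)
        if pred(r)
    }


def read(files: Files, deletes: PositionDeletes) -> List[Row]:
    """Apply position deletes while reading (the "merge" in merge-on-read)."""
    return [
        r
        for name, rows in files.items()
        for pos, r in enumerate(rows)
        if (name, pos) not in deletes
    ]
```

Both strategies yield the same rows to a reader; the trade-off is extra write work for copy-on-write versus extra read work for merge-on-read.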

##########
File path: docs/versioned/spark/spark-writes.md
##########
@@ -98,6 +98,11 @@ WHEN NOT MATCHED AND s.event_time > still_valid_threshold THEN INSERT (id, count
 
 Only one record in the source data can update any given row of the target table, or else an error will be thrown.
 
+!!! Note
+    By default Spark uses copy-on-write merge mode.
+    With spark-3.2 and onwards, iceberg supports merge-on-read mode.    
+    To use merge-on-read merge mode, need to set the table property `write.merge.mode` to "merge-on-read".

Review comment:
       Do we need to add that format-version needs to be v2?
   
   Also nit: I think 'need to' is redundant; removing it seems more consistent with the rest of the docs.
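The format-version question can be expressed as a small check. This is an illustrative sketch, not Iceberg's own validation logic: `effective_mode` is a hypothetical name, a missing `write.*.mode` property falls back to `copy-on-write`, and a missing `format-version` is treated as 1 (an assumption; the default depends on the Iceberg release). The check reflects that merge-on-read relies on delete files, which the Iceberg table spec introduces in format version 2.

```python
from typing import Dict

# Iceberg table properties that select the write mode for each row-level command.
MODE_PROPERTIES = {
    "DELETE": "write.delete.mode",
    "UPDATE": "write.update.mode",
    "MERGE": "write.merge.mode",
}


def effective_mode(props: Dict[str, str], command: str) -> str:
    """Return the write mode a command would use, rejecting merge-on-read on v1 tables.

    Hypothetical helper: missing mode properties fall back to copy-on-write, and a
    missing format-version is assumed to be 1.
    """
    mode = props.get(MODE_PROPERTIES[command.upper()], "copy-on-write")
    format_version = int(props.get("format-version", "1"))
    if mode == "merge-on-read" and format_version < 2:
        # Delete files only exist in format version 2 and later.
        raise ValueError(f"{command}: merge-on-read requires format-version >= 2")
    return mode
```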

##########
File path: docs/versioned/spark/spark-writes.md
##########
@@ -98,6 +98,11 @@ WHEN NOT MATCHED AND s.event_time > still_valid_threshold THEN INSERT (id, count
 
 Only one record in the source data can update any given row of the target table, or else an error will be thrown.
 
+!!! Note

Review comment:
       Do the three `!!!` render anything? (I don't see anything when viewing the file as Markdown.) Should we just use a normal sub-header here?





[GitHub] [iceberg] ajantha-bhat commented on a change in pull request #4223: Docs: Clarify merge-on-read modes in docs for spark

Posted by GitBox <gi...@apache.org>.

ajantha-bhat commented on a change in pull request #4223:
URL: https://github.com/apache/iceberg/pull/4223#discussion_r841093870



##########
File path: docs/spark/spark-writes.md
##########
@@ -207,6 +217,11 @@ WHERE EXISTS (SELECT oid FROM prod.db.returned_orders WHERE t1.oid = oid)
 
 For more complex row-level updates based on incoming data, see the section on `MERGE INTO`.
 
+!!! Note
+    By default Spark uses copy-on-write update mode.
+    With spark-3.2 and onwards, iceberg supports merge-on-read mode.    

Review comment:
       updated. Thanks





[GitHub] [iceberg] ajantha-bhat commented on a change in pull request #4223: Docs: Clarify merge-on-read modes in docs for spark

Posted by GitBox <gi...@apache.org>.

ajantha-bhat commented on a change in pull request #4223:
URL: https://github.com/apache/iceberg/pull/4223#discussion_r816609756



##########
File path: docs/versioned/spark/spark-writes.md
##########
@@ -98,6 +98,11 @@ WHEN NOT MATCHED AND s.event_time > still_valid_threshold THEN INSERT (id, count
 
 Only one record in the source data can update any given row of the target table, or else an error will be thrown.
 
+!!! Note

Review comment:
       We already have a PR for that:
   https://github.com/apache/iceberg/pull/3432





[GitHub] [iceberg] ajantha-bhat commented on a change in pull request #4223: Docs: Clarify merge-on-read modes in docs for spark

Posted by GitBox <gi...@apache.org>.

ajantha-bhat commented on a change in pull request #4223:
URL: https://github.com/apache/iceberg/pull/4223#discussion_r838725493



##########
File path: docs/versioned/spark/spark-writes.md
##########
@@ -98,6 +98,11 @@ WHEN NOT MATCHED AND s.event_time > still_valid_threshold THEN INSERT (id, count
 
 Only one record in the source data can update any given row of the target table, or else an error will be thrown.
 
+!!! Note
+    By default Spark uses copy-on-write merge mode.
+    With spark-3.2 and onwards, iceberg supports merge-on-read mode.    
+    To use merge-on-read merge mode, need to set the table property `write.merge.mode` to "merge-on-read".

Review comment:
       fixed





[GitHub] [iceberg] ajantha-bhat commented on pull request #4223: Docs: Clarify merge-on-read modes in docs for spark

Posted by GitBox <gi...@apache.org>.

ajantha-bhat commented on pull request #4223:
URL: https://github.com/apache/iceberg/pull/4223#issuecomment-1083337942


   @samredai, @rdblue, @szehon-ho: This is a commonly asked question in the Iceberg Slack workspace; someone asked it again today.
   So, I feel we should merge this and update the site docs and provide more context if required after #3432 is merged.



[GitHub] [iceberg] rdblue commented on a change in pull request #4223: Docs: Clarify merge-on-read modes in docs for spark

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #4223:
URL: https://github.com/apache/iceberg/pull/4223#discussion_r815135716



##########
File path: docs/versioned/spark/spark-writes.md
##########
@@ -98,6 +98,11 @@ WHEN NOT MATCHED AND s.event_time > still_valid_threshold THEN INSERT (id, count
 
 Only one record in the source data can update any given row of the target table, or else an error will be thrown.
 
+!!! Note

Review comment:
       It causes this to render in a text box. I would prefer having a full section on the copy-on-write vs merge-on-read distinction and then refer to it from the individual command sections.





[GitHub] [iceberg] samredai commented on pull request #4223: Docs: Clarify merge-on-read modes in docs for spark

Posted by GitBox <gi...@apache.org>.

samredai commented on pull request #4223:
URL: https://github.com/apache/iceberg/pull/4223#issuecomment-1083928661


   > I feel we should merge this and update the site docs and provide more context if required after https://github.com/apache/iceberg/pull/3432 is merged
   
   +1



[GitHub] [iceberg] rdblue commented on pull request #4223: Docs: Clarify merge-on-read modes in docs for spark

Posted by GitBox <gi...@apache.org>.

rdblue commented on pull request #4223:
URL: https://github.com/apache/iceberg/pull/4223#issuecomment-1051244772


   @samredai FYI



[GitHub] [iceberg] samredai edited a comment on pull request #4223: Docs: Clarify merge-on-read modes in docs for spark

Posted by GitBox <gi...@apache.org>.

samredai edited a comment on pull request #4223:
URL: https://github.com/apache/iceberg/pull/4223#issuecomment-1054419258


   ~These properties like `write.merge.mode` also need to be added to the [table configuration](https://github.com/apache/iceberg/blob/master/docs/versioned/tables/configuration.md) page right?~
   
   Nvm, it's already added but just not deployed yet!

