Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/12/30 06:54:08 UTC

[GitHub] [iceberg] xiaotianzhang01 opened a new pull request #3820: Docs: Add `WRITE LOCALLY ORDERED BY` and `WRITE DISTRIBUTED BY` in spark-ddl.md

xiaotianzhang01 opened a new pull request #3820:
URL: https://github.com/apache/iceberg/pull/3820


   Expand the documentation of the write distribution and ordering syntax for `ALTER TABLE`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3820: Docs: Add `WRITE LOCALLY ORDERED BY` and `WRITE DISTRIBUTED BY` in spark-ddl.md

Posted by GitBox <gi...@apache.org>.
xiaotianzhang01 commented on a change in pull request #3820:
URL: https://github.com/apache/iceberg/pull/3820#discussion_r782911765



##########
File path: site/docs/spark-ddl.md
##########
@@ -360,3 +360,29 @@ ALTER TABLE prod.db.sample WRITE ORDERED BY category ASC NULLS LAST, id DESC NUL
 !!! Note
     Table write order does not guarantee data order for queries. It only affects how data is written to the table.
 
+Only local sorting can be set at the same time, use `LOCALLY ORDERED BY`
+
+```sql
+ALTER TABLE prod.db.sample WRITE LOCALLY ORDERED BY category, id
+-- use optional ASC/DEC keyword to specify sort order of each field (default ASC)
+ALTER TABLE prod.db.sample WRITE LOCALLY ORDERED BY category ASC, id DESC
+-- use optional NULLS FIRST/NULLS LAST keyword to specify null order of each field (default FIRST)
+ALTER TABLE prod.db.sample WRITE LOCALLY ORDERED BY category ASC NULLS LAST, id DESC NULLS FIRST
+```
+### `ALTER TABLE ... WRITE DISTRIBUTED BY PARTITION` 
+
+Iceberg tables can be configured with a hash distribution where tuples that share the same values for clustering expressions are

Review comment:
       Okay.






[GitHub] [iceberg] rdblue commented on pull request #3820: Docs: Add `WRITE LOCALLY ORDERED BY` and `WRITE DISTRIBUTED BY` in spark-ddl.md

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3820:
URL: https://github.com/apache/iceberg/pull/3820#issuecomment-1015615728


   Thanks, @xiaotianzhang01!




[GitHub] [iceberg] rdblue commented on a change in pull request #3820: Docs: Add `WRITE LOCALLY ORDERED BY` and `WRITE DISTRIBUTED BY` in spark-ddl.md

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3820:
URL: https://github.com/apache/iceberg/pull/3820#discussion_r777621881



##########
File path: site/docs/spark-ddl.md
##########
@@ -360,3 +360,29 @@ ALTER TABLE prod.db.sample WRITE ORDERED BY category ASC NULLS LAST, id DESC NUL
 !!! Note
     Table write order does not guarantee data order for queries. It only affects how data is written to the table.
 
+Only local sorting can be set at the same time, use `LOCALLY ORDERED BY`
+
+```sql
+ALTER TABLE prod.db.sample WRITE LOCALLY ORDERED BY category, id
+-- use optional ASC/DEC keyword to specify sort order of each field (default ASC)
+ALTER TABLE prod.db.sample WRITE LOCALLY ORDERED BY category ASC, id DESC
+-- use optional NULLS FIRST/NULLS LAST keyword to specify null order of each field (default FIRST)
+ALTER TABLE prod.db.sample WRITE LOCALLY ORDERED BY category ASC NULLS LAST, id DESC NULLS FIRST
+```
+### `ALTER TABLE ... WRITE DISTRIBUTED BY PARTITION` 
+
+Iceberg tables can be configured with a hash distribution where tuples that share the same values for clustering expressions are

Review comment:
       The requirement is to distribute by partition. Hash distribution is an implementation detail. Instead, I think this should state that `WRITE DISTRIBUTED BY PARTITION` will guarantee that a given partition is handled by one writer.
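
       As a minimal sketch of the statement under discussion, assuming the `prod.db.sample` table and the `category` and `id` columns used elsewhere in this diff, distribution by partition can be set alone or combined with a local sort:

       ```sql
       -- Request that all rows for a given partition are handled by one writer
       ALTER TABLE prod.db.sample WRITE DISTRIBUTED BY PARTITION
       -- Same distribution, plus sorting within each write task
       ALTER TABLE prod.db.sample WRITE DISTRIBUTED BY PARTITION LOCALLY ORDERED BY category, id
       ```

       How rows are routed to writers (for example, by hashing) is left to the engine and is not part of the statement.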






[GitHub] [iceberg] xiaotianzhang01 commented on a change in pull request #3820: Docs: Add `WRITE LOCALLY ORDERED BY` and `WRITE DISTRIBUTED BY` in spark-ddl.md

Posted by GitBox <gi...@apache.org>.
xiaotianzhang01 commented on a change in pull request #3820:
URL: https://github.com/apache/iceberg/pull/3820#discussion_r782912061



##########
File path: site/docs/spark-ddl.md
##########
@@ -360,3 +360,29 @@ ALTER TABLE prod.db.sample WRITE ORDERED BY category ASC NULLS LAST, id DESC NUL
 !!! Note
     Table write order does not guarantee data order for queries. It only affects how data is written to the table.
 
+Only local sorting can be set at the same time, use `LOCALLY ORDERED BY`

Review comment:
       Updated.






[GitHub] [iceberg] rdblue merged pull request #3820: Docs: Add `WRITE LOCALLY ORDERED BY` and `WRITE DISTRIBUTED BY` in spark-ddl.md

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #3820:
URL: https://github.com/apache/iceberg/pull/3820


   




[GitHub] [iceberg] rdblue commented on a change in pull request #3820: Docs: Add `WRITE LOCALLY ORDERED BY` and `WRITE DISTRIBUTED BY` in spark-ddl.md

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3820:
URL: https://github.com/apache/iceberg/pull/3820#discussion_r777622685



##########
File path: site/docs/spark-ddl.md
##########
@@ -360,3 +360,29 @@ ALTER TABLE prod.db.sample WRITE ORDERED BY category ASC NULLS LAST, id DESC NUL
 !!! Note
     Table write order does not guarantee data order for queries. It only affects how data is written to the table.
 
+Only local sorting can be set at the same time, use `LOCALLY ORDERED BY`

Review comment:
       This should first state that `WRITE ORDERED BY` sets a global ordering where rows are ordered across tasks, like using `ORDER BY` in an `INSERT` command. Then introduce `LOCALLY ORDERED BY` to order within each task but not across tasks.
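
       The distinction described above can be sketched side by side, reusing the `prod.db.sample` table and columns from the diff (illustrative only; the final wording belongs in `spark-ddl.md`):

       ```sql
       -- Global ordering: rows are ordered across write tasks,
       -- similar to using ORDER BY in an INSERT command
       ALTER TABLE prod.db.sample WRITE ORDERED BY category, id
       -- Local ordering: rows are ordered within each write task, but not across tasks
       ALTER TABLE prod.db.sample WRITE LOCALLY ORDERED BY category, id
       ```

       In both cases the setting only affects how data is written to the table; it does not guarantee row order for queries.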




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] xiaotianzhang01 commented on pull request #3820: Docs: Add `WRITE LOCALLY ORDERED BY` and `WRITE DISTRIBUTED BY` in spark-ddl.md

Posted by GitBox <gi...@apache.org>.
xiaotianzhang01 commented on pull request #3820:
URL: https://github.com/apache/iceberg/pull/3820#issuecomment-1002896840


   In addition, I would like to know whether the partition field can be set with `WRITE DISTRIBUTED BY`, and how this was designed.

