Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/28 15:47:28 UTC

[GitHub] [iceberg] cccs-jc opened a new issue, #4889: Support ORDERED BY in CTAS statement

cccs-jc opened a new issue, #4889:
URL: https://github.com/apache/iceberg/issues/4889

   The dbt-spark adapter uses CTAS to create tables. 
   https://iceberg.apache.org/docs/latest/spark-ddl/#create-table--as-select
   
   ```sql
   REPLACE TABLE prod.db.sample
   USING iceberg
   PARTITIONED BY (part)
   TBLPROPERTIES ('key'='value')
   AS SELECT ...
   ```
   When the table is partitioned, Iceberg implicitly performs an ORDER BY on the partition columns. However, for some tables you want to partition by day and also sort by user_id.
   
   You can achieve this on an existing table by setting a write order with `ALTER TABLE ... WRITE ORDERED BY`:
   
   https://iceberg.apache.org/docs/latest/spark-ddl/#alter-table--write-ordered-by
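   
   A minimal sketch of that two-step approach, assuming the Iceberg Spark SQL extensions are enabled. The source table `prod.db.source` and the `payload` column are hypothetical names used only for illustration:
   
   ```sql
   -- Step 1: create the table from a query (plain CTAS, no sort order yet).
   -- prod.db.source and payload are hypothetical names.
   CREATE TABLE prod.db.sample
   USING iceberg
   PARTITIONED BY (part)
   AS SELECT part, user_id, payload FROM prod.db.source;
   
   -- Step 2: set the table's write order.
   -- This only affects later writes; it does not rewrite the rows
   -- that step 1 already wrote.
   ALTER TABLE prod.db.sample WRITE ORDERED BY part, user_id;
   ```
   
   Because the write order is set only after the CTAS has run, the initial data is not sorted by it, which is why a single-statement form would be useful.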
   
   However, there does not seem to be a way to specify the ORDERED BY clause when creating the table, i.e. in a CTAS statement. I would need this capability to implement support for ORDERED BY in dbt-spark (https://github.com/dbt-labs/dbt-spark/issues/343). For example:
   ```sql
   REPLACE TABLE prod.db.sample
   USING iceberg
   PARTITIONED BY (part)
   ORDERED BY part, user_id
   TBLPROPERTIES ('key'='value')
   AS SELECT ...
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on issue #4889: Support ORDERED BY in CTAS statement

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #4889:
URL: https://github.com/apache/iceberg/issues/4889#issuecomment-1154198123

   Wanted to let you know I've seen this issue, @cccs-jc.
   
   Do you know if this is something that can be used with non-Iceberg DataSourceV2 tables? If so, then we only need to update Iceberg. If not, we might need to update upstream Spark as well.




[GitHub] [iceberg] github-actions[bot] commented on issue #4889: Support ORDERED BY in CTAS statement

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4889:
URL: https://github.com/apache/iceberg/issues/4889#issuecomment-1510005776

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.




[GitHub] [iceberg] shay1bz commented on issue #4889: Support ORDERED BY in CTAS statement

Posted by GitBox <gi...@apache.org>.
shay1bz commented on issue #4889:
URL: https://github.com/apache/iceberg/issues/4889#issuecomment-1275677230

   It seems that Spark (> 3) does not support creating sorted tables unless the table is bucketed. Does this mean that the only way to create a sorted Iceberg table (without buckets) is to create it unsorted and then apply "WRITE ... ORDERED BY"?
   
   Currently, we create the table with the Java API, not through Spark, with the desired sort order, and then write the DataFrame to the existing, empty table. I'd like to avoid interacting with the Iceberg API directly, but the other option (CTAS and then WRITE ORDERED BY) results in 2 Spark jobs. @kbendick I'd really appreciate your opinion on that :D Thanks.
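   
   One possible SQL-only analogue of the "create empty, then write" workflow above, sketched under assumptions: the Iceberg Spark SQL extensions are enabled, and the column list and `prod.db.source` are hypothetical. It is untested here, and it is three statements rather than one atomic CTAS:
   
   ```sql
   -- 1. Create an empty table (a metadata-only operation).
   --    The column list is hypothetical.
   CREATE TABLE prod.db.sample (
       part    string,
       user_id bigint,
       payload string)
   USING iceberg
   PARTITIONED BY (part);
   
   -- 2. Set the write order before any data is written.
   ALTER TABLE prod.db.sample WRITE ORDERED BY part, user_id;
   
   -- 3. Load the data; this write is the one expected to apply the order.
   INSERT INTO prod.db.sample
   SELECT part, user_id, payload FROM prod.db.source;
   ```
   
   Only the last statement moves data, so this may avoid the second job, though that depends on the Spark and Iceberg versions in use.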
   




[GitHub] [iceberg] cccs-jc commented on issue #4889: Support ORDERED BY in CTAS statement

Posted by GitBox <gi...@apache.org>.
cccs-jc commented on issue #4889:
URL: https://github.com/apache/iceberg/issues/4889#issuecomment-1155054083

   I think this is an Iceberg-only issue. I don't know how it would be applied using Spark DataFrames.

