Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/28 15:47:28 UTC
[GitHub] [iceberg] cccs-jc opened a new issue, #4889: Support ORDERED BY in CTAS statement
cccs-jc opened a new issue, #4889:
URL: https://github.com/apache/iceberg/issues/4889
The dbt-spark adapter uses CTAS to create tables.
https://iceberg.apache.org/docs/latest/spark-ddl/#create-table--as-select
```sql
REPLACE TABLE prod.db.sample
USING iceberg
PARTITIONED BY (part)
TBLPROPERTIES ('key'='value')
AS SELECT ...
```
When the table is partitioned, Iceberg implicitly performs an ORDER BY on the partition columns. However, for some tables you want to partition by day and also sort by user_id.
You can achieve this by applying WRITE ORDERED BY to the table.
https://iceberg.apache.org/docs/latest/spark-ddl/#alter-table--write-ordered-by
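For reference, applied to the table above, the statement documented at that link would look roughly like the sketch below. It assumes the Iceberg Spark SQL extensions are enabled and that the table has `part` and `user_id` columns. It has to run as a separate statement once the table exists, and the order is expected to apply to writes made after it is set.

```sql
-- Sketch only: set the table's write order after it has been created.
-- Requires the Iceberg Spark SQL extensions; column names are taken from this issue.
ALTER TABLE prod.db.sample WRITE ORDERED BY part, user_id;
```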
However, there does not seem to be a way to specify the ORDERED BY clause when creating the table, i.e. using a CTAS. I would need this capability to implement support for ORDERED BY in dbt-spark: https://github.com/dbt-labs/dbt-spark/issues/343. For example:
```sql
REPLACE TABLE prod.db.sample
USING iceberg
PARTITIONED BY (part)
ORDERED BY part, user_id
TBLPROPERTIES ('key'='value')
AS SELECT ...
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] kbendick commented on issue #4889: Support ORDERED BY in CTAS statement
Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #4889:
URL: https://github.com/apache/iceberg/issues/4889#issuecomment-1154198123
Wanted to let you know I’ve seen this issue, @cccs-jc.
Do you know if this is something that can be used with non-Iceberg DataSourceV2 tables? If so, then we only need to update Iceberg. If not, we might need to update upstream Spark as well.
[GitHub] [iceberg] github-actions[bot] commented on issue #4889: Support ORDERED BY in CTAS statement
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #4889:
URL: https://github.com/apache/iceberg/issues/4889#issuecomment-1510005776
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
[GitHub] [iceberg] shay1bz commented on issue #4889: Support ORDERED BY in CTAS statement
Posted by GitBox <gi...@apache.org>.
shay1bz commented on issue #4889:
URL: https://github.com/apache/iceberg/issues/4889#issuecomment-1275677230
It seems that Spark (> 3) does not support creating sorted tables unless the table is bucketed. Does this mean that the only way to create a sorted Iceberg table (without buckets) is to create it unsorted and then apply "WRITE ... ORDERED BY"?
Currently, we create the table with the Java API (not through Spark) with the desired sort spec, and then write the DataFrame to the existing, empty table. I'd like to avoid interacting with the Iceberg API directly, but the other option (CTAS and then WRITE ORDERED BY) results in 2 Spark jobs. @kbendick I'd really appreciate your opinion on that :D Thanks.
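A SQL-only analogue of the "create the empty table with its sort spec, then write into it" flow might look like the sketch below. The column types and the source table `prod.db.source` are hypothetical, and the Iceberg Spark SQL extensions are assumed to be enabled. The first two statements should be metadata-only, so only the INSERT is expected to launch a write job, though that has not been verified here for any particular Spark or Iceberg version.

```sql
-- Step 1: create the table empty (no AS SELECT), so no data is written yet.
-- Column types are hypothetical placeholders.
CREATE TABLE prod.db.sample (
    user_id bigint,
    part    string)
USING iceberg
PARTITIONED BY (part);

-- Step 2: record the desired write order in the table metadata.
ALTER TABLE prod.db.sample WRITE ORDERED BY part, user_id;

-- Step 3: load the data; this write is expected to pick up the table's order.
-- prod.db.source is a hypothetical source table.
INSERT INTO prod.db.sample
SELECT user_id, part
FROM prod.db.source;
```

This still takes three statements, which is why a single CTAS with an ORDERED BY clause would be more convenient for tools like dbt-spark.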
[GitHub] [iceberg] cccs-jc commented on issue #4889: Support ORDERED BY in CTAS statement
Posted by GitBox <gi...@apache.org>.
cccs-jc commented on issue #4889:
URL: https://github.com/apache/iceberg/issues/4889#issuecomment-1155054083
I think this is an Iceberg-only issue. I don't know how it would be applied using Spark DataFrames.