Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/02 16:45:36 UTC
[GitHub] [iceberg] markope opened a new issue #3063: Documentation of partitioned write has a wrong order by expression in the example
markope opened a new issue #3063:
URL: https://github.com/apache/iceberg/issues/3063
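Your [documentation on partitioned writes](https://iceberg.apache.org/spark-writes/#writing-to-partitioned-tables) says that incoming data must be sorted by the table's partition fields to avoid the "file already closed" error during the write. However, the `ORDER BY` expression in the example does not produce that ordering when a partition field is a transform such as `days(ts)`.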
For example, take a table created like this:
```sql
CREATE TABLE prod.db.sample (
id bigint,
data string,
category string,
ts timestamp)
USING iceberg
PARTITIONED BY (days(ts), category)
```
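To insert into this table with Spark SQL, the `ORDER BY` clause has to use `date_trunc("day", ts)` instead of `ts`. If `ts` has a time-of-day component, ordering by the raw timestamp interleaves categories within the same day (for example `08:00 b`, `09:00 a`, `10:00 b`), so the rows for a single `(day, category)` partition are no longer contiguous and the writer hits a partition whose file it has already closed.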
```sql
INSERT INTO prod.db.sample
SELECT id, data, category, ts FROM another_table
ORDER BY date_trunc("day", ts), category
```
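To illustrate why ordering by the raw timestamp is not enough, here is a small Python simulation. It assumes a simplified writer model (the hypothetical `simulate_clustered_write`, not Iceberg's actual writer code) that keeps only one data file open, closes it when the partition key changes, and fails if a partition reappears after its file was closed. The helper `date_trunc_day` is a hypothetical stand-in for Spark's `date_trunc("day", ts)`, and the sample rows are made up for illustration.

```python
from datetime import datetime


def partition_key(ts: datetime, category: str) -> tuple:
    # Mirrors PARTITIONED BY (days(ts), category): the day of ts plus the category.
    return (ts.date(), category)


def simulate_clustered_write(rows):
    """Hypothetical model of a writer that keeps a single data file open.

    A new file is opened whenever the partition key changes and the previous
    file is closed. Seeing a partition again after its file was closed is an
    error, which is the situation the "file already closed" message describes.
    Returns the partitions in the order their files were opened.
    """
    closed = set()
    current = None
    opened = []
    for ts, category in rows:
        key = partition_key(ts, category)
        if key == current:
            continue  # same partition: keep appending to the open file
        if key in closed:
            raise RuntimeError(f"file already closed for partition {key}")
        if current is not None:
            closed.add(current)  # partition changed: close the previous file
        current = key
        opened.append(key)
    return opened


def date_trunc_day(ts: datetime) -> datetime:
    # Hypothetical Python stand-in for Spark SQL's date_trunc("day", ts).
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


# Made-up sample rows: (ts, category)
rows = [
    (datetime(2021, 9, 1, 8), "b"),
    (datetime(2021, 9, 1, 9), "a"),
    (datetime(2021, 9, 1, 10), "b"),
    (datetime(2021, 9, 2, 7), "a"),
]

# ORDER BY ts, category: category only breaks ties between identical timestamps.
by_ts = sorted(rows, key=lambda r: (r[0], r[1]))

# ORDER BY date_trunc("day", ts), category: each (day, category) becomes contiguous.
by_day = sorted(rows, key=lambda r: (date_trunc_day(r[0]), r[1]))

if __name__ == "__main__":
    print(simulate_clustered_write(by_day))  # succeeds
    try:
        simulate_clustered_write(by_ts)  # category "b" on 2021-09-01 reappears
    except RuntimeError as err:
        print(err)
```

In this model, `by_ts` fails because the 10:00 row for category `b` arrives after the writer has already moved on to category `a` for the same day, while `by_day` groups each partition together. Real writers may buffer or fan out differently; the sketch only shows why the sort key must match the partition transform.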
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org