Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/02 16:45:36 UTC

[GitHub] [iceberg] markope opened a new issue #3063: Documentation of partitioned write has a wrong order by expression in the example

markope opened a new issue #3063:
URL: https://github.com/apache/iceberg/issues/3063


   In your [documentation on partitioned writes](https://iceberg.apache.org/spark-writes/#writing-to-partitioned-tables) you mention that data needs to be ordered by the partition expressions to avoid the "file already closed" error when writing to disk.
   
   Create a table like so:
   
   ```sql
   CREATE TABLE prod.db.sample (
       id bigint,
       data string,
       category string,
       ts timestamp)
   USING iceberg
   PARTITIONED BY (days(ts), category)
   ```
   
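   To insert into this table with Spark SQL, you then need to order by `date_trunc("day", ts)` rather than by `ts` alone. If the `ts` column contains hour-level (or finer) detail, ordering by the raw `ts` can interleave the categories within a single day, so the rows belonging to one `(days(ts), category)` partition are no longer contiguous.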
   
   ```sql
   INSERT INTO prod.db.sample
   SELECT id, data, category, ts FROM another_table
   ORDER BY date_trunc("day", ts), category
   ```
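
To illustrate why the truncation matters, here is a minimal plain-Python sketch (not Spark or Iceberg code) that checks whether a sort order keeps each partition's rows contiguous. The sample rows and the helper names `partition_key`, `partition_runs` and `is_clustered` are hypothetical and exist only for this illustration; the partition key is assumed to be the calendar day of `ts` plus `category`, mirroring `PARTITIONED BY (days(ts), category)`.

```python
from datetime import datetime
from itertools import groupby

# Hypothetical sample rows as (ts, category); two categories share one day.
rows = [
    (datetime(2021, 9, 1, 8), "b"),
    (datetime(2021, 9, 1, 9), "a"),
    (datetime(2021, 9, 1, 10), "b"),
    (datetime(2021, 9, 2, 7), "a"),
]


def partition_key(row):
    """Assumed partition key: (calendar day of ts, category)."""
    ts, category = row
    return (ts.date(), category)


def partition_runs(sorted_rows):
    """Collapse consecutive rows with the same partition key into one run."""
    return [key for key, _ in groupby(sorted_rows, key=partition_key)]


def is_clustered(sorted_rows):
    """True if every partition key appears in exactly one contiguous run."""
    runs = partition_runs(sorted_rows)
    return len(runs) == len(set(runs))


# Ordering by the raw timestamp first: categories interleave within a day.
by_ts = sorted(rows, key=lambda r: (r[0], r[1]))

# Ordering by the truncated day first, then category: partitions stay together.
by_day = sorted(rows, key=lambda r: (r[0].date(), r[1]))

print(is_clustered(by_ts))
print(is_clustered(by_day))
```

With the raw-timestamp ordering, the partition for 2021-09-01 and category `b` is visited, left, and then visited again, which is the situation a writer that closes a partition's file once it moves on to the next partition cannot handle.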
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



For additional commands, e-mail: issues-help@iceberg.apache.org