Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/23 04:05:32 UTC

[GitHub] [iceberg] dmgcodevil opened a new issue #3168: How to sort Spark DataFrame

dmgcodevil opened a new issue #3168:
URL: https://github.com/apache/iceberg/issues/3168


   I have the following partition spec:
   
   ```
   timestamp: day
   id: bucket-10
   ```
   
   This is how I sort the Spark DataFrame:
   
   ```
   IcebergSpark.registerBucketUDF(sc, "iceberg_bucket10", DataTypes.StringType, 10)
   source.sortWithinPartitions(col("int_field"), expr("iceberg_bucket10(id)"))
   ```
   
   However, the job is failing:
   
   ```
   java.lang.IllegalStateException: Already closed files for partition: timestamp_day=2021-09-23/id_bucket=0
   ```
   
   What am I doing wrong?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dmgcodevil edited a comment on issue #3168: How to sort Spark DataFrame ?

Posted by GitBox <gi...@apache.org>.
dmgcodevil edited a comment on issue #3168:
URL: https://github.com/apache/iceberg/issues/3168#issuecomment-926128541


   However, if I reverse the order of columns, i.e.:
   
   ```
   source.sortWithinPartitions(expr("iceberg_bucket10(id)"), col("int_field"))
   ```
   
   it works.
   
   Is that expected behavior?
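
One way to see why the column order could matter is a toy model of a writer that expects each task's rows to arrive grouped by partition. The sketch below is only an illustration under that assumption: `Row`, `ClusteredWriterSketch`, and `write` are hypothetical names, not Iceberg's code, and the model tracks only the `id` bucket while ignoring the `timestamp` day.

```scala
// Hypothetical model of a writer that expects rows clustered by partition.
// This is NOT Iceberg's implementation; all names are made up for illustration.
final case class Row(bucket: Int, intField: Int)

object ClusteredWriterSketch {
  // Walks the rows in order and fails if a partition shows up again after
  // the writer has already moved on to a different partition.
  def write(rows: Seq[Row]): Unit = {
    val closed = scala.collection.mutable.Set.empty[Int]
    var current: Option[Int] = None
    for (row <- rows) {
      if (!current.contains(row.bucket)) {
        // Leaving the current partition marks it as closed.
        current.foreach(closed += _)
        if (closed.contains(row.bucket)) {
          throw new IllegalStateException(
            s"Already closed files for partition: id_bucket=${row.bucket}")
        }
        current = Some(row.bucket)
      }
    }
  }

  val rows: Seq[Row] = Seq(Row(0, 1), Row(1, 1), Row(0, 2), Row(1, 2))

  // Analogous to sorting by (int_field, bucket): buckets interleave as 0, 1, 0, 1.
  val byIntFieldFirst: Seq[Row] = rows.sortBy(r => (r.intField, r.bucket))

  // Analogous to sorting by (bucket, int_field): buckets are contiguous as 0, 0, 1, 1.
  val byBucketFirst: Seq[Row] = rows.sortBy(r => (r.bucket, r.intField))
}
```

In this model, `ClusteredWriterSketch.write(ClusteredWriterSketch.byIntFieldFirst)` throws because bucket 0 reappears after bucket 1 has started, while `ClusteredWriterSketch.write(ClusteredWriterSketch.byBucketFirst)` completes. If the real writer behaves this way, the partition expressions would need to lead the sort, and presumably the `timestamp` day as well whenever a task can see more than one day.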




