Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2019/07/11 17:38:46 UTC
[GitHub] [incubator-iceberg] rdblue commented on issue #274: Error while using bucket partitions
URL: https://github.com/apache/incubator-iceberg/issues/274#issuecomment-510582767
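Yeah, you need to make Iceberg's bucket function available in Spark and sort the incoming rows by it, so that all rows for a given bucket partition are written together. Here's how to create a Spark UDF from the function: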
```scala
import com.netflix.iceberg.transforms.Transforms
import com.netflix.iceberg.types.Types
// load the bucket transform from Iceberg to use as a UDF
val bucketTransform = Transforms.bucket[java.lang.Long](Types.LongType.get(), 16)
// needed because Scala has trouble with the Java transform type
def bucketFunc(id: Long): Int = bucketTransform.apply(id)
// create and register a UDF
val bucket16 = spark.udf.register("bucket16", bucketFunc _)
```
Then you can use it like this:
```sql
INSERT INTO table SELECT id, data FROM source ORDER BY bucket16(id)
```
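If you're writing through the DataFrame API instead of SQL, the same idea can be sketched roughly as below. This is only a sketch under assumptions: it is meant for `spark-shell` (so `spark` is already in scope), `source` and `db.table` are placeholder names, the target table is assumed to be partitioned by a 16-way bucket on `id`, and the exact `save` target depends on how your Iceberg tables are configured.

```scala
// Sketch for spark-shell, where `spark` is already in scope.
// Package names follow the snippet above; Apache releases use org.apache.iceberg instead.
import com.netflix.iceberg.transforms.Transforms
import com.netflix.iceberg.types.Types
import org.apache.spark.sql.functions.col

// same bucket transform as above; 16 is assumed to match the table's partition spec
val bucketTransform = Transforms.bucket[java.lang.Long](Types.LongType.get(), 16)
def bucketFunc(id: Long): Int = bucketTransform.apply(id)

// register returns a UserDefinedFunction that can also be applied to Columns
val bucket16 = spark.udf.register("bucket16", bucketFunc _)

// "source" and "db.table" are placeholders, not names from the issue
spark.table("source")
  .select("id", "data")
  .orderBy(bucket16(col("id")))   // cluster rows by bucket before the write
  .write
  .format("iceberg")
  .mode("append")
  .save("db.table")
```

The bucket count passed to `Transforms.bucket` should be the same one the table is partitioned with, otherwise the sort order won't line up with the table's partitions.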