
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2019/07/11 17:38:46 UTC

[GitHub] [incubator-iceberg] rdblue commented on issue #274: Error while using bucket partitions

rdblue commented on issue #274: Error while using bucket partitions
URL: https://github.com/apache/incubator-iceberg/issues/274#issuecomment-510582767
 
 
   Yeah, you need to expose Iceberg's bucket function to Spark as a UDF and sort by it, so that rows for the same bucket are written together. Here's how to create a Spark UDF from the bucket transform:
   
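   ```scala
   // older Netflix releases use com.netflix.iceberg; Apache releases use org.apache.iceberg
   import com.netflix.iceberg.transforms.Transforms
   import com.netflix.iceberg.types.Types
   
   // load the bucket transform (16 buckets over long values) from Iceberg to use as a UDF
   val bucketTransform = Transforms.bucket[java.lang.Long](Types.LongType.get(), 16)
   
   // wrap the Java Transform in a Scala function (Long => Int) so Spark can register it;
   // Scala's Long and Int convert implicitly to java.lang.Long and java.lang.Integer
   def bucketFunc(id: Long): Int = bucketTransform.apply(id)
   
   // create and register a UDF (`spark` is the SparkSession, e.g. in spark-shell);
   // the returned UserDefinedFunction can also be applied to Columns in the DataFrame API
   val bucket16 = spark.udf.register("bucket16", bucketFunc _)
   ```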
   
   Then you can use it like this:
   
   ```sql
   INSERT INTO table SELECT id, data FROM source ORDER BY bucket16(id)
   ```
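   
   If you want to check bucket values outside Spark, here's a minimal sketch of what the bucket transform computes for long values. It assumes the definition in the Iceberg table spec: a 32-bit Murmur3 (x86) hash of the value's 8-byte little-endian form with seed 0, then `(hash & Int.MaxValue) % N`. `BucketSketch` and its methods are hypothetical names for illustration only; real code should keep using `Transforms.bucket`.
   
   ```scala
   import java.nio.{ByteBuffer, ByteOrder}
   
   // Hypothetical illustration of the spec's bucket function for longs (assumes the
   // Iceberg spec definition); not Iceberg's API, use Transforms.bucket in real code.
   object BucketSketch {
   
     // Standard 32-bit MurmurHash3 (x86 variant) over a byte array.
     def murmur3x86_32(data: Array[Byte], seed: Int = 0): Int = {
       val c1 = 0xcc9e2d51
       val c2 = 0x1b873593
       var h = seed
       val nBlocks = data.length / 4
   
       // Body: mix in each 4-byte little-endian block.
       for (i <- 0 until nBlocks) {
         val j = i * 4
         var k = (data(j) & 0xff) |
           ((data(j + 1) & 0xff) << 8) |
           ((data(j + 2) & 0xff) << 16) |
           ((data(j + 3) & 0xff) << 24)
         k *= c1
         k = Integer.rotateLeft(k, 15)
         k *= c2
         h ^= k
         h = Integer.rotateLeft(h, 13)
         h = h * 5 + 0xe6546b64
       }
   
       // Tail: mix in the 1 to 3 bytes left over, if any.
       val tail = nBlocks * 4
       val rem = data.length & 3
       var k1 = 0
       if (rem >= 3) k1 ^= (data(tail + 2) & 0xff) << 16
       if (rem >= 2) k1 ^= (data(tail + 1) & 0xff) << 8
       if (rem >= 1) {
         k1 ^= data(tail) & 0xff
         k1 *= c1
         k1 = Integer.rotateLeft(k1, 15)
         k1 *= c2
         h ^= k1
       }
   
       // Finalization: fold in the length, then avalanche the bits.
       h ^= data.length
       h ^= h >>> 16
       h *= 0x85ebca6b
       h ^= h >>> 13
       h *= 0xc2b2ae35
       h ^= h >>> 16
       h
     }
   
     // Hash a long as its 8-byte little-endian representation.
     def hashLong(v: Long): Int = {
       val bytes = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(v).array()
       murmur3x86_32(bytes)
     }
   
     // Clear the sign bit so the result is non-negative, then take the modulus.
     def bucket(v: Long, numBuckets: Int): Int =
       (hashLong(v) & Int.MaxValue) % numBuckets
   }
   ```
   
   Assuming the spec definition holds for your Iceberg version, `BucketSketch.bucket(id, 16)` should agree with `bucket16(id)` for a long `id`.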
