Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/09/07 14:58:00 UTC
[GitHub] [iceberg] mrendi29 opened a new issue, #5721: Registering BucketUDF on PySpark
mrendi29 opened a new issue, #5721:
URL: https://github.com/apache/iceberg/issues/5721
### Query engine
Apache Spark (PySpark)
### Question
The docs at https://iceberg.apache.org/docs/latest/spark-writes/#writing-to-partitioned-tables say that if you need to register the bucket UDF, you can do so with:
```
import org.apache.iceberg.spark.IcebergSpark
import org.apache.spark.sql.types.DataTypes
IcebergSpark.registerBucketUDF(spark, "iceberg_bucket16", DataTypes.LongType, 16)
```
How would we do this in PySpark? Does the method below work or is there another suggested method?
```
from pyspark.sql.types import LongType
from pyspark.sql import functions as F
spark.udf.register("iceberg_bucket16", F.bucket(16), LongType())
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] TechTinkerer42 commented on issue #5721: Registering BucketUDF on PySpark
Posted by "TechTinkerer42 (via GitHub)" <gi...@apache.org>.
TechTinkerer42 commented on issue #5721:
URL: https://github.com/apache/iceberg/issues/5721#issuecomment-1694941784
Here is an example of PySpark code to register a bucketing UDF. @mrendi29, I hope you have figured out your issue by now, but I'm sharing this in case it helps someone else.
```
# Register bucket UDF
jvm_gateway = spark.sparkContext._gateway.jvm
iceberg_spark = jvm_gateway.org.apache.iceberg.spark.IcebergSpark
data_types = jvm_gateway.org.apache.spark.sql.types.DataTypes
# 100 is the number of buckets
iceberg_spark.registerBucketUDF(spark._jsparkSession, "iceberg_bucket", data_types.StringType, 100)
```
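The same calls could be wrapped in a small helper so that the UDF name, bucket count, and source type become parameters. This is only a sketch: `register_iceberg_bucket_udf` and its arguments are hypothetical names, not part of Iceberg or PySpark, and it assumes the Iceberg Spark runtime jar is on the session's classpath so that `IcebergSpark` can be reached through the Py4J gateway.

```python
def register_iceberg_bucket_udf(spark, udf_name, num_buckets, source_type="StringType"):
    """Register Iceberg's bucket transform as a Spark SQL UDF via the Py4J gateway.

    Hypothetical helper, not an Iceberg or PySpark API.
    `source_type` is the name of a static field on
    org.apache.spark.sql.types.DataTypes, e.g. "StringType" or "LongType".
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")

    # Same JVM handles as in the snippet above.
    jvm = spark.sparkContext._gateway.jvm
    iceberg_spark = jvm.org.apache.iceberg.spark.IcebergSpark
    data_types = jvm.org.apache.spark.sql.types.DataTypes

    # Look up the static DataTypes field by name and pass the underlying
    # Java SparkSession, which is what the JVM-side method expects.
    iceberg_spark.registerBucketUDF(
        spark._jsparkSession,
        udf_name,
        getattr(data_types, source_type),
        num_buckets,
    )
    return udf_name
```

With an active session, `register_iceberg_bucket_udf(spark, "iceberg_bucket16", 16, "LongType")` would mirror the Scala call from the original question, after which `iceberg_bucket16` should be callable from Spark SQL expressions.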
[GitHub] [iceberg] fireking77 commented on issue #5721: Registering BucketUDF on PySpark
Posted by "fireking77 (via GitHub)" <gi...@apache.org>.
fireking77 commented on issue #5721:
URL: https://github.com/apache/iceberg/issues/5721#issuecomment-1407757922
I am also curious about this question!