Posted to user@spark.apache.org by Gourav Sengupta <go...@gmail.com> on 2018/04/03 21:32:37 UTC
bucketing in SPARK
Hi,
I am going through the presentation
https://databricks.com/session/hive-bucketing-in-apache-spark.
Do we need to bucket both tables for this to work? And is it mandatory
that the bucket counts be multiples of each other?
Also, if I export a persistent table to S3, will this still work? Or is there
a way to make this work for external tables in SPARK?
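For intuition on the multiple-of-each-other question: Spark assigns a row to a bucket by hashing the bucket column and taking the result modulo the bucket count. When one table's bucket count is a multiple of the other's, every bucket of the larger table maps wholly into exactly one bucket of the smaller table. A minimal Python sketch of that modulo relationship, using Python's built-in hash as a stand-in for Spark's actual (Murmur3-based) bucketing hash:

```python
def bucket_id(key: int, num_buckets: int) -> int:
    # Stand-in for Spark's hash-based bucket assignment:
    # bucket = hash(bucket_column) % num_buckets.
    return hash(key) % num_buckets

# One table bucketed into 8 buckets, the other into 4 (4 divides 8).
for k in range(100):
    big = bucket_id(k, 8)
    small = bucket_id(k, 4)
    # Every key in 8-bucket bucket b lands in 4-bucket bucket b % 4,
    # so buckets can be matched up without a full shuffle.
    assert small == big % 4

print("8-bucket ids map onto 4-bucket ids via b % 4")
```

This holds for any hash value h because (h % 8) % 4 == h % 4 whenever 4 divides 8; if the counts are not multiples of each other, no such stable bucket-to-bucket mapping exists.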
*SPARK Version:* 2.3.0
*Method to initiate SPARK Session:*
from pyspark.sql import SparkSession

sparkSession = SparkSession.builder \
    .config("spark.serializer",
            "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.sql.sources.bucketing.enabled", "true") \
    .appName("GouravTest") \
    .getOrCreate()
Regards,
Gourav Sengupta