Posted to user@spark.apache.org by Gourav Sengupta <go...@gmail.com> on 2018/04/03 21:32:37 UTC

bucketing in SPARK

Hi,

I am going through the presentation
https://databricks.com/session/hive-bucketing-in-apache-spark.

Do we need to bucket both tables for this to work? And is it mandatory
that the numbers of buckets be multiples of each other? A rough sketch of
what I have in mind is below.
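
For context, the kind of setup I am thinking of is roughly the following
(the table names, columns, and bucket count are just placeholders on my
side):

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.sql.sources.bucketing.enabled", "true") \
    .appName("BucketedJoinSketch") \
    .getOrCreate()

# Two small example DataFrames; "customer_id" is the intended join key.
orders = spark.createDataFrame(
    [(1, 100.0), (2, 250.0)], ["customer_id", "amount"])
customers = spark.createDataFrame(
    [(1, "alice"), (2, "bob")], ["customer_id", "name"])

# Persist both sides bucketed (and sorted) on the join key with the
# same number of buckets.
orders.write.bucketBy(8, "customer_id").sortBy("customer_id") \
    .mode("overwrite").saveAsTable("orders_bucketed")
customers.write.bucketBy(8, "customer_id").sortBy("customer_id") \
    .mode("overwrite").saveAsTable("customers_bucketed")

# Join on the bucketed column and inspect the plan; my expectation is
# that no Exchange (shuffle) appears on either side if the bucketing
# is actually being used.
joined = spark.table("orders_bucketed").join(
    spark.table("customers_bucketed"), "customer_id")
joined.explain()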

Also, if I export a persistent table to S3, will this still work? Or is
there a way to make this work for external tables in SPARK?
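
For the S3 / external table part, what I had in mind is along these lines,
continuing from the sketch above (the S3 path is a placeholder):

# Writing the bucketed table to an explicit location should register it
# as an unmanaged (external) table in the metastore; the path below is
# just a placeholder.
orders.write \
    .bucketBy(8, "customer_id") \
    .sortBy("customer_id") \
    .option("path", "s3a://my-bucket/warehouse/orders_bucketed") \
    .mode("overwrite") \
    .saveAsTable("orders_bucketed_s3")

My understanding is that the bucketing metadata lives in the metastore
table definition rather than in the data files themselves, so reading the
files back directly from the path (rather than through the table name)
would presumably lose the bucketing information. Please correct me if that
is wrong.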


*SPARK Version:* 2.3.0

*Method to initiate SPARK Session:*
from pyspark.sql import SparkSession

sparkSession = SparkSession.builder \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.sql.sources.bucketing.enabled", "true") \
    .appName("GouravTest") \
    .getOrCreate()


Regards,
Gourav Sengupta