Posted to user@spark.apache.org by Selvam Raman <se...@gmail.com> on 2016/11/08 21:40:43 UTC
How Spark determines Parquet partition size
Hi,
Can you please tell me how Parquet partitions the data when saving a
dataframe?
I have a dataframe which contains 10 values, as below:
+---------+
|field_num|
+---------+
|      139|
|      140|
|       40|
|       41|
|      148|
|      149|
|      151|
|      152|
|      153|
|      154|
+---------+
df.write.partitionBy("field_num").parquet("/Users/rs/parti/")
It saves the files into directories like (field_num=140, ..., field_num=154).
When I read the data back and check the partition count, it gives 5:
scala> spark.read.parquet("file:///Users/rs/parti").rdd.partitions.length
res4: Int = 5
So how does Spark decide the number of partitions when reading the Parquet data?
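(For anyone else wondering: as far as I understand, the read-side partition count is not determined by Parquet itself but by Spark's file-scan planning. In Spark 2.x, the planner computes a maximum split size from spark.sql.files.maxPartitionBytes (default 128 MB), spark.sql.files.openCostInBytes (default 4 MB), and total bytes per core, then greedily packs the files into partitions. Below is a rough, simplified Python sketch of that packing logic; the constants mirror the Spark config defaults, and the file sizes in the usage example are illustrative assumptions, not the poster's actual data. The exact count you observe, like the 5 above, depends on your file sizes, config values, and default parallelism.)

```python
# Simplified sketch of Spark 2.x's read-side file packing
# (an approximation of FileSourceScanExec's partition planning;
# not the actual Spark source code).

def max_split_bytes(total_bytes, default_parallelism,
                    max_partition_bytes=128 * 1024 * 1024,   # spark.sql.files.maxPartitionBytes
                    open_cost_in_bytes=4 * 1024 * 1024):     # spark.sql.files.openCostInBytes
    """Largest number of bytes Spark will pack into one read partition."""
    bytes_per_core = total_bytes // default_parallelism
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))

def pack_files(file_sizes, default_parallelism,
               open_cost_in_bytes=4 * 1024 * 1024):
    """Greedily pack files (largest first) into partitions of at most
    max_split_bytes, padding each file with the open cost."""
    split = max_split_bytes(sum(file_sizes), default_parallelism)
    partitions, current, current_size = [], [], 0
    for size in sorted(file_sizes, reverse=True):
        padded = size + open_cost_in_bytes  # small files still "cost" 4 MB to open
        if current and current_size + padded > split:
            partitions.append(current)     # close the current partition
            current, current_size = [], 0
        current.append(size)
        current_size += padded
    if current:
        partitions.append(current)
    return partitions

# Illustrative example: ten 10 MB files on a 2-core default parallelism.
# Total = 100 MB, so the split size is min(128 MB, max(4 MB, 50 MB)) = 50 MB,
# and the padded 14 MB files pack three to a partition.
parts = pack_files([10 * 1024 * 1024] * 10, default_parallelism=2)
print(len(parts))
```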
--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"