Posted to user@spark.apache.org by ayan guha <gu...@gmail.com> on 2017/07/21 20:25:59 UTC
Re: Spark Data Frame Writer - Range Partitioning
How about creating a partition column and using it?
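A minimal sketch of this suggestion, written in plain Scala with no Spark dependency so it runs on its own. It derives a range label from column_a and treats that derived column as the partition key. The bucket width of 5, the column name `column_a_range`, and the helpers `rangeLabel` and `partitionDirs` are hypothetical. It also assumes column_a holds integers starting at 1. In Spark, the label would typically be added with `withColumn` (for example from a `when(...).otherwise(...)` expression over column_a), and the write would then use `.partitionBy("column_a_range")`.

```scala
object RangePartitionSketch {
  // Hypothetical fixed-width bucketing: 1-5, 6-10, 11-15, ...
  // Assumes column_a holds integers starting at 1.
  def rangeLabel(value: Int, width: Int = 5): String = {
    require(value >= 1, "this sketch assumes values start at 1")
    val low = ((value - 1) / width) * width + 1
    val high = low + width - 1
    s"$low-$high"
  }

  // Hive-style directories that a write partitioned on the derived column
  // would produce: one per distinct label, named column=value.
  def partitionDirs(values: Seq[Int],
                    column: String = "column_a_range",
                    width: Int = 5): Seq[String] =
    values.map(v => s"$column=${rangeLabel(v, width)}").distinct.sorted

  def main(args: Array[String]): Unit =
    println(partitionDirs(1 to 10))
}
```

With values 1 to 10, this yields two directories, `column_a_range=1-5` and `column_a_range=6-10`, instead of the ten produced by partitioning on column_a directly. Because column_a is not the partition column, it remains inside the data files.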
On Sat, 22 Jul 2017 at 2:47 am, Jain, Nishit <nj...@underarmour.com> wrote:
> Is it possible to have the Spark DataFrame writer write based on
> RangePartitioning?
>
> For example:
>
> I have 10 distinct values for column_a, say 1 to 10.
>
> df.write
>   .partitionBy("column_a")
>   .save(outputPath) // outputPath is a placeholder; save() uses the default source format
>
> The code above will by default create 10 folders: column_a=1, column_a=2,
> ..., column_a=10
>
> I want to see whether these partitions can instead be based on ranges
> (buckets), e.g. col_a=1-5, col_a=6-10, or something like that, and whether
> the query engine would then respect them.
>
> Thanks,
>
> Nishit
>
--
Best Regards,
Ayan Guha
Re: Spark Data Frame Writer - Range Partitioning
Posted by "Jain, Nishit" <nj...@underarmour.com>.
But wouldn’t a partition column only partition the data in the Spark RDD? Would it also partition the data on disk when it is written (dividing the data into folders)?
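A rough model of the distinction behind this question, again in plain Scala with no Spark dependency. `DataFrameWriter.partitionBy` determines the on-disk layout, creating one `name=value` subdirectory per distinct value of the partition column. By contrast, `repartition` only changes how rows are split in memory. A later read can skip whole directories only when the query filters on the partition column itself. Spark has no way of knowing how `column_a_range` was derived, so as far as I know a filter on column_a alone would not prune anything. The `layout` map and `dirsToScan` are hypothetical illustrations of that pruning logic, not Spark APIs.

```scala
object PartitionPruningSketch {
  // Hypothetical layout after a write with .partitionBy("column_a_range"):
  // directory name -> inclusive range of column_a values stored under it.
  val layout: Map[String, Range] = Map(
    "column_a_range=1-5" -> (1 to 5),
    "column_a_range=6-10" -> (6 to 10)
  )

  // Directories a reader would have to open. `wanted` stands in for a
  // predicate written against the range column; None models a query that
  // filters only on column_a, which cannot be mapped to directory names,
  // so no directory is skipped.
  def dirsToScan(wanted: Option[(Int, Int)]): Seq[String] = wanted match {
    case None => layout.keys.toSeq.sorted
    case Some((wLow, wHigh)) =>
      layout.toSeq.collect {
        // keep a directory only if its stored range overlaps the requested one
        case (name, r) if r.start <= wHigh && wLow <= r.end => name
      }.sorted
  }

  def main(args: Array[String]): Unit = {
    println(dirsToScan(Some((2, 4))))
    println(dirsToScan(None))
  }
}
```

In this model, asking for values 2 to 4 through the range column touches only the 1-5 directory, while a query that never mentions the range column has to scan both.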
From: ayan guha <gu...@gmail.com>
Date: Friday, July 21, 2017 at 3:25 PM
To: "Jain, Nishit" <nj...@underarmour.com>, "user@spark.apache.org" <us...@spark.apache.org>
Subject: Re: Spark Data Frame Writer - Range Partitioning