Posted to user@spark.apache.org by ayan guha <gu...@gmail.com> on 2017/07/21 20:25:59 UTC

Re: Spark Data Frame Writer - Range Partitioning

How about creating a partition column and using it?
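
The suggestion above can be sketched as follows. This is a minimal, hypothetical example: the bucket boundaries (1to5 / 6to10) and the column name col_a_bucket are assumptions chosen to match the question, and the Spark calls in the comments show the usual withColumn/partitionBy pattern rather than anything confirmed in this thread. The bucketing rule itself is shown in plain Python:

```python
# Hypothetical sketch: derive a coarse "bucket" column, then partition by it.
# In Spark this would look something like (illustrative, not from this thread):
#   from pyspark.sql.functions import col, when
#   df.withColumn("col_a_bucket",
#                 when(col("column_a") <= 5, "1to5").otherwise("6to10")) \
#     .write.partitionBy("col_a_bucket").parquet(path)
#
# The same bucketing rule in plain Python, to show the mapping:

def bucket(value: int) -> str:
    """Map column_a values 1..10 into two assumed range buckets."""
    return "1to5" if value <= 5 else "6to10"

print([bucket(v) for v in range(1, 11)])
# → ['1to5', '1to5', '1to5', '1to5', '1to5',
#    '6to10', '6to10', '6to10', '6to10', '6to10']
```

With a derived column like this, the writer creates one folder per bucket (col_a_bucket=1to5, col_a_bucket=6to10) instead of one per distinct value.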

On Sat, 22 Jul 2017 at 2:47 am, Jain, Nishit <nj...@underarmour.com> wrote:

> Is it possible to have the Spark DataFrame writer write based on
> RangePartitioning?
>
> For Ex -
>
> I have 10 distinct values for column_a, say 1 to 10.
>
> df.write
> .partitionBy("column_a")
>
> The above code will by default create 10 folders: column_a=1, column_a=2,
> ... column_a=10.
>
> I want to know whether it is possible to have these partitions based on
> buckets instead, e.g. col_a=1to5, col_a=6to10, or something like that, and
> also have the query engine respect them.
>
> Thanks,
>
> Nishit
>
-- 
Best Regards,
Ayan Guha

Re: Spark Data Frame Writer - Range Partitioning

Posted by "Jain, Nishit" <nj...@underarmour.com>.
But wouldn't a partitioning column partition the data only in the Spark RDD? Would it also partition the data on disk when it is written (dividing the data into folders)?
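
For reference on the question above: DataFrameWriter.partitionBy does partition the data on disk, writing one Hive-style col=value folder per distinct value. The sketch below imitates that layout in plain Python so the directory structure is visible; the file name part-00000.txt and the sample rows are illustrative assumptions, not actual Spark output.

```python
# Sketch of the Hive-style directory layout that partitionBy produces on disk.
# Everything here (rows, file names) is illustrative; Spark writes part files
# in its own format, but the col=value folder naming is the same.
import pathlib
import tempfile

rows = [{"column_a": a, "payload": f"row-{a}"} for a in (1, 2, 3)]
out = pathlib.Path(tempfile.mkdtemp())

for r in rows:
    part_dir = out / f"column_a={r['column_a']}"  # one folder per value
    part_dir.mkdir(exist_ok=True)
    (part_dir / "part-00000.txt").write_text(r["payload"])

print(sorted(p.name for p in out.iterdir()))
# → ['column_a=1', 'column_a=2', 'column_a=3']
```

Query engines that understand Hive-style partitioning (including Spark SQL itself) can then prune folders based on a filter on the partition column.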

From: ayan guha <gu...@gmail.com>
Date: Friday, July 21, 2017 at 3:25 PM
To: "Jain, Nishit" <nj...@underarmour.com>, "user@spark.apache.org" <us...@spark.apache.org>
Subject: Re: Spark Data Frame Writer - Range Partitioning

How about creating a partition column and using it?
