Posted to user@spark.apache.org by "Kali.tummala@gmail.com" <Ka...@gmail.com> on 2015/10/18 19:56:59 UTC

Pass spark partition explicitly ?

Hi All, 

Can I pass the number of partitions to all the RDDs explicitly while submitting
the Spark job, or do I need to specify it in my Spark code itself?

Thanks
Sri 



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Pass-spark-partition-explicitly-tp25113.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Pass spark partition explicitly ?

Posted by sri hari kali charan Tummala <ka...@gmail.com>.
Hi Richard,

Thanks. So my takeaway from your reply is that if we want to pass partition
values explicitly, it has to be done inside the code.

Thanks
Sri

On Sun, Oct 18, 2015 at 7:05 PM, Richard Eggert <ri...@gmail.com>
wrote:

> If you want to override the default partitioning behavior,  you have to do
> so in your code where you create each RDD. Different RDDs usually have
> different numbers of partitions (except when one RDD is directly derived
> from another without shuffling) because they usually have different sizes,
> so it wouldn't make sense to have some sort of "global" notion of how many
> partitions to create.  You could,  if you wanted,  pass partition counts in
> as command line options to your application and use those values in your
> code that creates the RDDs, of course.
>
> Rich
> On Oct 18, 2015 1:57 PM, "Kali.tummala@gmail.com" <Ka...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> Can I pass the number of partitions to all the RDDs explicitly while submitting
>> the Spark job, or do I need to specify it in my Spark code itself?
>>
>> Thanks
>> Sri
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Pass-spark-partition-explicitly-tp25113.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>>


-- 
Thanks & Regards
Sri Tummala

Re: Pass spark partition explicitly ?

Posted by Richard Eggert <ri...@gmail.com>.
If you want to override the default partitioning behavior,  you have to do
so in your code where you create each RDD. Different RDDs usually have
different numbers of partitions (except when one RDD is directly derived
from another without shuffling) because they usually have different sizes,
so it wouldn't make sense to have some sort of "global" notion of how many
partitions to create.  You could,  if you wanted,  pass partition counts in
as command line options to your application and use those values in your
code that creates the RDDs, of course.
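
A minimal sketch of what that might look like (the object name, input path,
and argument handling are illustrative, not from this thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionDemo {
  def main(args: Array[String]): Unit = {
    // Read the partition count from the command line, e.g.:
    //   spark-submit --class PartitionDemo app.jar 8
    // Fall back to a default when no argument is given.
    val numPartitions = if (args.nonEmpty) args(0).toInt else 4

    val conf = new SparkConf().setAppName("PartitionDemo")
    val sc = new SparkContext(conf)

    // Use the value wherever an RDD is created...
    val nums = sc.parallelize(1 to 1000, numPartitions)

    // ...or repartition an existing RDD (note: this triggers a shuffle).
    val lines = sc.textFile("input.txt").repartition(numPartitions)

    println(s"nums has ${nums.getNumPartitions} partitions")
    sc.stop()
  }
}
```

Each RDD still gets its partition count at the point where it is created,
as Richard describes; the command line only supplies the value.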

Rich
On Oct 18, 2015 1:57 PM, "Kali.tummala@gmail.com" <Ka...@gmail.com>
wrote:

> Hi All,
>
> Can I pass the number of partitions to all the RDDs explicitly while submitting
> the Spark job, or do I need to specify it in my Spark code itself?
>
> Thanks
> Sri
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Pass-spark-partition-explicitly-tp25113.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>