Posted to dev@spark.apache.org by JaeSung Jun <ja...@gmail.com> on 2015/12/10 14:19:53 UTC

Does RDD[Type1, Iterable[Type2]] split into multiple partitions?

Hi,

I'm currently working with an RDD whose values are Iterables, like this:

val keyValueIterableRDD: RDD[(CaseClass1, Iterable[CaseClass2])] = buildRDD(...)

If there is only one unique key and the Iterable is big enough, would the
Iterable be partitioned across all executors, as follows?

(executor1)
(xxx, iterator from 0 to 10,000)

(executor2)
(xxx, iterator from 10,001 to 20,000)

(executor3)
(xxx, iterator from 20,001 to 30,000)

...

Thanks
Jason

Re: Does RDD[Type1, Iterable[Type2]] split into multiple partitions?

Posted by Reynold Xin <rx...@databricks.com>.

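No, the type signature itself rules that out: each (key, Iterable) pair is a
single record of the RDD, and a single record is never split across partitions.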
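
A minimal sketch of how one might check this locally, assuming a Spark
dependency on the classpath and a local master. The names
IterablePartitionSketch, Key, Value and run are hypothetical stand-ins (Key
and Value play the role of CaseClass1 and CaseClass2 from the question); only
parallelize, groupByKey, mapPartitions and collect are Spark calls. It counts
the records in each partition for a grouped RDD with a single key and for the
same data kept as flat (key, value) pairs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object IterablePartitionSketch {
  // Hypothetical stand-ins for CaseClass1 / CaseClass2 from the question.
  case class Key(id: String)
  case class Value(n: Int)

  // Returns (records per partition when grouped, records per partition when flat).
  def run(sc: SparkContext): (Array[Int], Array[Int]) = {
    // 30,000 flat (key, value) records that all share one key.
    val flat: RDD[(Key, Value)] =
      sc.parallelize(1 to 30000, numSlices = 4).map(i => (Key("xxx"), Value(i)))

    // Grouping produces exactly one (key, Iterable) record for the single key.
    val grouped: RDD[(Key, Iterable[Value])] = flat.groupByKey(numPartitions = 4)

    // Emit one count per partition, then bring the counts back to the driver.
    val groupedSizes = grouped.mapPartitions(it => Iterator(it.size)).collect()
    val flatSizes = flat.mapPartitions(it => Iterator(it.size)).collect()
    (groupedSizes, flatSizes)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with 4 threads is enough for this check.
    val conf = new SparkConf().setAppName("iterable-partition-sketch").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      val (groupedSizes, flatSizes) = run(sc)
      println(s"grouped records per partition: ${groupedSizes.mkString(", ")}")
      println(s"flat records per partition:    ${flatSizes.mkString(", ")}")
    } finally {
      sc.stop()
    }
  }
}
```

In the grouped RDD only one partition holds a record, and that record carries
the whole Iterable. If the values need to be spread across executors, one
option is to keep the data as flat (key, value) pairs and, where the
computation allows it, aggregate with reduceByKey or aggregateByKey instead of
materializing an Iterable per key.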

