Posted to dev@spark.apache.org by "张志强(旺轩)" <zz...@alibaba-inc.com> on 2015/10/29 06:15:21 UTC

sample or takeSample or ??

How do I get a NEW RDD that has a number of elements that I specify?
sample()? It has no parameter for the number of elements. takeSample()? It returns a list, not an RDD.

Help, please.


Re: sample or takeSample or ??

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You can't directly create a new RDD by selecting a fixed number of elements.
rdd.take(n), takeSample, etc. are actions: they return their results to the
driver and trigger execution of your pipeline.
You can, however, do something like this, I guess:

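// take(10) is an action: it returns the first 10 elements to the driver as an Array
val sample_data = rdd.take(10)

// distribute the local Array again to get a new RDD
val sample_rdd = sc.parallelize(sample_data)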
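
If the elements should be picked at random rather than taken from the front, the same idea might be sketched with takeSample, as below. This is only a sketch under some assumptions: the object name ExactSample, the helper sampleExactly, the local[2] master and the seed value are made up for illustration and are not part of Spark. It also shows sample(), which stays distributed but takes a fraction, so the resulting size is only approximate.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object ExactSample {
  // Hypothetical helper (not a Spark API): draws exactly `n` random elements
  // (fewer if the RDD is smaller) and wraps them in a new RDD.
  def sampleExactly[T: ClassTag](rdd: RDD[T], n: Int, seed: Long): RDD[T] = {
    // takeSample is an action: the sampled elements are collected on the driver
    val local: Array[T] = rdd.takeSample(withReplacement = false, num = n, seed = seed)
    // parallelize turns the local collection back into an RDD
    rdd.sparkContext.parallelize(local.toSeq)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just to make the sketch runnable
    val conf = new SparkConf().setAppName("exact-sample").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 1000)

      // Exactly 10 elements, chosen at random without replacement
      val exact = sampleExactly(rdd, 10, seed = 42L)
      println(exact.count())

      // sample() is a transformation and returns an RDD directly,
      // but it takes a fraction, so the count is only roughly 10
      val approx = rdd.sample(withReplacement = false, fraction = 0.01, seed = 42L)
      println(approx.count())
    } finally {
      sc.stop()
    }
  }
}
```

Since the sampled elements pass through the driver, this only makes sense when n is small enough to fit in driver memory.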



Thanks
Best Regards
