Posted to user@spark.apache.org by 王永春 <yo...@audaque.com> on 2014/03/17 10:39:53 UTC

Question about RDD creation in Spark

Hello. I have a question about RDD creation in Spark. When is a new RDD created?

I obtained an initial RDD from the hadoopRDD method of SparkContext and ran a count action on it. After
that, I could see the RDD on the driver program's web UI page. Then I applied a flatMap transformation to
the initial RDD and ran a count action on the RDD reference returned by that transformation. I expected
a new RDD entry to appear on the driver program's web UI page next to the first one, but that did not
happen. There was still only one RDD entry: the initial RDD.

Can anyone tell me whether no new RDD was created after the flatMap transformation and the subsequent
count action, or whether it was created but not listed on the driver program's web UI page? The Java code
fragment follows.

JavaSparkContext sc = new JavaSparkContext(...);
JavaRDD rdd0 = sc.hadoopRDD(...);
rdd0.cache();
rdd0.count();  // After this action, I could examine the initial RDD on the web UI page.

JavaRDD rdd1 = rdd0.flatMap(...);
rdd1.cache();
rdd1.count();  // I expected a new RDD entry to appear after this action, but it did not.
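
The expectation behind the fragment above can be sketched as a toy model that does not use Spark at all. ToyContext, ToyRdd, parallelize and the storage map are hypothetical names invented for this illustration, not Spark classes or APIs. The sketch only assumes that a transformation returns a new object with its own id, and that a cached object is listed once an action has computed it. Whether the real web UI behaves this way is the open question of this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model only: none of these classes exist in Spark.
public class ToyLineage {

    static class ToyContext {
        int nextId = 0;
        // Stands in for a storage listing: id -> computed elements.
        final Map<Integer, List<String>> storage = new LinkedHashMap<>();

        ToyRdd parallelize(List<String> data) {
            return new ToyRdd(this, null, null, data);
        }
    }

    static class ToyRdd {
        final ToyContext ctx;
        final int id;
        final ToyRdd parent;                        // null for a source dataset
        final Function<String, List<String>> fn;    // null for a source dataset
        final List<String> source;                  // null for a derived dataset
        boolean cached = false;

        ToyRdd(ToyContext ctx, ToyRdd parent,
               Function<String, List<String>> fn, List<String> source) {
            this.ctx = ctx;
            this.id = ctx.nextId++;                 // id is assigned at construction
            this.parent = parent;
            this.fn = fn;
            this.source = source;
        }

        // A transformation computes nothing; it returns a new object.
        ToyRdd flatMap(Function<String, List<String>> f) {
            return new ToyRdd(ctx, this, f, null);
        }

        // Marks this object for caching; nothing is stored yet.
        ToyRdd cache() {
            cached = true;
            return this;
        }

        List<String> compute() {
            List<String> hit = ctx.storage.get(id);
            if (hit != null) {
                return hit;
            }
            List<String> out = new ArrayList<>();
            if (parent == null) {
                out.addAll(source);
            } else {
                for (String s : parent.compute()) {
                    out.addAll(fn.apply(s));
                }
            }
            if (cached) {
                ctx.storage.put(id, out);           // listed only after an action ran
            }
            return out;
        }

        // An action: forces the computation.
        long count() {
            return compute().size();
        }
    }

    public static void main(String[] args) {
        ToyContext ctx = new ToyContext();
        ToyRdd rdd0 = ctx.parallelize(Arrays.asList("a b", "c")).cache();
        rdd0.count();
        ToyRdd rdd1 = rdd0.flatMap(line -> Arrays.asList(line.split(" "))).cache();
        System.out.println("distinct ids: " + (rdd0.id != rdd1.id));
        System.out.println("listed before count: " + ctx.storage.keySet());
        rdd1.count();
        System.out.println("listed after count: " + ctx.storage.keySet());
    }
}
```

In this toy model the second dataset appears in the listing only after its own count has run, which is the behaviour expected from the real fragment.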

----
Yongchun Wang
Audaque Data Technology Ltd.
Shenzhen, China