Posted to user@spark.apache.org by 王永春 <yo...@audaque.com> on 2014/03/17 10:39:53 UTC
Question about RDD creations in Spark
Hello. I have a question about RDD creation in Spark: when is a new RDD created?
I obtained an initial RDD from the hadoopRDD method of SparkContext and ran a count action on it. After that, I
could examine the RDD on the driver program's web UI page. Then I applied a flatMap transformation to the
initial RDD and ran a count action on the RDD returned by that transformation. I expected
a new RDD entry to appear on the driver program's web UI page beside the first one, but that is not
what happened. There was still only one RDD entry: the initial RDD.
Can anyone tell me whether no new RDD was created after the flatMap transformation and the subsequent count
action, or whether it was created but not listed on the driver program's web UI page? The Java code
fragment follows.
JavaSparkContext sc = new JavaSparkContext(...);
JavaRDD rdd0 = sc.hadoopRDD(...);
rdd0.cache();
rdd0.count(); // I could examine the initial RDD on the web UI page after this action.
JavaRDD rdd1 = rdd0.flatMap(...);
rdd1.cache();
rdd1.count(); // I expected a new RDD to be created after this action, but that does not seem to be the case.
----
Yongchun Wang
Audaque Data Technology Ltd.
Shenzhen, China