Posted to user@spark.apache.org by Jia Zou <ja...@gmail.com> on 2016/07/25 17:50:53 UTC
JavaRDD.foreach (new VoidFunction<>...) always returns the last element
My code is as follows:
System.out.println("Initialize points...");
JavaPairRDD<IntWritable, DoubleArrayWritable> data =
    sc.sequenceFile(inputFile, IntWritable.class, DoubleArrayWritable.class);
RDD<Tuple2<IntWritable, DoubleArrayWritable>> rdd = JavaPairRDD.toRDD(data);
JavaRDD<Tuple2<IntWritable, DoubleArrayWritable>> points =
    JavaRDD.fromRDD(rdd, data.classTag());
points.persist(StorageLevel.MEMORY_ONLY());
int i;
for (i = 0; i < iterations; i++) {
    System.out.println("iteration=" + i);
    // points.foreach(new ForEachMapPointToCluster(numDimensions, numClusters));
    points.foreach(new VoidFunction<Tuple2<IntWritable, DoubleArrayWritable>>() {
        public void call(Tuple2<IntWritable, DoubleArrayWritable> tuple) {
            IntWritable key = tuple._1();
            System.out.println("key:" + key.get());
            DoubleArrayWritable array = tuple._2();
            double[] point = array.getData();
            for (int d = 0; d < 20; d++) {
                System.out.println(d + ":" + point[d]);
            }
        }
    });
}
The output is many repetitions of the following; only the last element in
the RDD is ever printed.
key:999
0:0.9953839426689233
1:0.12656798341145892
2:0.16621114723289654
3:0.48628049787614236
4:0.476991470215116
5:0.5033640235789054
6:0.09257098597507829
7:0.3153088440494892
8:0.8807426085223242
9:0.2809625780570739
10:0.9584880094505738
11:0.38521222520661547
12:0.5114241334425228
13:0.9524628903835111
14:0.5252549496842003
15:0.5732037830866236
16:0.8632451606583632
17:0.39754347061499895
18:0.2859522809981715
19:0.2659002343432888
key:999
0:0.9953839426689233
1:0.12656798341145892
2:0.16621114723289654
3:0.48628049787614236
4:0.476991470215116
5:0.5033640235789054
6:0.09257098597507829
7:0.3153088440494892
8:0.8807426085223242
9:0.2809625780570739
10:0.9584880094505738
11:0.38521222520661547
12:0.5114241334425228
13:0.9524628903835111
14:0.5252549496842003
15:0.5732037830866236
16:0.8632451606583632
17:0.39754347061499895
18:0.2859522809981715
19:0.2659002343432888
key:999
0:0.9953839426689233
1:0.12656798341145892
2:0.16621114723289654
3:0.48628049787614236
4:0.476991470215116
5:0.5033640235789054
6:0.09257098597507829
7:0.3153088440494892
8:0.8807426085223242
9:0.2809625780570739
10:0.9584880094505738
11:0.38521222520661547
12:0.5114241334425228
13:0.9524628903835111
14:0.5252549496842003
15:0.5732037830866236
16:0.8632451606583632
17:0.39754347061499895
18:0.2859522809981715
19:0.2659002343432888
Re: JavaRDD.foreach (new VoidFunction<>...) always returns the last element
Posted by Jia Zou <ja...@gmail.com>.
Hi Sean,
Thanks for your great help! It works correctly if I remove the persist call.
As a next step, I will transform those values before calling persist.
I converted to RDD and back to JavaRDD just for testing purposes.
Best Regards,
Jia
On Mon, Jul 25, 2016 at 1:01 PM, Sean Owen <so...@cloudera.com> wrote:
> Why are you converting to RDD and back to JavaRDD?
> The problem is storing references to Writable, which are mutated by the
> InputFormat. Somewhere you have 1000 refs to the same key. I think it may
> be the persist. You want to immediately transform these values to something
> besides a Writable.
Re: JavaRDD.foreach (new VoidFunction<>...) always returns the last element
Posted by Sean Owen <so...@cloudera.com>.
Why are you converting to RDD and back to JavaRDD?
The problem is storing references to Writable, which are mutated by the
InputFormat. Somewhere you have 1000 refs to the same key. I think it may
be the persist. You want to immediately transform these values to something
besides a Writable.
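
To illustrate the aliasing described above without needing a cluster, here is a minimal, Spark-free sketch. Everything in it is a hypothetical stand-in: MutableKey plays the role of a Writable such as IntWritable, and the loops mimic a record reader that reuses one object for every record. It is not Spark's or Hadoop's actual code. Caching the references leaves 1000 pointers to the same object, while copying the value out of the mutable object first keeps each record distinct.

```java
import java.util.ArrayList;
import java.util.List;

class WritableReuseDemo {
    // Hypothetical stand-in for a Hadoop Writable: a mutable box for an int.
    static final class MutableKey {
        private int value;
        void set(int v) { value = v; }
        int get() { return value; }
    }

    // Mimics caching the objects handed out by a reader that reuses ONE
    // instance for every record: all cached entries alias the same object.
    static List<MutableKey> cacheReferences(int n) {
        MutableKey reused = new MutableKey();
        List<MutableKey> cached = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            reused.set(i);       // the "reader" overwrites the shared object
            cached.add(reused);  // only a reference is stored
        }
        return cached;
    }

    // Same reader behavior, but the value is copied out of the mutable
    // object before it is stored, so later mutation cannot affect it.
    static List<Integer> cacheCopies(int n) {
        MutableKey reused = new MutableKey();
        List<Integer> cached = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            reused.set(i);
            cached.add(reused.get());  // immutable copy of the current value
        }
        return cached;
    }

    public static void main(String[] args) {
        // Every cached reference now reports the last key written.
        System.out.println("cached reference, key:" + cacheReferences(1000).get(0).get());
        // The cached copies keep their original values.
        System.out.println("cached copy, key:" + cacheCopies(1000).get(0));
    }
}
```

In the Spark job from the original post, the analogous change would be to map each tuple to non-Writable values (for example an Integer key and a cloned double[]) before calling persist; the exact shape of that map depends on the custom DoubleArrayWritable class, which is not shown in the thread.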