Posted to user@spark.apache.org by Jia Zou <ja...@gmail.com> on 2016/07/25 17:50:53 UTC

JavaRDD.foreach (new VoidFunction<>...) always returns the last element

My code is as follows:

    System.out.println("Initialize points...");

    JavaPairRDD<IntWritable, DoubleArrayWritable> data =
            sc.sequenceFile(inputFile, IntWritable.class, DoubleArrayWritable.class);

    RDD<Tuple2<IntWritable, DoubleArrayWritable>> rdd = JavaPairRDD.toRDD(data);

    JavaRDD<Tuple2<IntWritable, DoubleArrayWritable>> points =
            JavaRDD.fromRDD(rdd, data.classTag());

    points.persist(StorageLevel.MEMORY_ONLY());

    int i;

    for (i = 0; i < iterations; i++) {
        System.out.println("iteration=" + i);

        //points.foreach(new ForEachMapPointToCluster(numDimensions, numClusters));

        points.foreach(new VoidFunction<Tuple2<IntWritable, DoubleArrayWritable>>() {
            public void call(Tuple2<IntWritable, DoubleArrayWritable> tuple) {
                IntWritable key = tuple._1();
                System.out.println("key:" + key.get());

                DoubleArrayWritable array = tuple._2();
                double[] point = array.getData();
                for (int d = 0; d < 20; d++) {
                    System.out.println(d + ":" + point[d]);
                }
            }
        });
    }


The output is many repetitions of the following block; only the last element
in the RDD is ever printed.

key:999
0:0.9953839426689233
1:0.12656798341145892
2:0.16621114723289654
3:0.48628049787614236
4:0.476991470215116
5:0.5033640235789054
6:0.09257098597507829
7:0.3153088440494892
8:0.8807426085223242
9:0.2809625780570739
10:0.9584880094505738
11:0.38521222520661547
12:0.5114241334425228
13:0.9524628903835111
14:0.5252549496842003
15:0.5732037830866236
16:0.8632451606583632
17:0.39754347061499895
18:0.2859522809981715
19:0.2659002343432888
key:999
0:0.9953839426689233
1:0.12656798341145892
2:0.16621114723289654
3:0.48628049787614236
4:0.476991470215116
5:0.5033640235789054
6:0.09257098597507829
7:0.3153088440494892
8:0.8807426085223242
9:0.2809625780570739
10:0.9584880094505738
11:0.38521222520661547
12:0.5114241334425228
13:0.9524628903835111
14:0.5252549496842003
15:0.5732037830866236
16:0.8632451606583632
17:0.39754347061499895
18:0.2859522809981715
19:0.2659002343432888
key:999
0:0.9953839426689233
1:0.12656798341145892
2:0.16621114723289654
3:0.48628049787614236
4:0.476991470215116
5:0.5033640235789054
6:0.09257098597507829
7:0.3153088440494892
8:0.8807426085223242
9:0.2809625780570739
10:0.9584880094505738
11:0.38521222520661547
12:0.5114241334425228
13:0.9524628903835111
14:0.5252549496842003
15:0.5732037830866236
16:0.8632451606583632
17:0.39754347061499895
18:0.2859522809981715
19:0.2659002343432888

Re: JavaRDD.foreach (new VoidFunction<>...) always returns the last element

Posted by Jia Zou <ja...@gmail.com>.
Hi Sean,

Thanks for your great help! It works correctly once I remove persist.

As a next step, I will transform those values before calling persist.
I converted to RDD and back to JavaRDD just for testing purposes.

Best Regards,
Jia

On Mon, Jul 25, 2016 at 1:01 PM, Sean Owen <so...@cloudera.com> wrote:

> Why are you converting to RDD and back to JavaRDD?
> The problem is storing references to Writable, which are mutated by the
> InputFormat. Somewhere you have 1000 refs to the same key. I think it may
> be the persist. You want to immediately transform these values to something
> besides a Writable.

Re: JavaRDD.foreach (new VoidFunction<>...) always returns the last element

Posted by Sean Owen <so...@cloudera.com>.
Why are you converting to RDD and back to JavaRDD?
The problem is storing references to Writable, which are mutated by the
InputFormat. Somewhere you have 1000 refs to the same key. I think it may
be the persist. You want to immediately transform these values to something
besides a Writable.
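
To make the aliasing concrete, here is a minimal sketch in plain Java that does not use Spark or Hadoop at all. MutableRecord is a hypothetical stand-in for a Writable that a reader reuses for every record, and Point is a hypothetical immutable copy; neither name comes from the original code. It only illustrates the general pitfall described above, under the assumption that the reader hands back the same mutable object each time.

```java
import java.util.ArrayList;
import java.util.List;

public class WritableReuseDemo {

    /** Hypothetical stand-in for a Writable: one mutable box, overwritten per record. */
    static final class MutableRecord {
        int key;
        double[] values = new double[1];

        void set(int key, double value) {
            this.key = key;
            this.values[0] = value; // overwrites the previous record in place
        }
    }

    /** Hypothetical immutable snapshot taken from the mutable box. */
    static final class Point {
        final int key;
        final double[] values;

        Point(MutableRecord r) {
            this.key = r.key;
            this.values = r.values.clone(); // copy the array, not the reference
        }
    }

    /** Caches the reused object itself, so every cached entry aliases one box. */
    public static List<Integer> keysAfterCachingReferences(int n) {
        MutableRecord reused = new MutableRecord();
        List<MutableRecord> cache = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            reused.set(i, i * 0.5);
            cache.add(reused); // same reference added n times
        }
        List<Integer> keys = new ArrayList<>();
        for (MutableRecord r : cache) {
            keys.add(r.key);
        }
        return keys;
    }

    /** Copies each record out of the box before caching it. */
    public static List<Integer> keysAfterCachingCopies(int n) {
        MutableRecord reused = new MutableRecord();
        List<Point> cache = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            reused.set(i, i * 0.5);
            cache.add(new Point(reused)); // independent snapshot per record
        }
        List<Integer> keys = new ArrayList<>();
        for (Point p : cache) {
            keys.add(p.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(keysAfterCachingReferences(3)); // prints [2, 2, 2]
        System.out.println(keysAfterCachingCopies(3));     // prints [0, 1, 2]
    }
}
```

In the Spark job, the analogous step would presumably be a map over the sequence-file RDD that pulls out key.get() and a copy of the double array before persist is called, so that the cached elements no longer point at the shared Writable instances.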
