Posted to user@spark.apache.org by Devender Yadav <de...@impetus.co.in> on 2017/04/24 17:36:18 UTC

Arraylist is empty after JavaRDD.foreach

Hi All,


I am using Spark 1.6.2 and Java 7.


Sample JSON (100 records in total):

{"name":"dev","salary":10000,"occupation":"engg","address":"noida"}

{"name":"karthik","salary":20000,"occupation":"engg","address":"noida"}

Relevant code:

   final List<Map<String,String>> jsonData = new ArrayList<>();

   DataFrame df =  sqlContext.read().json("file:///home/dev/data-json/emp.json");
   JavaRDD<String> rdd = df.repartition(1).toJSON().toJavaRDD();

   rdd.foreach(new VoidFunction<String>() {
       @Override
       public void call(String line)  {
           try {
               jsonData.add (new ObjectMapper().readValue(line, Map.class));
               System.out.println(Thread.currentThread().getName());
               System.out.println("List size: "+jsonData.size());
           } catch (IOException e) {
               e.printStackTrace();
           }
       }
   });

   System.out.println(Thread.currentThread().getName());
   System.out.println("List size: "+jsonData.size());

The jsonData list is empty at the end, even though it grows to 100 entries inside foreach.


Output:

Executor task launch worker-1
List size: 1
Executor task launch worker-1
List size: 2
Executor task launch worker-1
List size: 3
.
.
.
Executor task launch worker-1
List size: 100

main
List size: 0



Regards,
Devender

________________________________






NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.

Re: Arraylist is empty after JavaRDD.foreach

Posted by Devender Yadav <de...@impetus.co.in>.
Hi Franke,


I want to convert the DataFrame to a JSON string.


Regards,
Devender
________________________________
From: Jörn Franke <jo...@gmail.com>
Sent: Monday, April 24, 2017 11:15:08 PM
To: Devender Yadav
Cc: user@spark.apache.org
Subject: Re: Arraylist is empty after JavaRDD<String>.foreach

I am not sure what you are trying to achieve here. You should never use an ArrayList as a global variable the way you do here (it is an anti-pattern). Why don't you use the count function of the DataFrame?


Re: Arraylist is empty after JavaRDD.foreach

Posted by Jörn Franke <jo...@gmail.com>.
I am not sure what you are trying to achieve here. You should never use an ArrayList as a global variable the way you do here (it is an anti-pattern). Why don't you use the count function of the DataFrame?


Re: Arraylist is empty after JavaRDD.foreach

Posted by Michael Armbrust <mi...@databricks.com>.
foreach runs on the executors, so it cannot modify an ArrayList that
exists only on the driver.  You should just call collectAsList on the
DataFrame.
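
The behaviour described above can be reproduced without Spark. The sketch below is not Spark code: it assumes that the function passed to foreach is serialized and that each task works on a deserialized copy, and it imitates that with plain Java serialization. The names ClosureCopyDemo, AppendTask and shipToExecutor are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ClosureCopyDemo {

    /** Hypothetical stand-in for the anonymous VoidFunction: captures a list and appends to it. */
    static class AppendTask implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> captured;

        AppendTask(List<String> captured) {
            this.captured = captured;
        }

        void call(String line) {
            captured.add(line);
        }
    }

    /**
     * Round-trips the task through Java serialization. This only imitates
     * shipping a closure to an executor; it is an assumption of the sketch,
     * not a call into Spark.
     */
    static AppendTask shipToExecutor(AppendTask task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            try {
                out.writeObject(task);
            } finally {
                out.close();
            }
            ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            try {
                return (AppendTask) in.readObject();
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // The list created on the "driver" side.
        List<String> driverList = new ArrayList<String>();

        // The task that actually runs holds a deserialized copy of the list.
        AppendTask executorCopy = shipToExecutor(new AppendTask(driverList));
        executorCopy.call("{\"name\":\"dev\"}");
        executorCopy.call("{\"name\":\"karthik\"}");

        System.out.println("Executor-side list size: " + executorCopy.captured.size()); // 2
        System.out.println("Driver-side list size: " + driverList.size());              // 0
    }
}
```

The copy grows while the original stays empty, which matches the pattern in the reported output (sizes 1 to 100 on the worker thread, 0 on main). To get the records onto the driver, return them from an action instead of mutating captured state: in the Spark 1.6 Java API, collect() on the JavaRDD<String> returns a List<String>, and collectAsList() on the DataFrame returns a List<Row>. Either is reasonable for 100 records; for large data, collecting everything onto the driver may exhaust its memory.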
