Posted to user@spark.apache.org by Ashish Soni <as...@gmail.com> on 2015/07/02 02:43:17 UTC

DataFrame Filter Inside Another Data Frame Map

Hi All  ,

I am not sure what is wrong with the code below: it gives the error below
when I access the DataFrame inside the map function, but it works outside.

JavaRDD<Charge> rdd2 = rdd.map(new Function<Charge, Charge>() {

            @Override
            public Charge call(Charge ch) throws Exception {

                DataFrame df = accountRdd.filter("login=test");

                return ch;
            }

        });

15/07/01 20:38:08 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NullPointerException
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:129)
    at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154)

Re: DataFrame Filter Inside Another Data Frame Map

Posted by Raghavendra Pandey <ra...@gmail.com>.
You can collect the DataFrame as an array and then build a map out of it.
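Something along these lines with the Spark 1.x Java API (an untested
sketch: it assumes the account DataFrame is small enough to collect on the
driver, and that the first two columns are login and status):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.spark.sql.Row;

    // collect() pulls every row of the DataFrame to the driver
    Row[] rows = accountRdd.collect();
    Map<String, String> accountsByLogin = new HashMap<String, String>();
    for (Row row : rows) {
        // assumed column order: (login, status)
        accountsByLogin.put(row.getString(0), row.getString(1));
    }
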
On Jul 2, 2015 9:23 AM, <as...@gmail.com> wrote:

> Any example of how I can return a HashMap from a data frame?
>
> Thanks ,
> Ashish
>
> On Jul 1, 2015, at 11:34 PM, Holden Karau <ho...@pigscanfly.ca> wrote:
>
> Collect it as a regular (Java/Scala/Python) map. You can also broadcast
> the map if you're going to use it multiple times.
>
> On Wednesday, July 1, 2015, Ashish Soni <as...@gmail.com> wrote:
>
>> Thanks. So if I load some static data from a database and then need to
>> use that in my map function to filter records, what will be the best way
>> to do it?
>>
>> Ashish
>>
>> On Wed, Jul 1, 2015 at 10:45 PM, Raghavendra Pandey <
>> raghavendra.pandey@gmail.com> wrote:
>>
>>> You cannot refer to one RDD inside another RDD's map function...
>>> The RDD object is not serializable. Whatever objects you use inside the map
>>> function should be serializable, as they get transferred to executor nodes.
>>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
> Linked In: https://www.linkedin.com/in/holdenkarau
>
>

Re: DataFrame Filter Inside Another Data Frame Map

Posted by as...@gmail.com.
Any example of how I can return a HashMap from a data frame?

Thanks ,
Ashish

> On Jul 1, 2015, at 11:34 PM, Holden Karau <ho...@pigscanfly.ca> wrote:
> 
> Collect it as a regular (Java/Scala/Python) map. You can also broadcast the map if you're going to use it multiple times.
> 
>> On Wednesday, July 1, 2015, Ashish Soni <as...@gmail.com> wrote:
>> Thanks. So if I load some static data from a database and then need to use that in my map function to filter records, what will be the best way to do it?
>> 
>> Ashish
>> 
>>> On Wed, Jul 1, 2015 at 10:45 PM, Raghavendra Pandey <ra...@gmail.com> wrote:
>>> You cannot refer to one RDD inside another RDD's map function...
>>> The RDD object is not serializable. Whatever objects you use inside the map function should be serializable, as they get transferred to executor nodes.
>>> 
> 
> 
> -- 
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
> Linked In: https://www.linkedin.com/in/holdenkarau
> 

Re: DataFrame Filter Inside Another Data Frame Map

Posted by Holden Karau <ho...@pigscanfly.ca>.
Collect it as a regular (Java/Scala/Python) map. You can also broadcast
the map if you're going to use it multiple times.
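
Roughly like this (an untested sketch against the Spark 1.x Java API;
accountRdd and sc come from the earlier snippets, while the column order
and Charge.getLogin() are assumptions):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.broadcast.Broadcast;
    import org.apache.spark.sql.Row;

    // Build the lookup map on the driver from the collected DataFrame.
    Map<String, String> accounts = new HashMap<String, String>();
    for (Row row : accountRdd.collect()) {
        accounts.put(row.getString(0), row.getString(1)); // assumed (login, status)
    }

    // Broadcast it once so every executor caches a read-only copy.
    final Broadcast<Map<String, String>> accountsBc = sc.broadcast(accounts);

    JavaRDD<Charge> rdd2 = rdd.map(new Function<Charge, Charge>() {
        @Override
        public Charge call(Charge ch) throws Exception {
            // Look up in the broadcast map instead of filtering a DataFrame.
            String status = accountsBc.value().get(ch.getLogin());
            return ch;
        }
    });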

On Wednesday, July 1, 2015, Ashish Soni <as...@gmail.com> wrote:

> Thanks. So if I load some static data from a database and then need to
> use that in my map function to filter records, what will be the best way
> to do it?
>
> Ashish
>
> On Wed, Jul 1, 2015 at 10:45 PM, Raghavendra Pandey <
> raghavendra.pandey@gmail.com> wrote:
>
>> You cannot refer to one RDD inside another RDD's map function...
>> The RDD object is not serializable. Whatever objects you use inside the map
>> function should be serializable, as they get transferred to executor nodes.
>

-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
Linked In: https://www.linkedin.com/in/holdenkarau

Re: DataFrame Filter Inside Another Data Frame Map

Posted by Ashish Soni <as...@gmail.com>.
Thanks. So if I load some static data from a database and then need to use
that in my map function to filter records, what will be the best way to do
it?

Ashish

On Wed, Jul 1, 2015 at 10:45 PM, Raghavendra Pandey <
raghavendra.pandey@gmail.com> wrote:

> You cannot refer to one RDD inside another RDD's map function...
> The RDD object is not serializable. Whatever objects you use inside the map
> function should be serializable, as they get transferred to executor nodes.

Re: DataFrame Filter Inside Another Data Frame Map

Posted by Raghavendra Pandey <ra...@gmail.com>.
You cannot refer to one RDD inside another RDD's map function...
The RDD object is not serializable. Whatever objects you use inside the map
function should be serializable, as they get transferred to executor nodes.
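
If the lookup data is too big to collect and broadcast, a join is the usual
alternative to a per-record filter. An untested sketch (sqlContext is an
assumption here, and Charge would need to be a Java bean with a login field):

    import org.apache.spark.sql.DataFrame;

    // Turn the charges RDD into a DataFrame and join on the shared column.
    DataFrame charges = sqlContext.createDataFrame(rdd, Charge.class);
    DataFrame joined = charges.join(
        accountRdd, charges.col("login").equalTo(accountRdd.col("login")));
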
On Jul 2, 2015 6:13 AM, "Ashish Soni" <as...@gmail.com> wrote:

> Hi All  ,
>
> I am not sure what is wrong with the code below: it gives the error below
> when I access the DataFrame inside the map function, but it works outside.
>
> JavaRDD<Charge> rdd2 = rdd.map(new Function<Charge, Charge>() {
>
>             @Override
>             public Charge call(Charge ch) throws Exception {
>
>
>                 DataFrame df = accountRdd.filter("login=test");
>
>                 return ch;
>             }
>
>         });
>
> 15/07/01 20:38:08 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.NullPointerException
>     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:129)
>     at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154)
>