Posted to user@spark.apache.org by gpatcham <gp...@gmail.com> on 2016/01/18 21:29:10 UTC

using spark context in map function Task not serializable error

Hi,

I have a use case where I need to pass the SparkContext into a map function:

reRDD.map(row => method1(row, sc)).saveAsTextFile(outputDir)

method1 needs the SparkContext to query Cassandra, but I see the error below:

java.io.NotSerializableException: org.apache.spark.SparkContext

Is there a way to fix this?

Thanks



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/using-spark-context-in-map-funciton-TASk-not-serilizable-error-tp25998.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: using spark context in map function Task not serializable error

Posted by Ted Yu <yu...@gmail.com>.
class SQLContext private[sql](
    @transient val sparkContext: SparkContext,
    @transient protected[sql] val cacheManager: CacheManager,
    @transient private[sql] val listener: SQLListener,
    val isRootContext: Boolean)
  extends org.apache.spark.Logging with Serializable {

FYI


Re: using spark context in map function Task not serializable error

Posted by Giri P <gp...@gmail.com>.
Yes, I tried doing that, but it doesn't work.

I'm looking at using SQLContext and DataFrames. Is SQLContext serializable?


Re: using spark context in map function Task not serializable error

Posted by Ted Yu <yu...@gmail.com>.
Did you mean constructing a SparkContext on the worker nodes?

Not sure whether that would work, and it doesn't seem to be good practice.


Re: using spark context in map function Task not serializable error

Posted by Giri P <gp...@gmail.com>.
Can we use @transient ?



Re: using spark context in map function Task not serializable error

Posted by Giri P <gp...@gmail.com>.
I'm using the Spark Cassandra connector to do this, and the way we access a Cassandra table is:

sc.cassandraTable("keySpace", "tableName")

Thanks
Giri


Re: using spark context in map function Task not serializable error

Posted by Ted Yu <yu...@gmail.com>.
Can you pass the properties needed for accessing Cassandra without going through the SparkContext?

SparkContext isn't designed to be used in the way illustrated below.

Cheers
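One way to read this suggestion is to query Cassandra from inside the tasks with the plain DataStax Java driver, so nothing non-serializable is captured by the closure. A rough sketch only: the contact point, keyspace, and table names are placeholders, and it assumes the Java driver is on the executor classpath.

```scala
import com.datastax.driver.core.Cluster

// One connection per partition, created on the executor itself,
// instead of shipping the driver-side SparkContext into the closure.
val results = reRDD.mapPartitions { userIds =>
  val cluster = Cluster.builder().addContactPoint("cassandra-host").build()
  val session = cluster.connect("keySpace")
  val out = userIds.map { userId =>
    session.execute("SELECT * FROM tableName WHERE userid = ?", userId).all().toString
  }.toList // force evaluation before closing the connection
  session.close()
  cluster.close()
  out.iterator
}
results.saveAsTextFile(outputDir)
```

Opening a connection per partition (not per row) keeps the overhead bounded; the `toList` is needed because the iterator is lazy and the session must stay open while rows are fetched.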


Re: using spark context in map function Task not serializable error

Posted by Giri P <gp...@gmail.com>.
method1 looks like this (reRDD holds userIds):

reRDD.map(userId => method1(sc, userId)).saveAsTextFile(outputDir)

def method1(sc: SparkContext, userId: String): String = {
  sc.cassandraTable("Keyspace", "Table2").where("userid = ?", userId)
  // ...do something with the result
  "Test"
}


Re: using spark context in map function Task not serializable error

Posted by "Shixiong(Ryan) Zhu" <sh...@databricks.com>.
You should not use SparkContext or RDDs directly in your closures.

Could you show the code of method1? Maybe you only need a join or something similar, e.g.:

val cassandraRDD = sc.cassandraTable("keySpace", "tableName")
reRDD.join(cassandraRDD).map(...).saveAsTextFile(outputDir)
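For the lookup-by-key pattern in this thread, the Spark Cassandra connector also offers a server-side join, `joinWithCassandraTable`, which fetches only the rows matching the keys in the RDD instead of loading the whole table. A sketch, assuming reRDD holds the values of the table's partition key (names reused from the earlier messages):

```scala
import com.datastax.spark.connector._

// Each element of reRDD becomes a key tuple matched against the
// table's partition key; only matching rows are pulled from Cassandra.
val joined = reRDD
  .map(userId => Tuple1(userId))
  .joinWithCassandraTable("keySpace", "tableName")

joined
  .map { case (Tuple1(userId), cassandraRow) => s"$userId\t$cassandraRow" }
  .saveAsTextFile(outputDir)
```

This keeps all Cassandra access on the executors and sidesteps the serialization problem entirely, since no context object appears in any closure.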



Re: using spark context in map function Task not serializable error

Posted by Ricardo Paiva <ri...@corp.globo.com>.
Did you try SparkContext.getOrCreate()?

You don't need to pass the SparkContext to the map function; you can retrieve it from the SparkContext singleton.

Regards,

Ricardo
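A sketch of that pattern, with a caveat: getOrCreate() returns the driver's existing context, so it helps avoid threading sc through driver-side helper methods, but calling it inside an RDD closure on an executor would still not give a usable SparkContext.

```scala
import org.apache.spark.SparkContext
import com.datastax.spark.connector._

// Driver-side helper: grab the already-running context instead of
// taking it as a parameter.
def method1(userId: String): String = {
  val sc = SparkContext.getOrCreate() // returns the active context if one exists
  sc.cassandraTable("Keyspace", "Table2").where("userid = ?", userId)
  "Test"
}
```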





-- 
Ricardo Paiva
Big Data
*globo.com* <http://www.globo.com>



