Posted to dev@spark.apache.org by Priya Ch <le...@gmail.com> on 2015/09/21 13:27:10 UTC

Re: passing SparkContext as parameter

Can I use this SparkContext on the executors?
In my application, I have a scenario where I need to read certain records
from the database (Cassandra, in our case) inside an RDD, so I need the
SparkContext to read from the DB.

If the SparkContext can't be sent to the executors, what is the workaround
for this?

On Mon, Sep 21, 2015 at 3:06 PM, Petr Novak <os...@gmail.com> wrote:

> add @transient?
>
> On Mon, Sep 21, 2015 at 11:27 AM, Priya Ch <le...@gmail.com>
> wrote:
>
>> Hello All,
>>
>>     How can I pass a SparkContext as a parameter to a method in an
>> object? Passing the SparkContext gives me a "Task not serializable"
>> exception.
>>
>> How can I achieve this?
>>
>> Thanks,
>> Padma Ch
>>
>
>
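
For context, @transient marks a field to be skipped during serialization;
a minimal sketch of what Petr's suggestion would look like (the class and
method names here are made up):

import org.apache.spark.SparkContext

class Processor(@transient val sc: SparkContext) extends Serializable {
  // sc is not serialized along with this object, so passing a Processor
  // into a method no longer triggers "Task not serializable"; but sc is
  // still only usable on the driver, never inside a task on an executor.
  def run(data: Seq[Int]): Long = sc.parallelize(data).count()
}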

Re: passing SparkContext as parameter

Posted by Romi Kuntsman <ro...@totango.com>.
SparkContext is available on the driver, not on the executors.

To read from Cassandra, you can use something like this:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/2_loading.md
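
For example, a minimal sketch on the driver (the keyspace, table, and
column names here are placeholders):

import com.datastax.spark.connector._

// cassandraTable is called on the driver; the scan itself runs
// distributed across the executors, like any other RDD.
val rdd = sc.cassandraTable("my_keyspace", "my_table")
val names = rdd.map(row => row.getString("name")) // runs on the executors
names.take(10).foreach(println)                   // results return to the driver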

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com


Re: passing SparkContext as parameter

Posted by Petr Novak <os...@gmail.com>.
And here is probably the original source code:
https://gist.github.com/koen-dejonghe/39c10357607c698c0b04


Re: passing SparkContext as parameter

Posted by Petr Novak <os...@gmail.com>.
To complete the design pattern:
http://stackoverflow.com/questions/30450763/spark-streaming-and-connection-pool-implementation

Petr


Re: passing SparkContext as parameter

Posted by Priya Ch <le...@gmail.com>.
Suppose I use rdd.joinWithCassandraTable("keySpace", "table1"). Does this do
a full table scan? That is something we must avoid at any cost.
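
What I mean is something like this (a sketch; the key values are made up):

import com.datastax.spark.connector._

// An RDD of partition keys to look up in the Cassandra table.
val keys = sc.parallelize(Seq(Tuple1("id-1"), Tuple1("id-2")))
val joined = keys.joinWithCassandraTable("keySpace", "table1")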

On Tue, Sep 22, 2015 at 3:03 PM, Artem Aliev <ar...@gmail.com> wrote:

> All that code should look like:
> stream.filter(...).map(x => (key, ...)).joinWithCassandraTable(...).map(...).saveToCassandra(...)
>
> I'm not sure about the "exactly 10 messages" part; Spark Streaming focuses
> on time, not count.

Re: passing SparkContext as parameter

Posted by Priya Ch <le...@gmail.com>.
I have a scenario like this:

I read a DStream of messages from Kafka. Now, if my RDD contains 10 messages,
then for each message I need to query Cassandra, make some modifications, and
update the records in the DB. If there is no option of passing the
SparkContext to the workers to read/write the DB, is the only option to use
CassandraConnector.withSessionDo? If yes, then for writing to the table,
should I construct the entire INSERT statement for the thousands of fields in
the DB? Is this way of writing the code an optimized one?
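
The kind of code I am considering (a rough sketch; it assumes dstream is a
DStream[String], and the keyspace, table, and columns are made up):

import com.datastax.spark.connector.cql.CassandraConnector

// Unlike the SparkContext, the connector object is serializable, so it
// can be captured by closures and used inside tasks on the workers.
val connector = CassandraConnector(sc.getConf)

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { messages =>
    connector.withSessionDo { session =>
      messages.foreach { msg =>
        session.execute(
          "INSERT INTO my_ks.my_table (id, body) VALUES (?, ?)", msg, msg)
      }
    }
  }
}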


Re: passing SparkContext as parameter

Posted by Romi Kuntsman <ro...@totango.com>.
Cody, that's a great reference!
As shown there, the best way to connect to an external database from the
workers is to create a connection pool on (each) worker.
The driver must pass, via broadcast, the connection string, but not the
connection object itself, and not the SparkContext.
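
In code, that pattern looks roughly like this (ConnectionPool and its
methods stand in for whatever lazily initialized, per-JVM pool you use;
the connection string is made up):

// Driver side: broadcast only the serializable connection string.
val connStr = sc.broadcast("host1:9042,host2:9042")

rdd.foreachPartition { records =>
  // Worker side: the pool is created lazily once per executor JVM and
  // reused, so no connection object is ever serialized from the driver.
  val conn = ConnectionPool.getConnection(connStr.value)
  records.foreach(record => conn.execute(record))
  ConnectionPool.returnConnection(conn)
}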

>

Re: passing SparkContext as parameter

Posted by Cody Koeninger <co...@koeninger.org>.
That isn't accurate; I think you're confused about foreach.

Look at

http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd
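
The pattern shown there, roughly (createNewConnection and connection.send
are placeholders, as in the guide itself):

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // Runs on the worker: one connection per partition rather than per
    // record, and nothing non-serializable is captured from the driver.
    val connection = createNewConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    connection.close()
  }
}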



Re: passing SparkContext as parameter

Posted by Romi Kuntsman <ro...@totango.com>.
foreach is something that runs on the driver, not the workers.

If you want to perform some function on each record from Cassandra, you
need to do cassandraRdd.map(func), which will run distributed on the Spark
workers.
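
For example (a sketch; keyspace, table, and column names are placeholders):

import com.datastax.spark.connector._

val cassandraRdd = sc.cassandraTable("my_ks", "my_table")
// The function given to map runs distributed on the workers.
val bodies = cassandraRdd.map(row => row.getString("body").toUpperCase)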

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com

On Mon, Sep 21, 2015 at 3:29 PM, Priya Ch <le...@gmail.com>
wrote:

> Yes, but i need to read from cassandra db within a spark
> transformation..something like..
>
> dstream.forachRDD{
>
> rdd=> rdd.foreach {
>  message =>
>      sc.cassandraTable()
>       .
>       .
>       .
>     }
> }
>
> Since rdd.foreach gets executed on workers, how can i make sparkContext
> available on workers ???
>
> Regards,
> Padma Ch
>
> On Mon, Sep 21, 2015 at 5:10 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> You can use broadcast variable for passing connection information.
>>
>> Cheers
>>
>> On Sep 21, 2015, at 4:27 AM, Priya Ch <le...@gmail.com>
>> wrote:
>>
>> can i use this sparkContext on executors ??
>> In my application, i have scenario of reading from db for certain records
>> in rdd. Hence I need sparkContext to read from DB (cassandra in our case),
>>
>> If sparkContext couldn't be sent to executors , what is the workaround
>> for this ??????
>>
>> On Mon, Sep 21, 2015 at 3:06 PM, Petr Novak <os...@gmail.com> wrote:
>>
>>> add @transient?
>>>
>>> On Mon, Sep 21, 2015 at 11:27 AM, Priya Ch <learnings.chitturi@gmail.com
>>> > wrote:
>>>
>>>> Hello All,
>>>>
>>>>     How can i pass sparkContext as a parameter to a method in an
>>>> object. Because passing sparkContext is giving me TaskNotSerializable
>>>> Exception.
>>>>
>>>> How can i achieve this ?
>>>>
>>>> Thanks,
>>>> Padma Ch
>>>>
>>>
>>>
>>
>

Re: passing SparkContext as parameter

Posted by Priya Ch <le...@gmail.com>.
Yes, but I need to read from the Cassandra DB within a Spark transformation,
something like:

dstream.foreachRDD { rdd =>
  rdd.foreach { message =>
    sc.cassandraTable()
      ...
  }
}

Since rdd.foreach gets executed on the workers, how can I make the
SparkContext available on the workers?

Regards,
Padma Ch


Re: passing SparkContext as parameter

Posted by Ted Yu <yu...@gmail.com>.
You can use a broadcast variable for passing connection information.

Cheers
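
For example (a sketch; the host string is a placeholder):

// Driver: broadcast the connection details, a plain serializable value.
val cassandraHosts = sc.broadcast("host1:9042,host2:9042")

rdd.foreachPartition { records =>
  // Worker: read the broadcast value and open a connection locally.
  val hosts = cassandraHosts.value
  // ... connect to `hosts` here and write `records` ...
}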

