Posted to user@spark.apache.org by Kevin Burton <bu...@spinn3r.com> on 2014/12/27 21:23:05 UTC

init / shutdown for complex map job?

I have a job where I want to map over all data in a Cassandra database.

I’m then selectively sending things to my own external system (ActiveMQ) if
the item matches criteria.

The problem is that I need to do some init and shutdown. Basically, on init
I need to create ActiveMQ connections, and on shutdown I need to close them,
or daemon threads will be left running.

What’s the best way to accomplish this? I couldn’t find it after I RTFMd… (but
perhaps I missed it)

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
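Because the job described above only pushes matching records to ActiveMQ and returns nothing, one option worth considering is `foreachPartition`. It is an action whose function returns `Unit`, so the function itself can drain the partition iterator inside an ordinary try/finally, and the shutdown step runs only after every record has been handled. The sketch below is plain Scala with no Spark dependency; `Conn` is a hypothetical stand-in for an ActiveMQ connection (not the ActiveMQ or JMS API), and records are assumed to be strings for brevity.

```scala
import scala.collection.mutable.ArrayBuffer

object PartitionSender {
  /** Hypothetical stand-in for an ActiveMQ connection; not a real client API. */
  final class Conn {
    val sent: ArrayBuffer[String] = ArrayBuffer.empty[String]
    var closed: Boolean = false

    def send(msg: String): Unit = {
      require(!closed, "send after close")
      sent += msg
    }

    def close(): Unit = closed = true
  }

  /** Per-partition init/shutdown: open once, drain the iterator eagerly, always close. */
  def sendPartition(
      records: Iterator[String],
      open: () => Conn,
      matches: String => Boolean
  ): Unit = {
    val conn = open()                              // init: once per partition
    try records.filter(matches).foreach(conn.send) // forces every record through
    finally conn.close()                           // shutdown, even if a send throws
  }
}
```

In a Spark job this would be called roughly as `myRDD.foreachPartition(it => PartitionSender.sendPartition(it, () => new PartitionSender.Conn, matches))`. Creating the connection inside the closure means it is opened on the executor and never has to be serialized.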

Re: init / shutdown for complex map job?

Posted by Kevin Burton <bu...@spinn3r.com>.
Yes. I can do a just-in-time init… I can tell when the first map call happens.

However, I don’t think I can tell when the last map call is done, and the
shutdown is the key part. Without it, my daemon threads won’t exit properly
and not all messages will be sent over the wire.
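The problem described here is that nothing inside `mapPartitions` announces that the last record has been consumed. A common workaround, sketched below in plain Scala without a Spark dependency, is to wrap the partition iterator so that cleanup runs the first time `hasNext` returns false. `CleanupIterator` is a hypothetical helper written for this illustration, not a Spark API.

```scala
/** Hypothetical helper (not a Spark API): runs `cleanup` once, when the wrapped iterator is exhausted. */
final class CleanupIterator[A](underlying: Iterator[A], cleanup: () => Unit)
    extends Iterator[A] {
  private var cleanedUp = false

  override def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more && !cleanedUp) {
      cleanedUp = true // set first so cleanup never runs twice
      cleanup()        // the last record has been consumed: shut down here
    }
    more
  }

  override def next(): A = underlying.next()
}
```

In a job this would look roughly like `myRDD.mapPartitions { p => val conn = open(); new CleanupIterator(p.map(r => send(conn, r)), () => conn.close()) }`, where `open` and `send` are placeholders. Cleanup only fires if something drains the iterator to the end: if the task fails, or a downstream operation such as `take` stops early, it never runs. Spark's `TaskContext` also offers `addTaskCompletionListener`, which may be a more robust hook for teardown; check the API docs for your Spark version.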

On Sun, Dec 28, 2014 at 12:18 AM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> Something like?
>
> val a = myRDD.mapPartitions(p => {
>             //Do the init
>             //Perform some operations
>             //Shut it down?
>          })


Re: init / shutdown for complex map job?

Posted by Sean Owen <so...@cloudera.com>.
(Still pending, but believe it's in progress and being written by a
colleague here.)

On Sun, Dec 28, 2014 at 2:41 PM, Ray Melton <rt...@gmail.com> wrote:
> A follow-up to the blog cited below was hinted at, per "But Wait,
> There's More ... To keep this post brief, the remainder will be left to
> a follow-up post."
>
> Is this follow-up pending?  Is it sort of pending?  Did the follow-up
> happen, but I just couldn't find it on the web?
>
> Regards, Ray.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: init / shutdown for complex map job?

Posted by Ray Melton <rt...@gmail.com>.
A follow-up to the blog cited below was hinted at, per "But Wait,
There's More ... To keep this post brief, the remainder will be left to
a follow-up post."

Is this follow-up pending?  Is it sort of pending?  Did the follow-up
happen, but I just couldn't find it on the web?

Regards, Ray.


On Sun, 28 Dec 2014 08:54:13 +0000
Sean Owen <so...@cloudera.com> wrote:

> You can't quite do cleanup in mapPartitions in that way. Here is a
> bit more explanation (farther down):
> http://blog.cloudera.com/blog/2014/09/how-to-translate-from-mapreduce-to-apache-spark/


Re: init / shutdown for complex map job?

Posted by Sean Owen <so...@cloudera.com>.
You can't quite do cleanup in mapPartitions in that way. Here is a bit more
explanation (farther down):
http://blog.cloudera.com/blog/2014/09/how-to-translate-from-mapreduce-to-apache-spark/
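One way to see why cleanup can’t simply go at the end of the function passed to `mapPartitions`, as in the sketch being replied to: `Iterator.map` is lazy, so the function returns before any record has been processed, and anything closed at that point is closed too early. The plain-Scala sketch below reproduces this with a hypothetical `Conn` stand-in (no Spark or ActiveMQ dependency), alongside an eager variant that forces every record before closing. It is an illustration of the pitfall, not a summary of the linked post.

```scala
object LazyPitfall {
  /** Hypothetical stand-in for an external connection; not a real client API. */
  final class Conn {
    var closed: Boolean = false
    def send(x: Int): Int = { require(!closed, "send after close"); x }
    def close(): Unit = closed = true
  }

  // Naive shape: init, map, close, return. Iterator.map is lazy, so
  // close() runs before any element is sent.
  def naive(p: Iterator[Int]): Iterator[Int] = {
    val conn = new Conn        // init
    val out = p.map(conn.send) // nothing has been sent yet
    conn.close()               // shutdown happens too early
    out                        // sends run only when the caller iterates
  }

  // Eager variant: materialize the partition while the connection is open.
  def eager(p: Iterator[Int]): Iterator[Int] = {
    val conn = new Conn
    try p.map(conn.send).toList.iterator // forces every send before close
    finally conn.close()
  }
}
```

The eager variant is correct but holds a whole partition in memory, which may not be acceptable for a full scan of a large Cassandra table.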
On Dec 28, 2014 8:18 AM, "Akhil Das" <ak...@sigmoidanalytics.com> wrote:

> Something like?
>
> val a = myRDD.mapPartitions(p => {
>             //Do the init
>             //Perform some operations
>             //Shut it down?
>          })

Re: init / shutdown for complex map job?

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Something like?

val a = myRDD.mapPartitions(p => {
  // Do the init
  // Perform some operations
  // Shut it down?
  p // mapPartitions must return an Iterator; here the partition is passed through
})

Thanks
Best Regards

On Sun, Dec 28, 2014 at 1:53 AM, Kevin Burton <bu...@spinn3r.com> wrote:

> I have a job where I want to map over all data in a cassandra database.
>
> I’m then selectively sending things to my own external system (ActiveMQ)
> if the item matches criteria.
>
> The problem is that I need to do some init and shutdown.  Basically on
> init I need to create ActiveMQ connections and on shutdown I need to close
> them or daemon threads will be left running.
>
> What’s the best way to accomplish this. I could find it after I RTFMd…(but
> perhaps I missed  it)