Posted to user@pig.apache.org by Patrick Thompson <pa...@standingwaiting.com> on 2013/12/05 20:10:28 UTC

Where and when are StoreFuncInterface functions called

It's not clear from the docs where the various StoreFuncInterface functions
get called. There are some hints in the API docs
<http://pig.apache.org/docs/r0.12.0/api/>, but I am left wondering: does Pig
guarantee that, for example, putNext and cleanUpOnSuccess will be called in
the same execution context?

Is this documented somewhere? Maybe someone can provide an answer? It would
save me a lot of time experimenting and spelunking in the code.

Thanks

Patrick

Re: Where and when are StoreFuncInterface functions called

Posted by Patrick Thompson <pa...@standingwaiting.com>.
Thanks - that helps a lot - I believe I can figure it out from there.

Patrick


On Sun, Dec 15, 2013 at 6:12 PM, Cheolsoo Park <pi...@gmail.com> wrote:




-- 
fun and games - a blog <http://funazonki.blogspot.com/>, a word
game <http://1.whatwouldwho.appspot.com/wwws.html> and
CanCan <http://www.standingwaiting.com/CanCan/Game.html>

Re: Where and when are StoreFuncInterface functions called

Posted by Cheolsoo Park <pi...@gmail.com>.
Hi Patrick,

I think what you need is OutputCommitter#commitTask()
<http://hadoop.apache.org/docs/r1.1.2/api/org/apache/hadoop/mapreduce/OutputCommitter.html#commitTask(org.apache.hadoop.mapreduce.TaskAttemptContext)>.
This is called by Hadoop in each task process, so you can write your own
OutputCommitter class and associate it with your StoreFunc (through the
OutputFormat returned by its getOutputFormat() method). Then you can make a
single call to your DB for the batched output of each task.
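A minimal sketch of per-task batching in plain Java, assuming nothing from Pig or Hadoop: `FakeDb`, `BatchingTaskWriter` and `BatchingDemo` are hypothetical stand-ins. `write()` plays the role of putNext() (it only buffers) and `commitTask()` plays the role of OutputCommitter#commitTask() (one batched DB call per task). How a real committer reaches the buffer depends on your OutputFormat and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a database: records each batched call it receives.
class FakeDb {
    final List<List<String>> batches = new ArrayList<>();

    void insertBatch(List<String> rows) {
        batches.add(new ArrayList<>(rows)); // one "round trip" per call
    }
}

// Hypothetical per-task writer: one instance lives in each task process.
class BatchingTaskWriter {
    private final FakeDb db;
    private final List<String> buffer = new ArrayList<>();

    BatchingTaskWriter(FakeDb db) {
        this.db = db;
    }

    // Plays the role of putNext(): only buffers, never touches the DB.
    void write(String row) {
        buffer.add(row);
    }

    // Plays the role of OutputCommitter#commitTask(): at most one DB call per task.
    void commitTask() {
        if (!buffer.isEmpty()) {
            db.insertBatch(buffer);
            buffer.clear();
        }
    }
}

class BatchingDemo {
    public static void main(String[] args) {
        FakeDb db = new FakeDb();
        // Two simulated tasks, each with its own writer instance.
        for (int task = 0; task < 2; task++) {
            BatchingTaskWriter writer = new BatchingTaskWriter(db);
            for (int i = 0; i < 3; i++) {
                writer.write("task" + task + "-row" + i);
            }
            writer.commitTask();
        }
        System.out.println(db.batches.size() + " batches"); // prints "2 batches"
    }
}
```

Six rows end up in two DB calls, one per task, instead of six single-row updates.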

If you're looking for a way to do some final work per job, you will have to
rely on either commitJob()
<http://hadoop.apache.org/docs/r1.1.2/api/org/apache/hadoop/mapreduce/OutputCommitter.html#commitJob(org.apache.hadoop.mapreduce.JobContext)>
or cleanUpOnSuccess(). But again, these are not called by the task process.
I am not sure what context you want to share between putNext() and
cleanUpOnSuccess(), but the JobConf object is constructed on the frontend
before the MR jobs are launched, and the properties in it are available
everywhere. However, you won't be able to update a property in putNext()
and see the new value in cleanUpOnSuccess(). Hope this is clear.
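A small plain-Java model of that last point, assuming only that each task works on its own copy of the job configuration (Hadoop ships the submitted job configuration to each task, which loads it independently). A `HashMap` stands in for JobConf; `ConfSnapshotDemo`, `runTask` and the `my.store.*` property keys are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ConfSnapshotDemo {
    // Simulates one backend task: it gets its own copy of the configuration
    // that the frontend built before launching the job.
    static Map<String, String> runTask(Map<String, String> submittedConf) {
        Map<String, String> taskConf = new HashMap<>(submittedConf);
        // A write here (think putNext()) changes only this task's private copy.
        taskConf.put("my.store.rowsWritten", "42"); // hypothetical key
        return taskConf;
    }

    public static void main(String[] args) {
        // Frontend: build the configuration before the job is launched.
        Map<String, String> frontendConf = new HashMap<>();
        frontendConf.put("my.store.table", "events"); // hypothetical key

        Map<String, String> taskConf = runTask(frontendConf);

        // The task can read what the frontend set...
        System.out.println(taskConf.get("my.store.table")); // prints "events"
        // ...but the frontend (e.g. cleanUpOnSuccess()) never sees the task's write.
        System.out.println(frontendConf.get("my.store.rowsWritten")); // prints "null"
    }
}
```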

Thanks,
Cheolsoo



On Sun, Dec 15, 2013 at 7:11 AM, Patrick Thompson <patrick@standingwaiting.com> wrote:


Re: Where and when are StoreFuncInterface functions called

Posted by Patrick Thompson <pa...@standingwaiting.com>.
So is there a good way to flush a buffer accumulated by putNext? I was
hoping it was possible in cleanUpOnSuccess, but that apparently isn't going
to work. This is horrible for something talking to a store such as MySQL,
as it means you have to do updates one at a time.

Patrick


On Sun, Dec 15, 2013 at 12:41 AM, Cheolsoo Park <pi...@gmail.com> wrote:


Re: Where and when are StoreFuncInterface functions called

Posted by Cheolsoo Park <pi...@gmail.com>.
>> putNext and cleanUpOnSuccess will be called in the same execution
>> context?

putNext() is called on the backend during job execution, whereas
cleanUpOnSuccess() is called on the frontend after the job has finished, so
they won't be executed by the same object. From the comment, I also doubt
that you can share properties between them via JobConf.

See MapReduceLauncher.java for how cleanUpOnSuccess() is used.
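A toy model of why state set in putNext() is invisible in cleanUpOnSuccess(), assuming (as described above) that the frontend and each backend task hold separate instances of the store. `CountingStore` and `SeparateInstancesDemo` are hypothetical plain-Java classes; the method names are borrowed for readability and they do not implement Pig's StoreFuncInterface.

```java
// Hypothetical store that tries to carry state from putNext() to cleanUpOnSuccess().
class CountingStore {
    private int rowCount = 0;

    // Backend: runs inside a task, on the task's own instance.
    void putNext(String row) {
        rowCount++;
    }

    // Frontend: runs after the job finishes, on a different instance.
    int cleanUpOnSuccess() {
        return rowCount; // only sees this instance's state
    }
}

class SeparateInstancesDemo {
    public static void main(String[] args) {
        // Frontend JVM: the instance used while planning and after the job.
        CountingStore frontend = new CountingStore();
        // Backend task JVM: a separate instance is created there.
        CountingStore backend = new CountingStore();

        backend.putNext("a");
        backend.putNext("b");

        System.out.println(frontend.cleanUpOnSuccess()); // prints 0, not 2
    }
}
```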

On Thu, Dec 5, 2013 at 11:10 AM, Patrick Thompson <patrick@standingwaiting.com> wrote:
