Posted to user@pig.apache.org by Yang <te...@gmail.com> on 2012/07/03 18:57:54 UTC

for UDF, figure out whether it's on a task tracker?

Normally the job tracker and the task trackers are on different nodes.

When I submit a Pig script that uses a UDF, I think the UDF constructor is
first run (several times, I don't know why) on the job tracker, and then
on each of the task trackers.

Now I want to do some custom work inside the constructor, such as checking
for the existence of certain files that are present only on the task
trackers. This work only needs to be done on the task trackers. So, is
there a way to figure out whether the UDF is being run on a task tracker
or on the job tracker?

Thanks!
yang

Re: for UDF, figure out whether it's on a task tracker?

Posted by Yang <te...@gmail.com>.
I see, thanks Jonathan.

On Tue, Jul 3, 2012 at 10:01 AM, Jonathan Coveney <jc...@gmail.com> wrote:

> UDFs are instantiated a couple of times at job construction time in order
> to inspect various properties about them. This is less than ideal, but
> alas. I generally initialize lazily in exec(), as that is only called on
> the mapper/reducer. The lifecycle of UDFs can be a bit confusing in this way.

Re: for UDF, figure out whether it's on a task tracker?

Posted by Jonathan Coveney <jc...@gmail.com>.
UDFs are instantiated a couple of times at job construction time in order
to inspect various properties about them. This is less than ideal, but
alas. I generally initialize lazily in exec(), as that is only called on
the mapper/reducer. The lifecycle of UDFs can be a bit confusing in this way.
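A minimal sketch of the lazy-initialization approach described above. To keep it standalone it does not import Pig. In a real UDF the class would be public, extend org.apache.pig.EvalFunc<String>, and exec would take a Tuple. The class name, the file path and the uppercase transformation are hypothetical placeholders. The constructor only stores its argument, so the extra front-end instantiations are cheap. The node-specific file check runs on the first exec() call, which in Pig happens only in the map or reduce task.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical stand-in for a Pig EvalFunc<String>; exec takes a String
// instead of a Tuple so the sketch compiles without the Pig jar.
class LazyInitUdf {
    // Hypothetical path to a file that exists only on the task tracker nodes.
    private final String requiredPath;

    // Filled in on the first exec() call, never in the constructor.
    private boolean initialized = false;
    private boolean fileExists = false;

    LazyInitUdf(String requiredPath) {
        // Keep the constructor cheap: Pig may instantiate the UDF several
        // times while building the job, so no file-system work happens here.
        this.requiredPath = requiredPath;
    }

    // Performs the node-specific check once per instance. The flag is not
    // synchronized, which assumes exec() is called from one thread per instance.
    private void initIfNeeded() {
        if (!initialized) {
            fileExists = Files.exists(Paths.get(requiredPath));
            initialized = true;
        }
    }

    String exec(String input) throws IOException {
        initIfNeeded();
        if (!fileExists) {
            throw new IOException("Required file not found: " + requiredPath);
        }
        return input.toUpperCase(Locale.ROOT);
    }

    boolean isInitialized() {
        return initialized;
    }
}
```

Creating a LazyInitUdf does no I/O. The first exec() call checks the file once, and later calls reuse the cached result.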