Posted to user@pig.apache.org by Dotan Patrich <do...@fortscale.com> on 2014/03/05 16:28:08 UTC

Duplicate jar are getting into PigContext

Hi,
I've run a Pig script that has a REGISTER command in it, using Java code that
calls the PigServer.registerScript method.
On each execution of the Pig script (within the same Java process) the
registered jar is added again to the classpath in the taskjvm.sh file.
The entries keep accumulating until we get an error because the classpath is
too long...
Debugging it, I can see that the same jar from the local file system is
being added multiple times to the PigContext skipJars member.

Does anyone know of an open issue regarding this? Am I using the PigServer
wrong?
I couldn't find any open issue regarding this, so my current workaround
was to remove the REGISTER statement from the Pig script and call
PigServer.registerJar() once when my Java process starts.

Thanks for the help,
Dotan
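
A minimal sketch of the register-once workaround described in the message
above, assuming the jar only needs to reach the server a single time per Java
process. JarRegistrar, OncePerProcessRegistrar, and RegisterOnceDemo are
hypothetical names introduced for illustration and are not Pig classes; the
interface merely stands in for the PigServer.registerJar() call mentioned in
the thread, so the sketch runs without Pig on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simulates running the same script several times inside one JVM.
class RegisterOnceDemo {
    // Returns the jar paths that actually reached the delegate.
    static List<String> simulateRuns(int runs, String jarPath) {
        List<String> forwarded = new ArrayList<>();
        // The delegate here just records calls; it is an assumption that a
        // real setup would forward to PigServer.registerJar() instead.
        JarRegistrar once = new OncePerProcessRegistrar(forwarded::add);
        try {
            for (int i = 0; i < runs; i++) {
                once.registerJar(jarPath);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return forwarded;
    }

    public static void main(String[] args) {
        System.out.println(simulateRuns(5, "/tmp/udfs.jar"));
    }
}

// Hypothetical stand-in for the single call the workaround relies on.
interface JarRegistrar {
    void registerJar(String path) throws IOException;
}

// Forwards each distinct jar path to the delegate at most once.
class OncePerProcessRegistrar implements JarRegistrar {
    private final JarRegistrar delegate;
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    OncePerProcessRegistrar(JarRegistrar delegate) {
        this.delegate = delegate;
    }

    @Override
    public void registerJar(String path) throws IOException {
        // Set.add returns false when the path was already recorded.
        if (seen.add(path)) {
            try {
                delegate.registerJar(path);
            } catch (IOException e) {
                // Forget the path so a later call can retry the registration.
                seen.remove(path);
                throw e;
            }
        }
    }
}
```

The guard compares path strings only, so the same jar referenced through two
different paths would still be forwarded twice.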

Re: Duplicate jar are getting into PigContext

Posted by Cheolsoo Park <pi...@gmail.com>.
Oh great! Let me get it fixed.



Re: Duplicate jar are getting into PigContext

Posted by Dotan Patrich <do...@fortscale.com>.
Hi Cheolsoo,

Thank you very much for the reply and interest in this.
I actually opened a JIRA issue for this a few weeks ago:
PIG-3798 <https://issues.apache.org/jira/browse/PIG-3798>

Thanks,
Dotan




Re: Duplicate jar are getting into PigContext

Posted by Cheolsoo Park <pi...@gmail.com>.
Hi Dotan,

Very sorry for the late reply.

> Debugging it, I can see that the same jar from the local file system is
> being added multiple times to the PigContext skipJars member.

From a brief look, skipJars is updated by JobControlCompiler
<https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java#L1612>,
and I can see that code is called every time a new MR job is compiled by the
getJob() method
<https://github.com/apache/pig/blob/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java#L605>.
A simple fix might be to not add a jar if it is already present in the
skipJars list. Do you mind filing a JIRA?

Thanks,
Cheolsoo
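
The "do not add a jar that is already present" idea from the message above
could look roughly like the sketch below. It is not the actual Pig patch:
SkipJarsSketch, addIfAbsent, and compileJobs are hypothetical names, and
modelling skipJars as a List<String> of paths is an assumption made only for
illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SkipJarsSketch {
    // Adds jarPath only when it is not already in the list.
    // Returns true when the list was changed.
    static boolean addIfAbsent(List<String> skipJars, String jarPath) {
        if (skipJars.contains(jarPath)) {
            return false;
        }
        return skipJars.add(jarPath);
    }

    // Simulates compiling several jobs that all reference the same jar
    // while sharing one skipJars list, as happens within a single process.
    static List<String> compileJobs(int jobs, String jarPath) {
        List<String> skipJars = new ArrayList<>();
        for (int i = 0; i < jobs; i++) {
            addIfAbsent(skipJars, jarPath);
        }
        return skipJars;
    }

    public static void main(String[] args) {
        System.out.println(compileJobs(3, "/local/lib/udfs.jar").size());
    }
}
```

List.contains is a linear scan, which should be fine for a handful of jars; an
insertion-ordered set such as LinkedHashSet would give the same behaviour with
constant-time lookups if the list were ever to grow large.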



