
Posted to user@pig.apache.org by Patrick Salami <pa...@gmail.com> on 2013/03/28 20:51:14 UTC

long parse time

We have some very long Pig scripts that run several times per day. We
believe that parsing the script takes a very long time (about 1 hour). During
this time, the pig command just hangs before any output is displayed (I am
assuming this is the parsing phase). My question is: can this process be
optimized by serializing the intermediate parsed script to disk
after the parsing phase is complete, so that we don't have to go through the
parsing process each time the script is run (as long as the script itself
does not change)? We could then load and run the parsed
representation of the script rather than re-parsing it for each run. Since
this is probably not a readily available feature, could someone please
point me to the right place in the code where this intermediate output can
be intercepted?

Thanks!

Re: long parse time

Posted by Patrick Salami <pa...@gmail.com>.

Koji,

Thanks for the tip. We did try it with 0.11 and had the same issue.
Based on your suggestion, we will try it with trunk and see if the issue is
resolved there.

Thanks!


On Wed, Apr 3, 2013 at 12:40 PM, Koji Noguchi <kn...@yahoo-inc.com> wrote:

> With help from the reviewer, I learned that this was fixed in trunk.
> https://issues.apache.org/jira/browse/PIG-2769
>
> but not in 0.11.
>
> Koji

Re: long parse time

Posted by Koji Noguchi <kn...@yahoo-inc.com>.

With help from the reviewer, I learned that this was fixed in trunk.
https://issues.apache.org/jira/browse/PIG-2769

but not in 0.11.

Koji



On Apr 2, 2013, at 1:27 PM, Koji Noguchi wrote:

> Hi Patrick,
>
> Did it work with 0.11? If not, I hit a similar issue and created
> https://issues.apache.org/jira/browse/PIG-3266
>
> The problem started with Pig 0.10.
>
> Koji

Re: long parse time

Posted by Koji Noguchi <kn...@yahoo-inc.com>.

Hi Patrick,

Did it work with 0.11? If not, I hit a similar issue and created
https://issues.apache.org/jira/browse/PIG-3266

The problem started with Pig 0.10.

Koji

On Mar 29, 2013, at 12:17 PM, Patrick Salami wrote:

> Thanks for the tip. We are actually using Pig 0.10. I will upgrade to 0.11
> and see if that resolves the issue.


Re: long parse time

Posted by Patrick Salami <pa...@gmail.com>.

Thanks for the tip. We are actually using Pig 0.10. I will upgrade to 0.11
and see if that resolves the issue.


On Thu, Mar 28, 2013 at 5:15 PM, Alan Gates <ga...@hortonworks.com> wrote:

> What version of Pig are you using? Unreasonably long parse times were an
> issue in Pig 0.9 and 0.10; I believe those issues were fixed in Pig 0.11.
>
> Alan.

Re: long parse time

Posted by Alan Gates <ga...@hortonworks.com>.

What version of Pig are you using? Unreasonably long parse times were an issue in Pig 0.9 and 0.10; I believe those issues were fixed in Pig 0.11.

Alan.
