Posted to user@pig.apache.org by Prashant Kommireddi <pr...@gmail.com> on 2012/08/08 00:44:23 UTC

Pig 0.10.0 slow startup

Hi All,

Just wanted to follow up on Chun's question. Several of our Pig users have
been experiencing slow start-ups with Pig 0.10.0, when the same script runs
fine with 0.9.1. Anyone else facing similar issues?

Thanks,
Prashant

Hi all,

I'm trying to move from Pig 0.9.1 to Pig 0.10.0. When I run the same
script using the two Pig versions, 0.9.1 starts off fast and almost
immediately submits the job to the cluster. On the other hand, Pig 0.10.0
takes forever to submit the job. When I use the Java option
-XX:+PrintGCDetails, I see that for 0.10.0 the GC is being run many times
before and after the job is submitted to the cluster.

Does anyone know what is causing this and/or how I might be able to
troubleshoot it?

I've uploaded truncated output showing when GC happens to
Pastebin: http://pastebin.com/B8WTHW9r

Thanks,
Chun
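
For anyone who wants to quantify GC activity from inside the JVM rather than reading -XX:+PrintGCDetails output, here is a minimal sketch using the standard java.lang.management API. The GcSnapshot class and the allocation loop are hypothetical stand-ins, not part of Pig; the idea is only to take a snapshot before and after the code under suspicion and compare.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Pig: sums GC activity across all collectors.
class GcSnapshot {
    final long collections;   // total number of collections so far
    final long millis;        // approximate accumulated collection time

    private GcSnapshot(long collections, long millis) {
        this.collections = collections;
        this.millis = millis;
    }

    static GcSnapshot take() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            time += Math.max(0, gc.getCollectionTime());
        }
        return new GcSnapshot(count, time);
    }

    public static void main(String[] args) {
        GcSnapshot before = take();
        // Stand-in workload: allocate short-lived arrays to create garbage.
        long checksum = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
            checksum += junk.length;
        }
        GcSnapshot after = take();
        System.out.println("allocated bytes: " + checksum);
        System.out.println("collections: " + (after.collections - before.collections));
        System.out.println("GC time (ms): " + (after.millis - before.millis));
    }
}
```

In a real investigation the stand-in loop would be replaced by whatever front-end step is suspected, which is an assumption about where to look rather than a confirmed diagnosis.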

Re: Pig 0.10.0 slow startup

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Julien removed a dozen or so loader/storer instantiations.
That can do it if you do work in constructors.

D
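
To illustrate the point about constructors: if a class is instantiated many times before the job is submitted, any work done in its constructor is repeated each time. A rough sketch of the usual workaround follows, deferring the expensive setup until the first call that actually needs it. LazyLookup, loadTable and the exec signature here are made up for illustration and do not extend any Pig class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a UDF or loader; it does not extend any Pig class.
class LazyLookup {
    static int expensiveLoads = 0;        // counts how often the costly setup ran

    private Map<String, String> table;    // built on first use, not in the constructor

    LazyLookup() {
        // Intentionally empty: constructing this object stays cheap no matter
        // how many times it is instantiated.
    }

    private static Map<String, String> loadTable() {
        expensiveLoads++;
        Map<String, String> m = new HashMap<>();
        m.put("a", "alpha");              // placeholder for real, slow initialization
        return m;
    }

    String exec(String key) {
        if (table == null) {
            table = loadTable();          // pay the cost only when data is processed
        }
        return table.get(key);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 400; i++) {
            new LazyLookup();             // simulate repeated instantiation
        }
        System.out.println("loads after 400 constructions: " + expensiveLoads);
        System.out.println(new LazyLookup().exec("a"));
        System.out.println("loads after first exec: " + expensiveLoads);
    }
}
```

With this layout the 400 constructions trigger no loads at all, and the single exec call triggers exactly one.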

On Fri, Aug 10, 2012 at 1:15 PM, Prashant Kommireddi
<pr...@gmail.com> wrote:
> Thanks Chun.
>
> Jon, any idea what on 0.11 might have fixed it?

Re: Pig 0.10.0 slow startup

Posted by Prashant Kommireddi <pr...@gmail.com>.
Thanks Chun.

Jon, any idea what on 0.11 might have fixed it?

On Thu, Aug 9, 2012 at 3:32 PM, Chun Yang
<cy...@contractor.salesforce.com> wrote:

> I tried with pig11 (from git); timings for the two variants are more
> comparable.
>
> stats for `pig11 -b -e 'explain -script students-a.pig'`
> 6.33s user 0.74s system 153% cpu 4.611 total
> 6.55s user 0.68s system 155% cpu 4.664 total
> 6.40s user 0.79s system 157% cpu 4.560 total
> 6.47s user 0.62s system 155% cpu 4.560 total
>
> stats for `pig11 -b -e 'explain -script students-b.pig'`
> 5.66s user 0.62s system 169% cpu 3.707 total
> 5.69s user 0.53s system 165% cpu 3.758 total
> 5.44s user 0.70s system 165% cpu 3.706 total
> 5.68s user 0.51s system 166% cpu 3.708 total
>
> So looks like it was fixed somewhere for 0.11?

RE: Pig 0.10.0 slow startup

Posted by Chun Yang <cy...@contractor.salesforce.com>.
I tried with pig11 (from git); timings for the two variants are more comparable.

stats for `pig11 -b -e 'explain -script students-a.pig'`
6.33s user 0.74s system 153% cpu 4.611 total
6.55s user 0.68s system 155% cpu 4.664 total
6.40s user 0.79s system 157% cpu 4.560 total
6.47s user 0.62s system 155% cpu 4.560 total

stats for `pig11 -b -e 'explain -script students-b.pig'`
5.66s user 0.62s system 169% cpu 3.707 total
5.69s user 0.53s system 165% cpu 3.758 total
5.44s user 0.70s system 165% cpu 3.706 total
5.68s user 0.51s system 166% cpu 3.708 total

So looks like it was fixed somewhere for 0.11?
________________________________________
From: Jonathan Coveney [jcoveney@gmail.com]
Sent: Thursday, August 09, 2012 11:00 AM
To: user@pig.apache.org
Subject: Re: Pig 0.10.0 slow startup

Can you do me a favor and run the exact same stuff with pig11? Just to
isolate if this is an issue that has been removed. I will also try and run
this on pig10, to see if I can see the same issue.


Re: Pig 0.10.0 slow startup

Posted by Jonathan Coveney <jc...@gmail.com>.
Can you do me a favor and run the exact same stuff with pig11? Just to
isolate if this is an issue that has been removed. I will also try and run
this on pig10, to see if I can see the same issue.

2012/8/8 Chun Yang <cy...@contractor.salesforce.com>

> Thanks Jonathan,
>
> Here are some numbers that I'm getting from Pig 0.10 and Pig 0.9.1:
>
> pig10 -b -e 'explain -script students-a.pig'  35.35s user 8.52s system 63% cpu 1:08.77 total
>
> pig10 -b -e 'explain -script students-b.pig'  5.32s user 0.48s system 130% cpu 4.460 total
>
> pig9 -b -e 'explain -script students-a.pig'  4.93s user 0.51s system 131% cpu 4.153 total
>
> pig9 -b -e 'explain -script students-b.pig'  3.86s user 0.41s system 131% cpu 3.254 total
>
> Seems like the first run is always slower, but subsequent runs are about
> the same:
>
> pig10 -b -e 'explain -script students-a.pig'  35.17s user 8.20s system 123% cpu 35.017 total
>
> pig10 -b -e 'explain -script students-a.pig'  35.41s user 8.55s system 122% cpu 35.803 total
>
> A little more than 1.5s slowdown :)
>
> Thanks,
> Chun

Re: Pig 0.10.0 slow startup

Posted by Chun Yang <cy...@contractor.salesforce.com>.
Thanks Jonathan,

Here are some numbers that I'm getting from Pig 0.10 and Pig 0.9.1:

pig10 -b -e 'explain -script students-a.pig'  35.35s user 8.52s system 63% cpu 1:08.77 total

pig10 -b -e 'explain -script students-b.pig'  5.32s user 0.48s system 130% cpu 4.460 total

pig9 -b -e 'explain -script students-a.pig'  4.93s user 0.51s system 131% cpu 4.153 total

pig9 -b -e 'explain -script students-b.pig'  3.86s user 0.41s system 131% cpu 3.254 total

Seems like the first run is always slower, but subsequent runs are about the
same:

pig10 -b -e 'explain -script students-a.pig'  35.17s user 8.20s system 123% cpu 35.017 total

pig10 -b -e 'explain -script students-a.pig'  35.41s user 8.55s system 122% cpu 35.803 total

A little more than 1.5s slowdown :)

Thanks,
Chun
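
To put a number on the gap, here is a small worked calculation using two of the wall-clock totals above: the 35.017 s repeat run of students-a.pig on pig10 and the 4.153 s run on pig9. The StartupSlowdown class is purely illustrative, and picking these two runs for the comparison is an assumption.

```java
import java.util.Locale;

// Illustrative arithmetic only; the constants are copied from the timings above.
class StartupSlowdown {
    static final double PIG10_A_SECONDS = 35.017;  // pig10, students-a.pig, repeat run
    static final double PIG9_A_SECONDS = 4.153;    // pig9, students-a.pig

    // Extra wall-clock seconds spent by the slower run.
    static double difference(double slow, double fast) {
        return slow - fast;
    }

    // How many times longer the slower run took.
    static double ratio(double slow, double fast) {
        return slow / fast;
    }

    public static void main(String[] args) {
        double extra = difference(PIG10_A_SECONDS, PIG9_A_SECONDS);
        double factor = ratio(PIG10_A_SECONDS, PIG9_A_SECONDS);
        // Prints "extra seconds: 30.9" and "slowdown factor: 8.4x".
        System.out.println(String.format(Locale.ROOT, "extra seconds: %.1f", extra));
        System.out.println(String.format(Locale.ROOT, "slowdown factor: %.1fx", factor));
    }
}
```

So on these figures the pig10 run takes roughly 31 seconds longer, about 8.4 times the pig9 wall-clock time.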

On 8/8/12 5:38 PM, "Jonathan Coveney" <jc...@gmail.com> wrote:

> Thanks for putting that together, Chun.
> 
> So, it looks like there are ~400 instantiations of the class, and the time
> from the first instantiation to the last one is about ~1.5s. Is that on the
> order of the slowdown you're experiencing?
> 
> (note: I'm testing with Pig 11...if your slowdown is much higher than that,
> I'll test on Pig 10)
> 
> Either way, it seems like the slowdown is directly attributable to UDF
> invocations. Have you seen slowdowns much larger than this?


Re: Pig 0.10.0 slow startup

Posted by Jonathan Coveney <jc...@gmail.com>.
Thanks for putting that together, Chun.

So, it looks like there are ~400 instantiations of the class, and the time
from the first instantiation to the last one is about ~1.5s. Is that on the
order of the slowdown you're experiencing?

(note: I'm testing with Pig 11...if your slowdown is much higher than that,
I'll test on Pig 10)

Either way, it seems like the slowdown is directly attributable to UDF
invocations. Have you seen slowdowns much larger than this?
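
One way to gather numbers like these (an instantiation count and the time between the first and last construction) is to instrument the constructor. The following is only a sketch of that idea, not the code actually used in this thread; CountedUdf and its counters are hypothetical names, and the class does not extend any Pig type.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical instrumented class; in practice the counters would be added
// to the constructor of the UDF under investigation.
class CountedUdf {
    private static final long UNSET = Long.MIN_VALUE;

    static final AtomicInteger INSTANCES = new AtomicInteger();
    static final AtomicLong FIRST_NANOS = new AtomicLong(UNSET);
    static final AtomicLong LAST_NANOS = new AtomicLong(UNSET);

    CountedUdf() {
        long now = System.nanoTime();
        INSTANCES.incrementAndGet();
        FIRST_NANOS.compareAndSet(UNSET, now);  // only the first construction sets this
        LAST_NANOS.set(now);                    // every construction overwrites this
    }

    // Milliseconds between the first and the most recent construction.
    static double spanMillis() {
        return (LAST_NANOS.get() - FIRST_NANOS.get()) / 1_000_000.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 400; i++) {
            new CountedUdf();
        }
        System.out.println("instantiations: " + INSTANCES.get());
        System.out.println("first-to-last span (ms): " + spanMillis());
    }
}
```

The span only covers time between constructions, so it is a lower bound on the startup cost rather than a full profile.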

2012/8/8 Chun Yang <cy...@contractor.salesforce.com>

> Hi Jonathan,
>
> Here is a more self-contained example than what I had before:
> http://ews.illinois.edu/~yang43/shared/students.tar.gz
>
> I wrote a trivial GFV class, but the slowdown still exists.
> students-a.pig starts up noticeably slower than students-b.pig.
>
> Thanks,
> Chun

Re: Pig 0.10.0 slow startup

Posted by Chun Yang <cy...@contractor.salesforce.com>.
Hi Jonathan,

Here is a more self-contained example than what I had before:
http://ews.illinois.edu/~yang43/shared/students.tar.gz

I wrote a trivial GFV class, but the slowdown still exists.
students-a.pig starts up noticeably slower than students-b.pig.

Thanks,
Chun

On 8/8/12 12:22 PM, "Jonathan Coveney" <jc...@gmail.com> wrote:

> Thanks for this info. Can you go ahead and paste the whole GFV class?
> 
> Thanks


Re: Pig 0.10.0 slow startup

Posted by Jonathan Coveney <jc...@gmail.com>.
Thanks for this info. Can you go ahead and paste the whole GFV class?

Thanks

2012/8/8 Chun Yang <cy...@contractor.salesforce.com>

> Thanks Jonathan,
>
> I've tried to produce an example script which exhibits the slowdown and
> posted it on Pastebin: http://pastebin.com/kTSsDUr3
>
> The slowdown seems to occur when we are using a lot of UDFs to parse our
> input data. Variant A in the script is noticeably slower than variant B in
> Pig 0.10 while performance is similar in Pig 0.9.1
>
> I've pasted the exec() function of the GFV function on Pastebin as well:
> http://pastebin.com/FVnkQCJ5
>
> Please let us know if you need more details.
>
> Thanks,
> Chun
>
> On 8/7/12 10:07 PM, "Jonathan Coveney" <jc...@gmail.com> wrote:
>
> > Can you guys give a script that has the issue? My tactic would be to use
> > some sort of profiler (we have access to YourKit for open source Pig
> > contribution work) and try and isolate what is triggering GC.
> >
> > 2012/8/7 Prashant Kommireddi <pr...@gmail.com>
> >
> >> Hi All,
> >>
> >> Just wanted to follow-up on Chun's question. Several of our Pig users have
> >> been experiencing slow start-ups with Pig 0.10.0, when the same script runs
> >> fine with 0.9.1. Anyone else facing similar issues?
> >>
> >> Thanks,
> >> Prashant
> >>
> >> Hi all,
> >>
> >> I'm trying to move from Pig 0.9.1 to Pig 0.10.0 . When I try to run the same
> >> script using the two Pig versions, 0.9.1 starts off fast and almost
> >> immediately submits the job to the cluster. On the other hand, Pig 0.10.0
> >> takes forever to submit the job. When I use the java option
> >> -XX:+PrintGCDetails, I see that for 0.10.0 the GC is being run many times
> >> before and after the job is submitted to the cluster.
> >>
> >> Does anyone know what is causing this and/or how I might be able to
> >> troubleshoot it?
> >>
> >> I've uploaded truncated output showing when GC happens to
> >> Pastebin:http://pastebin.com/B8WTHW9r
> >>
> >> Thanks,
> >> Chun
> >>
>
>

Re: Pig 0.10.0 slow startup

Posted by Chun Yang <cy...@contractor.salesforce.com>.
Thanks Jonathan,

I've tried to produce an example script which exhibits the slowdown and
posted it on Pastebin: http://pastebin.com/kTSsDUr3

The slowdown seems to occur when we use a lot of UDFs to parse our
input data. Variant A in the script is noticeably slower than variant B in
Pig 0.10, while performance is similar in Pig 0.9.1.

I've pasted the exec() method of the GFV UDF on Pastebin as well:
http://pastebin.com/FVnkQCJ5

Please let us know if you need more details.

Thanks,
Chun

On 8/7/12 10:07 PM, "Jonathan Coveney" <jc...@gmail.com> wrote:

> Can you guys give a script that has the issue? My tactic would be to use
> some sort of profiler (we have access to YourKit for open source Pig
> contribution work) and try and isolate what is triggering GC.
> 
> 2012/8/7 Prashant Kommireddi <pr...@gmail.com>
> 
>> Hi All,
>> 
>> Just wanted to follow-up on Chun's question. Several of our Pig users have
>> been experiencing slow start-ups with Pig 0.10.0, when the same script runs
>> fine with 0.9.1. Anyone else facing similar issues?
>> 
>> Thanks,
>> Prashant
>> 
>> Hi all,
>> 
>> I'm trying to move from Pig 0.9.1 to Pig 0.10.0 . When I try to run the
>> same
>> script using the two Pig versions, 0.9.1 starts off fast and almost
>> immediately submits the job to the cluster. On the other hand, Pig 0.10.0
>> takes forever to submit the job. When I use the java option
>> -XX:+PrintGCDetails, I see that for 0.10.0 the GC is being run many times
>> before and after the job is submitted to the cluster.
>> 
>> Does anyone know what is causing this and/or how I might be able to
>> troubleshoot it?
>> 
>> I've uploaded truncated output showing when GC happens to
>> Pastebin:http://pastebin.com/B8WTHW9r
>> 
>> Thanks,
>> Chun
>> 


Re: Pig 0.10.0 slow startup

Posted by Jonathan Coveney <jc...@gmail.com>.
Can you guys give a script that has the issue? My tactic would be to use
some sort of profiler (we have access to YourKit for open source Pig
contribution work) and try to isolate what is triggering GC.
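
For a quick check before reaching for a full profiler, GC activity can also
be read from inside the JVM. The following is a minimal sketch, not something
posted in this thread: the class name GcProbe and its helper methods are made
up for illustration, and the allocation loop in main() is only a stand-in for
whatever startup work is being investigated. It relies on the standard
java.lang.management API to report how many collections ran, and how long
they took, while a section of code executed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of Pig): measures GC activity around a code section.
public class GcProbe {

    // Sum of collection counts over all collectors; undefined (-1) values are skipped.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Sum of approximate accumulated collection time in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Runs the section and returns {collections, gcMillis, wallMillis} for it.
    static long[] measure(Runnable section) {
        long countBefore = totalGcCount();
        long gcMillisBefore = totalGcMillis();
        long start = System.nanoTime();
        section.run();
        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        return new long[] {
            totalGcCount() - countBefore,
            totalGcMillis() - gcMillisBefore,
            wallMillis
        };
    }

    public static void main(String[] args) {
        // Stand-in workload (an assumption): allocate many short-lived buffers.
        long[] result = measure(() -> {
            byte[][] keep = new byte[64][];
            for (int i = 0; i < 200_000; i++) {
                keep[i % keep.length] = new byte[4096];
            }
        });
        System.out.printf("collections=%d gcMillis=%d wallMillis=%d%n",
                result[0], result[1], result[2]);
    }
}
```

Wrapping the same suspect section in measure(...) under both Pig versions
would give numbers that can be compared directly, which may be easier to read
than raw -XX:+PrintGCDetails output.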

2012/8/7 Prashant Kommireddi <pr...@gmail.com>

> Hi All,
>
> Just wanted to follow-up on Chun's question. Several of our Pig users have
> been experiencing slow start-ups with Pig 0.10.0, when the same script runs
> fine with 0.9.1. Anyone else facing similar issues?
>
> Thanks,
> Prashant
>
> Hi all,
>
> I'm trying to move from Pig 0.9.1 to Pig 0.10.0 . When I try to run the
> same
> script using the two Pig versions, 0.9.1 starts off fast and almost
> immediately submits the job to the cluster. On the other hand, Pig 0.10.0
> takes forever to submit the job. When I use the java option
> -XX:+PrintGCDetails, I see that for 0.10.0 the GC is being run many times
> before and after the job is submitted to the cluster.
>
> Does anyone know what is causing this and/or how I might be able to
> troubleshoot it?
>
> I've uploaded truncated output showing when GC happens to
> Pastebin:http://pastebin.com/B8WTHW9r
>
> Thanks,
> Chun
>