Posted to user@pig.apache.org by Kevin Burton <bu...@spinn3r.com> on 2014/05/20 21:02:38 UTC

Is pig maddening to work with because it's so slow?

I've noticed that while working with pig my stress level and frustration
with the system are higher than with other systems I've worked with.

I think it's because the iteration cycle is longer.

Even pig -x local takes a while to execute.

Is this just me?

If you're trying to learn and debug Python lists, dictionaries, etc., the
response time is almost instant.

But with pig literally everything takes 30-60 seconds to play with.

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+
profile<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.

Re: Is pig maddening to work with because it's so slow?

Posted by "Dan DeCapria, CivicScience" <da...@civicscience.com>.
Seconded for PigUnit.

As for a faster debugging procedure, I've gone modular. First, I JUnit test
individual UDFs against their functional requirements and use cases up
front. Then I mock up my whiteboard workflow as logical blocks of Pig
script (multiple Pig files to test), start pig -x local, and run each
aliased statement of a block one at a time, with a DESCRIBE after each.
This confirms that the syntax, schemas, re-aliasing, etc. are correct at
every step, and you can merge the blocks back together for optimization
once they are complete.
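
A minimal sketch of the first step above (checking UDF logic before Pig is
involved), assuming a hypothetical UDF that reduces a URL to its host name.
The class and method names are invented for illustration and do not come
from any real script. The idea is to keep the logic in a plain static
method so it can be exercised in milliseconds; the actual UDF would then be
a thin wrapper (extending Pig's EvalFunc) that delegates to it, and in a
real project the checks in main would live in JUnit test methods.

```java
import java.util.Locale;

// Hypothetical example: pure UDF logic with no Pig types, so it can be
// checked without starting Pig at all.
final class DomainNormalizer {
    private DomainNormalizer() {}

    // Returns the lower-cased host of a URL-like string,
    // or null when the input is null, blank, or has no host part.
    static String normalize(String url) {
        if (url == null) {
            return null;
        }
        String s = url.trim().toLowerCase(Locale.ROOT);
        if (s.isEmpty()) {
            return null;
        }
        int scheme = s.indexOf("://");
        if (scheme >= 0) {
            s = s.substring(scheme + 3); // drop "http://", "https://", ...
        }
        int slash = s.indexOf('/');
        if (slash >= 0) {
            s = s.substring(0, slash);   // drop the path
        }
        return s.isEmpty() ? null : s;
    }

    // Stand-in for JUnit assertions so the sketch needs only the JDK.
    private static void check(boolean ok, String what) {
        if (!ok) {
            throw new AssertionError("failed: " + what);
        }
    }

    public static void main(String[] args) {
        check("example.com".equals(normalize("HTTP://Example.COM/a/b")),
              "strips scheme and path");
        check("example.com".equals(normalize("  example.com  ")),
              "trims whitespace");
        check(normalize(null) == null, "null in, null out");
        check(normalize("   ") == null, "blank in, null out");
        System.out.println("all checks passed");
    }
}
```

Because nothing here touches Pig, a failing case shows up right away, and
the pig -x local session is left for checking schemas and aliases.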

Once a block is complete, you can also run ILLUSTRATE on it to spot-check
results. Be forewarned, though: I've had ILLUSTRATE fail prematurely on
larger scripts because of their complexity.

Hope this helps,

-Dan


Re: Is pig maddening to work with because it's so slow?

Posted by Suraj Nayak <sn...@gmail.com>.
Also, Pig is a data flow language where the statements get converted to
Java and then run. In the case of Python, it's native, and thus runs faster.

Re: Is pig maddening to work with because it's so slow?

Posted by Suraj Nayak <sn...@gmail.com>.
Why not consider PigUnit? PigUnit gives you the flexibility to test locally.
Debugging is also pretty simple, much like with JUnit.

--
Suraj

Re: Is pig maddening to work with because it's so slow?

Posted by Paul Houle <on...@gmail.com>.
Slow iteration is a problem with Pig.

I still write MR jobs mainly in Java because (1) I control the
execution plan, (2) I can do things nearly zero-copy, and (3) I can
get a quick iteration cycle by using JUnit to test mappers, reducers,
and other components.

-- 
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype   ontology2@gmail.com