Posted to dev@pig.apache.org by Stefan Groschupf <sg...@101tec.com> on 2008/02/29 02:59:24 UTC

pigScriptParser

Hi,
I am trying to better understand the Pig script parser, so I have two questions:
In <DEFAULT> MORE : what is <"split"> all about?
What do <(~[])> in <PIG_START> MORE : and all the others stand for?

Also, in the case of A = LOAD,
I understand that A would be matched in the default state, but in
which state or by which token would LOAD be matched?
Thanks for any help.
Stefan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com



Re: pigScriptParser

Posted by pi song <pi...@gmail.com>.
You started off with a good philosophy that I also believe in, Ben. And I
totally agree that Pig should be embedded in a "high level enough",
Turing-complete language.

Once you have the plan set up (it might be a long way to go), don't forget to
let us know.


On 3/5/08, pi song <pi...@gmail.com> wrote:
>
> You started off with a good philosophy that I also believe in, Ben. And I
> totally agree that Pig should be embedded in a "high level enough",
> turing-complete language.
>
> Once you have the plan setup (might be a long way to go) , don't forget to
> let us know.
>  On 3/5/08, Benjamin Reed <br...@yahoo-inc.com> wrote:
> >
> > I guess there is one other tenet to the Pig philosophy that should be
> > added:
> > the world doesn't need another crappy programming language.
> >
> > I have always thought that Pig is meant to be embedded into another
> > language.
> > Grunt was really just a shell I used for testing Pig. It isn't even
> > worthy of
> > the name Oink. I would be very disappointed if Grunt became a turing
> > complete
> > language. I'm also a bit concerned that people have started to conclude
> > that
> > Grunt == Pig Latin.
> >
> > Java is the only language for which we currently have a binding, but
> > that
> > binding is very raw and not very pleasant to use. For a first cut I'm
> > hoping
> > to get a really nice embedding of Pig Latin in Python: YthonPay.
> >
> > My dream would be to make it so that you could use Pig in Python much
> > like TCL
> > in Expect. Basically you could write:
> >
> > #!/bin/ythonpay
> > A = load 'input' using MyParser();
> > B = group A by $0;
> > C = foreach B generate Foo($1);
> >
> > but then you realize that you actually want to iterate until Foo()
> > converges,
> > meaning there is only one group left in B. So, you would write:
> >
> > #!/bin/ythonpay
> > A = load 'input' using MyParser();
> > B = group A by $0;
> > C = foreach B generate Foo($1);
> > while B.cardinality() > 1:
> >        B = group C by $0
> >        C = foreach B generate Foo($1);
> >
> > That is the dream at least. It would be also cool to get UbyRay and
> > ErlPay.
> > (Perhaps even IcleTay). For Yahoo ErlPay or YthonPay would have the
> > biggest
> > impact, but I'm too young to like Perl, so my money is on YthonPay, but
> > my
> > hope is that more than one embedding gets implemented.
> >
> > ben
> >
> > On Monday 03 March 2008 19:54:54 pi song wrote:
> > > Olga,
> > >
> > > This is off the topic but I'm really interested in the last bit
> > "Python
> > > shell integration". How do you see Python fit in Pig?
> > >
> > > Cheers,
> > > Pi
> > >
> > > On 3/4/08, Olga Natkovich <ol...@yahoo-inc.com> wrote:
> > > > Stefan,
> > > >
> > > > The main reason is that, why we would always be parsing pig
> > statement
> > > > within Pig, for shell we might choose a completely different
> > environment
> > > > like integration into the Python shell so we probably don't want to
> > put
> > > > two together.
> > > >
> > > > Olga
> > > >
> > > > > -----Original Message-----
> > > > > From: Stefan Groschupf [mailto:sg@101tec.com]
> > > > > Sent: Friday, February 29, 2008 6:15 PM
> > > > > To: pig-dev@incubator.apache.org
> > > > > Subject: Re: pigScriptParser
> > > > >
> > > > > Olga,
> > > > > thanks for the clarification.
> > > > >
> > > > > > We have a 2 level parser:
> > > > > >
> > > > > > Grunt parser handles all commands other than Pig commands
> > > > >
> > > > > and passes
> > > > >
> > > > > > Pig commands to the pig parser. To do so, it needs to parse the
> > pig
> > > > > > command enough to figure out that it needs to go to pig parser.
> > > > >
> > > > > Why does pig has two parsers? Even if I use embedded pig the
> > > > > pig latin is the same as in grunt, isn't it?
> > > > > Isn't that more overhead of maintain two javacc files?
> > > > >
> > > > > Stefan
> >
> >
> >
>

Re: pigScriptParser

Posted by Benjamin Reed <br...@yahoo-inc.com>.
I guess there is one other tenet to the Pig philosophy that should be added: 
the world doesn't need another crappy programming language.

I have always thought that Pig is meant to be embedded into another language. 
Grunt was really just a shell I used for testing Pig. It isn't even worthy of 
the name Oink. I would be very disappointed if Grunt became a Turing-complete 
language. I'm also a bit concerned that people have started to conclude that 
Grunt == Pig Latin.

Java is the only language for which we currently have a binding, but that 
binding is very raw and not very pleasant to use. For a first cut I'm hoping 
to get a really nice embedding of Pig Latin in Python: YthonPay.

My dream would be to make it so that you could use Pig in Python much like Tcl 
in Expect. Basically, you could write:

#!/bin/ythonpay
A = load 'input' using MyParser();
B = group A by $0;
C = foreach B generate Foo($1);

but then you realize that you actually want to iterate until Foo() converges, 
meaning there is only one group left in B. So, you would write:

#!/bin/ythonpay
A = load 'input' using MyParser();
B = group A by $0;
C = foreach B generate Foo($1);
while B.cardinality() > 1:
	B = group C by $0;
	C = foreach B generate Foo($1);
	
That is the dream at least. It would also be cool to get UbyRay and ErlPay 
(perhaps even IcleTay). For Yahoo, ErlPay or YthonPay would have the biggest 
impact. I'm too young to like Perl, so my money is on YthonPay, but my 
hope is that more than one embedding gets implemented.
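
To make the control flow concrete, here is a minimal sketch in plain Python
with no Pig involved. Everything in it is an assumption for illustration:
group_by_first stands in for "group X by $0", foo is a made-up Foo() that
halves the key so that groups merge each round, and the loop mirrors the
"iterate until one group is left" idea.

```python
from collections import defaultdict


def group_by_first(records):
    """Stand-in for `group X by $0`: map each key to the bag of records sharing it."""
    groups = defaultdict(list)
    for record in records:
        groups[record[0]].append(record)
    return dict(groups)


def foo(bag):
    """Hypothetical Foo($1): collapse a bag into one record with a coarser key."""
    key = bag[0][0]
    return (key // 2, sum(value for _, value in bag))


def iterate_until_one_group(records):
    """Regroup and reapply foo until only one group remains."""
    b = group_by_first(records)
    c = [foo(bag) for bag in b.values()]
    rounds = 0
    while len(b) > 1:  # plays the role of B.cardinality() > 1
        b = group_by_first(c)
        c = [foo(bag) for bag in b.values()]
        rounds += 1
    return c, rounds


result, rounds = iterate_until_one_group([(k, 1) for k in range(8)])
```

The point of an embedding would be that the two grouping lines are Pig Latin
statements while the while loop stays ordinary host-language code.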

ben




Re: pigScriptParser

Posted by pi song <pi...@gmail.com>.
Olga,

This is off topic, but I'm really interested in the last bit, "Python
shell integration". How do you see Python fitting into Pig?

Cheers,
Pi


RE: pigScriptParser

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
Stefan,

The main reason is that, while we would always be parsing Pig statements
within Pig, for the shell we might choose a completely different environment,
such as integration into the Python shell, so we probably don't want to put
the two together.

Olga 


Re: pigScriptParser

Posted by Stefan Groschupf <sg...@101tec.com>.
Olga,
thanks for the clarification.
> We have a 2 level parser:
>
> Grunt parser handles all commands other than Pig commands and passes Pig
> commands to the pig parser. To do so, it needs to parse the pig command
> enough to figure out that it needs to go to pig parser.

Why does Pig have two parsers? Even if I use embedded Pig, the Pig Latin
is the same as in Grunt, isn't it?
Isn't maintaining two JavaCC files more overhead?

Stefan


RE: pigScriptParser

Posted by Olga Natkovich <ol...@yahoo-inc.com>.
 

> -----Original Message-----
> From: Stefan Groschupf [mailto:sg@101tec.com] 
> Sent: Thursday, February 28, 2008 5:59 PM
> To: pig-dev@incubator.apache.org
> Subject: pigScriptParser
> 
> Hi,
> I try to better understand the pig script parser. So two questions:
> in <DEFAULT> MORE : what is <"split"> all about?

Pig has a split command, which is the only one that does not follow the
pattern <alias> = <stuff>, because it actually splits the stream and
causes several aliases to be produced. Here is the example from the Pig Latin
page:

SPLIT A INTO X IF $0 < 7, Y IF ($0 > 2 AND $0 <> 7);
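
To illustrate why split needs its own case, here is a rough Python sketch
(not the actual grammar) that lists the aliases a statement produces. The
function name produced_aliases and the regular expressions are assumptions
made up for this example; they only cover the two statement shapes described
above.

```python
import re

# <alias> = <stuff>; the lookahead rejects a comparison such as "a == b".
ASSIGNMENT = re.compile(r"\s*([A-Za-z_]\w*)\s*=(?!=)")
# split <input> into <alias> if <cond>, <alias> if <cond>, ...
SPLIT = re.compile(r"\s*split\b.*?\binto\b(.*)", re.IGNORECASE | re.DOTALL)
SPLIT_TARGET = re.compile(r"([A-Za-z_]\w*)\s+if\b", re.IGNORECASE)


def produced_aliases(statement):
    """Return the aliases a statement defines, or [] if it matches neither shape."""
    split = SPLIT.match(statement)
    if split:
        # Every "<alias> IF" after INTO names one output stream.
        return SPLIT_TARGET.findall(split.group(1))
    assignment = ASSIGNMENT.match(statement)
    return [assignment.group(1)] if assignment else []
```

An ordinary statement yields exactly one alias on the left of the equals
sign, while the split example yields both X and Y from a single statement.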

> What stands   <(~[])> in <PIG_START> MORE : and all the others for?

This is how you specify *any character* in JavaCC: ~[] is a negated
character list that excludes nothing.
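
As a small illustration of that rule, here is a Python sketch of how a
negated character list behaves. The helper name matches_negated_list is made
up for this example and is not part of JavaCC.

```python
def matches_negated_list(ch, excluded=""):
    """Mimic ~[...]: accept any single character that is not in `excluded`."""
    return len(ch) == 1 and ch not in excluded


# ~[] excludes nothing, so every single character is accepted.
any_char = matches_negated_list("x") and matches_negated_list("\n")

# ~[";"] accepts everything except the semicolon.
semicolon_rejected = not matches_negated_list(";", excluded=";")
```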

> 
> Also in case of a A = LOAD,
> I understand that A would be the matched in the default state 
> but in which state or token would LOAD be matched?

We have a two-level parser:

The Grunt parser handles all commands other than Pig commands and passes Pig
commands to the Pig parser. To do so, it needs to parse a Pig command just
enough to figure out that it needs to go to the Pig parser.
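
The two-level idea can be sketched in Python roughly as follows. This is
only an illustration of the dispatch, not the real Grunt code: the names
route and run, the regular expression, and the two handler callbacks are all
assumptions.

```python
import re

# Just enough parsing to spot a Pig statement: "split ..." or "<alias> = ...".
PIG_STATEMENT_START = re.compile(
    r"\s*(?:split\b|[A-Za-z_]\w*\s*=(?!=))", re.IGNORECASE
)


def route(command):
    """First level: decide which parser should see the command."""
    return "pig" if PIG_STATEMENT_START.match(command) else "shell"


def run(commands, pig_parser, shell_handler):
    """Pass Pig statements through untouched; handle the rest in the shell."""
    for command in commands:
        if route(command) == "pig":
            pig_parser(command)  # second level does the full parse
        else:
            shell_handler(command)


pig_seen, shell_seen = [], []
run(
    ["A = load 'input' using MyParser();", "help", "split A into X if $0 < 7;"],
    pig_seen.append,
    shell_seen.append,
)
```

In this sketch the first level never looks inside the statement, so a
keyword such as LOAD is only ever seen by the second-level parser.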

Olga