Posted to dev@pig.apache.org by Stefan Groschupf <sg...@101tec.com> on 2008/02/29 02:59:24 UTC
pigScriptParser
Hi,
I'm trying to better understand the Pig script parser. Two questions:
In <DEFAULT> MORE : what is <"split"> all about?
What do <(~[])> in <PIG_START> MORE : and all the others stand for?
Also, in the case of A = LOAD,
I understand that A would be matched in the default state, but in
which state or by which token would LOAD be matched?
Thanks for any help.
Stefan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com
Re: pigScriptParser
Posted by pi song <pi...@gmail.com>.
You started off with a good philosophy that I also believe in, Ben. And I
totally agree that Pig should be embedded in a "high level enough",
Turing-complete language.
Once you have the plan set up (it might be a long way off), don't forget to
let us know.
Re: pigScriptParser
Posted by Benjamin Reed <br...@yahoo-inc.com>.
I guess there is one other tenet to the Pig philosophy that should be added:
the world doesn't need another crappy programming language.
I have always thought that Pig is meant to be embedded into another language.
Grunt was really just a shell I used for testing Pig. It isn't even worthy of
the name Oink. I would be very disappointed if Grunt became a Turing-complete
language. I'm also a bit concerned that people have started to conclude that
Grunt == Pig Latin.
Java is the only language for which we currently have a binding, but that
binding is very raw and not very pleasant to use. For a first cut I'm hoping
to get a really nice embedding of Pig Latin in Python: YthonPay.
My dream would be to make it so that you could use Pig in Python much like TCL
in Expect. Basically you could write:
#!/bin/ythonpay
A = load 'input' using MyParser();
B = group A by $0;
C = foreach B generate Foo($1);
but then you realize that you actually want to iterate until Foo() converges,
meaning there is only one group left in B. So, you would write:
#!/bin/ythonpay
A = load 'input' using MyParser();
B = group A by $0;
C = foreach B generate Foo($1);
while B.cardinality() > 1:
    B = group C by $0;
    C = foreach B generate Foo($1);
That is the dream at least. It would also be cool to get UbyRay and ErlPay
(perhaps even IcleTay). For Yahoo, ErlPay or YthonPay would have the biggest
impact. I'm too young to like Perl, so my money is on YthonPay, but my
hope is that more than one embedding gets implemented.
ben
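For readers who want to see the control flow of the "iterate until one group is left" idea, here is a minimal plain-Python sketch. It does not use Pig or any real embedding; `group_by_first`, `foo` and `iterate_until_single_group` are hypothetical names, and `foo` is an invented stand-in for Foo() (it halves the key and sums the values) chosen only so that the number of groups shrinks each round.

```python
from collections import defaultdict


def group_by_first(records):
    """Rough analogue of `group X by $0`: bucket tuples by their first field."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    return dict(groups)


def foo(key, bag):
    """Hypothetical stand-in for Foo(): halve the key and sum the values.

    Halving the key makes neighbouring groups merge in the next round,
    so the loop below is guaranteed to terminate for non-negative keys.
    """
    return (key // 2, sum(value for _, value in bag))


def iterate_until_single_group(records):
    """Regroup and re-apply foo() until only one group is left."""
    b = group_by_first(records)
    c = [foo(key, bag) for key, bag in b.items()]
    rounds = 1
    while len(b) > 1:  # plays the role of B.cardinality() > 1
        b = group_by_first(c)
        c = [foo(key, bag) for key, bag in b.items()]
        rounds += 1
    return c, rounds


if __name__ == "__main__":
    result, rounds = iterate_until_single_group([(k, 1) for k in range(8)])
    print(result, rounds)
```

The point of the sketch is only that the loop condition lives in the host language while the group and foreach steps stay declarative; a real embedding would hand those steps to Pig.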
Re: pigScriptParser
Posted by pi song <pi...@gmail.com>.
Olga,
This is off the topic but I'm really interested in the last bit "Python
shell integration". How do you see Python fit in Pig?
Cheers,
Pi
RE: pigScriptParser
Posted by Olga Natkovich <ol...@yahoo-inc.com>.
Stefan,
The main reason is that, while we will always be parsing Pig statements
within Pig, for the shell we might choose a completely different environment,
such as integration into the Python shell, so we probably don't want to tie
the two together.
Olga
Re: pigScriptParser
Posted by Stefan Groschupf <sg...@101tec.com>.
Olga,
thanks for the clarification.
> We have a 2 level parser:
>
> Grunt parser handles all commands other than Pig commands and passes Pig
> commands to the pig parser. To do so, it needs to parse the pig command
> enough to figure out that it needs to go to pig parser.
Why does Pig have two parsers? Even if I use embedded Pig, the Pig Latin
is the same as in Grunt, isn't it?
Isn't maintaining two JavaCC files more overhead?
Stefan
RE: pigScriptParser
Posted by Olga Natkovich <ol...@yahoo-inc.com>.
> -----Original Message-----
> From: Stefan Groschupf [mailto:sg@101tec.com]
> Sent: Thursday, February 28, 2008 5:59 PM
> To: pig-dev@incubator.apache.org
> Subject: pigScriptParser
>
> Hi,
> I'm trying to better understand the Pig script parser. Two questions:
> In <DEFAULT> MORE : what is <"split"> all about?
Pig has a SPLIT command, which is the only one that does not follow the
pattern <alias> = <stuff>, because it actually splits the stream and
produces several aliases. Here is the example from the Pig Latin
page:
SPLIT A INTO X IF $0 < 7, Y IF ($0 > 2 AND $0 <> 7);
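To make the "several aliases from one statement" point concrete, here is a small plain-Python sketch of what the SPLIT example does to a stream of tuples. It is an illustration only, not Pig code: the `split` function and the sample data are hypothetical, and `<>` is written as `!=`.

```python
def split(records, conditions):
    """Route each tuple to every alias whose condition it satisfies.

    A tuple may land in several outputs, or in none at all.
    """
    outputs = {alias: [] for alias in conditions}
    for rec in records:
        for alias, predicate in conditions.items():
            if predicate(rec):
                outputs[alias].append(rec)
    return outputs


if __name__ == "__main__":
    a = [(1,), (4,), (7,), (8,)]
    result = split(a, {
        "X": lambda rec: rec[0] < 7,
        "Y": lambda rec: rec[0] > 2 and rec[0] != 7,
    })
    print(result["X"])
    print(result["Y"])
```

With this sample input, (4,) ends up in both X and Y while (7,) ends up in neither, which is why a single `<alias> = <stuff>` assignment cannot express the statement.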
> What do <(~[])> in <PIG_START> MORE : and all the others stand for?
This is how you specify *any character* in JavaCC: ~[] is the complement
of an empty character list, so it matches every character.
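The logic behind ~[] can be sketched in a few lines of Python. This is only a model of character-list matching, not JavaCC itself; `matches_char_list` is a hypothetical helper.

```python
def matches_char_list(ch, chars, negated=False):
    """Model a character list: [chars] when negated is False, ~[chars] when True."""
    inside = ch in chars
    return not inside if negated else inside


if __name__ == "__main__":
    # ~[] : nothing is excluded, so every character matches.
    print(all(matches_char_list(ch, "", negated=True) for ch in "A = load;\n"))
    # ~["\n"] : everything except a newline matches.
    print(matches_char_list("\n", "\n", negated=True))
```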
>
> Also, in the case of A = LOAD,
> I understand that A would be matched in the default state,
> but in which state or by which token would LOAD be matched?
We have a two-level parser:
The Grunt parser handles all commands other than Pig commands and passes Pig
commands to the Pig parser. To do so, it needs to parse a Pig command just
enough to figure out that it needs to go to the Pig parser.
Olga
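The two-level arrangement described above can be sketched roughly as follows. This is a simplified Python illustration, not the actual Grunt grammar: the shell command names, the regular expression and the `route` function are all assumptions made for the example. The first level looks only far enough into a line to decide where it belongs and hands anything that looks like a Pig statement on untouched.

```python
import re

# Hypothetical shell-level commands; the real Grunt command set may differ.
SHELL_COMMANDS = {"help", "quit"}

# Assumption: a Pig statement starts with "<alias> =" or with the keyword "split".
PIG_STATEMENT_START = re.compile(r"(?:[A-Za-z_]\w*\s*=|split\b)", re.IGNORECASE)


def route(line):
    """First-level parse: classify the line without understanding its body."""
    stripped = line.strip()
    if PIG_STATEMENT_START.match(stripped):
        # Forward the whole statement; the second-level parser would handle
        # keywords such as LOAD.
        return ("pig", stripped)
    first_word = stripped.split(None, 1)[0].lower().rstrip(";") if stripped else ""
    if first_word in SHELL_COMMANDS:
        return ("shell", first_word)
    return ("error", stripped)


if __name__ == "__main__":
    for text in ["A = load 'input';", "SPLIT A INTO X IF $0 < 7;", "quit", "frobnicate"]:
        print(route(text))
```

In this sketch the first level never inspects LOAD at all; it only recognises the `A =` prefix and passes the rest of the statement through.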