Posted to dev@pig.apache.org by zhang jianfeng <zj...@gmail.com> on 2009/07/09 03:24:12 UTC
Is there any document about the JobControlCompiler
Hi all,
I found that the following script is converted into three MapReduce jobs:
A = LOAD '/user/zjffdu/input.txt' USING PigStorage();
B = GROUP A BY $0;
B = FOREACH B GENERATE group, COUNT($1);
B = ORDER B BY $1;
DUMP B;
I am very interested to know how Pig compiles the script into jobs. Reading the
source code is one way, but if there is any documentation, that would be better.
Does anyone know where I can find the related documents? Or is there any
JIRA item related to this?
Thank you in advance.
Jeff Zhang.
Re: Is there any document about the JobControlCompiler
Posted by zhang jianfeng <zj...@gmail.com>.
Dmitriy,
Thank you for your help.
On Thu, Jul 9, 2009 at 9:34 AM, Dmitriy Ryaboy <dv...@cloudera.com> wrote:
> Jeff,
> Chris Olston answered this a while back:
>
> http://markmail.org/thread/xnwutstlftnyycxs
>
> (by the way, MarkMail is awesome for searching mailing list archives.
> Highly recommended.)
>
> There are some changes that have to do with sampling and multi-store, but
> that email will give you the general idea.
>
> Also, remember you can always get the MR plan by running "explain" on a
> relation.
>
> Hope this helps
> -Dmitriy
Re: Is there any document about the JobControlCompiler
Posted by Dmitriy Ryaboy <dv...@cloudera.com>.
Jeff,
Chris Olston answered this a while back:
http://markmail.org/thread/xnwutstlftnyycxs
(by the way, MarkMail is awesome for searching mailing list archives. Highly
recommended.)
There are some changes that have to do with sampling and multi-store, but
that email will give you the general idea.
Also, remember you can always get the MR plan by running "explain" on a
relation.
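A minimal sketch of inspecting the plan for the script from the original post.
It uses Pig's EXPLAIN operator, which prints the logical, physical, and
MapReduce plans for an alias (DESCRIBE, by contrast, prints only the schema).
The input path and aliases are copied from Jeff's script; the exact plan
printed depends on the Pig version, so treat the comments as an expectation
rather than guaranteed output.

```
-- Same script as in the original post, with DUMP replaced by EXPLAIN.
A = LOAD '/user/zjffdu/input.txt' USING PigStorage();
B = GROUP A BY $0;
B = FOREACH B GENERATE group, COUNT($1);
B = ORDER B BY $1;

-- Prints the plans for B without running the jobs.
-- Assumption: the MapReduce plan section shows one job for the
-- GROUP/FOREACH step and, for ORDER BY, a sampling job followed by
-- the sort job, which would account for the three jobs observed.
EXPLAIN B;
```

Running this in the Grunt shell or as a script should show where the job
boundaries fall, which is a quick way to check the compiler's behaviour
before reading the source.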
Hope this helps
-Dmitriy