Posted to dev@pig.apache.org by Baraa Mohamad <ba...@gmail.com> on 2011/01/24 21:48:05 UTC

dataflow in logical plan

Hello all:

I'm a new user of Pig, and I'm very interested in its architecture.
I have a question about the logical plan.

Consider the logical plan of this example (attached):

a = load 'myfile';
b = filter a by $0 > 5;
store b into 'myfilteredfile';

Is all of the data in 'myfile' sent in its entirety to the Proj(0)
operator and to the Filter operator?
More generally, what flows along the arrows in the logical plan?

What is the best documentation for understanding Pig's architecture, not
just how to use it? I plan to use Pig in the medical domain, but first I
need to understand it deeply.

Thank you very much for your help.


Baraa MOHAMAD
PhD student in computer science
ISIMA-LIMOS
Université Blaise Pascal
Clermont-Ferrand
France
Tel:  +33 658900080

Re: dataflow in logical plan

Posted by Baraa Mohamad <ba...@gmail.com>.
Wow, very interesting!

So the expression operators are passed to the relational operators as inputs
to those operators?
And, on the other hand, when we draw the logical plan (the plan Pig creates),
we don't need to treat the expression operators as nodes; as you said, we
just draw

Load --> Filter --> Store

We don't have to add other nodes for Proj(0), '>', or Const(5).

Regards

On Mon, Jan 24, 2011 at 10:32 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

> Pig has two levels of operators in its logical (and physical) plans:
> relational and expression.  Projection is an expression operator in Pig, not
> a relational operator (as it is in most databases).  So (ignoring the
> effects of the optimizer for now) all of your data will be sent to the
> filter relational operator.  Your filter will see whole tuples (1 3 4, etc.),
> not just the projected values (1, etc.).  Inside that filter, the tuples will
> be trimmed by the projection operator as part of the expression plan for '>'.
>
> Alan.

Re: dataflow in logical plan

Posted by Alan Gates <ga...@yahoo-inc.com>.
Pig has two levels of operators in its logical (and physical) plans:
relational and expression.  Projection is an expression operator in
Pig, not a relational operator (as it is in most databases).  So
(ignoring the effects of the optimizer for now) all of your data will
be sent to the filter relational operator.  Your filter will see whole
tuples (1 3 4, etc.), not just the projected values (1, etc.).  Inside
that filter, the tuples will be trimmed by the projection operator as
part of the expression plan for '>'.
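
The point about whole tuples can be illustrated with a minimal Python sketch. It assumes a toy tuple-at-a-time model; the classes `Project`, `Const`, `GreaterThan` and `Filter` are hypothetical stand-ins, not Pig's actual operator classes. The relational Filter receives every full tuple and evaluates its expression plan on it; the projection only reads $0 for the comparison, and the tuples that pass are emitted whole. The input is the six-row example used in this thread.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, Tuple

Row = Tuple[Any, ...]

# Expression operators (hypothetical model): each is evaluated
# against ONE input tuple at a time.
@dataclass
class Project:
    """Stand-in for Proj($n): extracts field n from a tuple."""
    index: int

    def eval(self, row: Row) -> Any:
        return row[self.index]

@dataclass
class Const:
    """Stand-in for const(v): ignores the tuple and returns v."""
    value: Any

    def eval(self, row: Row) -> Any:
        return self.value

@dataclass
class GreaterThan:
    """Stand-in for '>': compares the results of two sub-expressions."""
    left: Any
    right: Any

    def eval(self, row: Row) -> bool:
        return self.left.eval(row) > self.right.eval(row)

# Relational operator (hypothetical model): consumes and produces whole tuples.
@dataclass
class Filter:
    condition: Any  # the filter's inner expression plan

    def process(self, rows: Iterable[Row]) -> Iterator[Row]:
        for row in rows:
            # The projection inside the condition only reads $0;
            # the complete tuple is what gets passed downstream.
            if self.condition.eval(row):
                yield row

# b = filter a by $0 > 5;
flt = Filter(GreaterThan(Project(0), Const(5)))
myfile = [(1, 3, 4), (3, 4, 6), (7, 8, 2), (4, 5, 9), (9, 3, 5), (6, 6, 2)]
result = list(flt.process(myfile))
print(result)  # [(7, 8, 2), (9, 3, 5), (6, 6, 2)]
```

In this model the output contains complete rows such as (7, 8, 2), not the projected values 7, 9 and 6.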

Alan.

On Jan 24, 2011, at 1:23 PM, Baraa Mohamad wrote:

> Thank you very much for your explanation.
> Just to check that I understood correctly:
> For example, if myfile contains the following data:
> 1 3 4
> 3 4 6
> 7 8 2
> 4 5 9
> 9 3 5
> 6 6 2
>
> then all of this data will be sent to the Proj(0) operator, which gives
> as results
> 1
> 3
> 7
> 4
> 9
> 6
>
> After that, all of the data in myfile will be sent to the Filter
> operator, so that the filter takes two inputs: the myfile data and the
> result of Proj(0) > 5, which is
> 7
> 9
> 6
>
> Regards


Re: dataflow in logical plan

Posted by Baraa Mohamad <ba...@gmail.com>.
Thank you very much for your explanation.
Just to check that I understood correctly:
For example, if myfile contains the following data:
1 3 4
3 4 6
7 8 2
4 5 9
9 3 5
6 6 2

then all of this data will be sent to the Proj(0) operator, which gives as
results
1
3
7
4
9
6

After that, all of the data in myfile will be sent to the Filter operator, so
that the filter takes two inputs: the myfile data and the result of
Proj(0) > 5, which is
7
9
6

Regards


On Mon, Jan 24, 2011 at 10:08 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

> The logical plan for your script will look like:
>
> Load -> Filter -> Store
>
> Filter will have an expression plan that looks like Proj($0) > const(5)
>
> So yes, all your data will go through the filter operator.  But keep in
> mind that there is a filter operator in each map task, so no single instance
> of the operator will see all of your data (unless myfile is small).
> Hope that helps.
>
> Unfortunately, there isn't a good architecture document for Pig.
> Probably the best substitute is a paper we published in VLDB 2009, which
> you can get here:
> http://infolab.stanford.edu/~olston/publications/vldb09.pdf.  Since it
> is almost two years old, some of the specific information is out of date,
> but the basic structure is still correct.
>
> Alan.

Re: dataflow in logical plan

Posted by Alan Gates <ga...@yahoo-inc.com>.
The logical plan for your script will look like:

Load -> Filter -> Store

Filter will have an expression plan that looks like Proj($0) > const(5)
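
The two-level structure described above can be modeled in a few lines of Python. This is a sketch under simplifying assumptions: `RelOp`, `Expr` and `walk` are hypothetical names, not Pig's internal API. The relational plan contains only Load, Filter and Store; the expression Proj($0) > const(5) is stored inside the Filter node instead of appearing as extra nodes on the arrows.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Expr:
    """Hypothetical expression-plan node: a leaf or a binary operator."""
    label: str
    left: Optional["Expr"] = None
    right: Optional["Expr"] = None

    def render(self) -> str:
        if self.left is None or self.right is None:
            return self.label
        return f"{self.left.render()} {self.label} {self.right.render()}"

@dataclass
class RelOp:
    """Hypothetical relational-plan node; may own an inner expression plan."""
    name: str
    inner: Optional[Expr] = None
    successors: List["RelOp"] = field(default_factory=list)

def walk(op: RelOp) -> List[RelOp]:
    """Visit the relational plan from a root, following the arrows."""
    out = [op]
    for nxt in op.successors:
        out.extend(walk(nxt))
    return out

store = RelOp("Store")
flt = RelOp("Filter",
            inner=Expr(">", Expr("Proj($0)"), Expr("const(5)")),
            successors=[store])
load = RelOp("Load", successors=[flt])

print(" -> ".join(op.name for op in walk(load)))  # Load -> Filter -> Store
print(flt.inner.render())                          # Proj($0) > const(5)
```

Walking the relational plan visits exactly three nodes; the expression operators are reachable only through the Filter's `inner` field.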

So yes, all your data will go through the filter operator.  But keep
in mind that there is a filter operator in each map task, so no single
instance of the operator will see all of your data (unless myfile is
small).  Hope that helps.
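
A toy Python model of the map-task point, assuming the input is cut into fixed-size splits with one map task, and therefore one filter instance, per split. `make_splits` and `FilterInstance` are hypothetical; real Hadoop input splits are sized in bytes, not rows. Every row passes through some filter instance, but each instance sees only its own split.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, ...]

def make_splits(rows: Sequence[Row], split_size: int) -> List[List[Row]]:
    """Hypothetical stand-in for input splits: fixed-size chunks of rows."""
    return [list(rows[i:i + split_size]) for i in range(0, len(rows), split_size)]

class FilterInstance:
    """One copy of the filter, as would run inside one map task (model only)."""

    def __init__(self) -> None:
        self.rows_seen = 0

    def run(self, split: List[Row]) -> List[Row]:
        self.rows_seen += len(split)
        return [row for row in split if row[0] > 5]  # b = filter a by $0 > 5;

myfile = [(1, 3, 4), (3, 4, 6), (7, 8, 2), (4, 5, 9), (9, 3, 5), (6, 6, 2)]
splits = make_splits(myfile, split_size=2)        # 3 splits -> 3 map tasks
instances = [FilterInstance() for _ in splits]
outputs = [inst.run(s) for inst, s in zip(instances, splits)]

print([inst.rows_seen for inst in instances])     # [2, 2, 2]
print([row for part in outputs for row in part])   # [(7, 8, 2), (9, 3, 5), (6, 6, 2)]
```

Together the instances cover all six rows, yet none of them sees more than two; with a single split (a small file) one instance would see everything.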

Unfortunately, there isn't a good architecture document for Pig.
Probably the best substitute is a paper we published in VLDB 2009,
which you can get here: http://infolab.stanford.edu/~olston/publications/vldb09.pdf.
Since it is almost two years old, some of the specific information is
out of date, but the basic structure is still correct.

Alan.

On Jan 24, 2011, at 12:48 PM, Baraa Mohamad wrote:

> Hello all:
>
> I'm a new user of Pig, and I'm very interested in its architecture.
> I have a question about the logical plan.
>
> Consider the logical plan of this example (attached):
> a = load 'myfile';
> b = filter a by $0 > 5;
> store b into 'myfilteredfile';
>
>
> Is all of the data in 'myfile' sent in its entirety to the Proj(0)
> operator and to the Filter operator?
> More generally, what flows along the arrows in the logical plan?
>
> What is the best documentation for understanding Pig's architecture,
> not just how to use it? I plan to use Pig in the medical domain, but
> first I need to understand it deeply.
>
> Thank you very much for your help.
>
>
> Baraa MOHAMAD
> PhD student in computer science
> ISIMA-LIMOS
> Université Blaise Pascal
> Clermont-Ferrand
> France
> Tel:  +33 658900080