Posted to user@pig.apache.org by Mridul Muralidharan <mr...@yahoo-inc.com> on 2009/02/09 13:03:54 UTC

Implicit project

Hi,

   If I understand correctly, when a schema is specified with load, 
only those fields are available - that is, there is an implicit 
project after the load?

To illustrate,
A = load 'myFile' using myLoader() AS (f1:int, f2:int);
B = FILTER A by myCondition(*);
STORE B into 'myOutput';

Then B will not have fields $2 and above?
If this is the case, is there a way to disable this projection (since the 
input might have an unpredictable number of fields)?



Does the same apply to UDF output too?
That is, the AS clause or output schema specifies the 'default' schema, 
which might be violated for some input tuples - will the output tuple 
project away those extra fields? What if the UDF output has fewer fields?

How does Pig behave in these cases?

Thanks,
Mridul

Re: Implicit project

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.

Dmitriy Ryaboy wrote:
> $ cat simpletest
> 1	2	3
> 1	2	3	4
> 
> $ ./bin/pig -x local
> grunt> A = LOAD 'simpletest';
> grunt> dump A;
> (1,2,3)
> (1,2,3,4)
> grunt> B = LOAD 'simpletest' AS (c1:int, c2:int);
> grunt> dump B;
> (1,2)
> (1,2)


Thanks, this matches my observation.
My requirement is to be able to disable the latter behavior 
(declaratively) without having to drop the schema altogether.

I have yet to try what happens with UDF output, though: that is, the case 
where the UDF's output schema declares arity n and the UDF returns a 
tuple of arity n + k.

Regards,
Mridul

> 
> -D
> 

Re: Implicit project

Posted by Dmitriy Ryaboy <dv...@gmail.com>.

$ cat simpletest
1	2	3
1	2	3	4

$ ./bin/pig -x local
grunt> A = LOAD 'simpletest';
grunt> dump A;
(1,2,3)
(1,2,3,4)
grunt> B = LOAD 'simpletest' AS (c1:int, c2:int);
grunt> dump B;
(1,2)
(1,2)

-D
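
For readers who want the observed behaviour spelled out, here is a minimal Python sketch that models the session above. It is not Pig's implementation: the `load` function, its `schema` parameter, and the tab delimiter are assumptions made for illustration. It only reproduces what the two dumps show, namely that a LOAD without a schema keeps every field while a LOAD with a two-field schema keeps the first two.

```python
def load(lines, schema=None, delimiter="\t"):
    """Model the LOAD behaviour seen in the grunt session (hypothetical helper)."""
    rows = []
    for line in lines:
        fields = [int(x) for x in line.split(delimiter)]
        if schema is not None:
            # Implicit projection: keep only as many fields as the schema names.
            # Rows shorter than the schema are left as they are here; the
            # thread does not establish what Pig does in that case.
            fields = fields[:len(schema)]
        rows.append(tuple(fields))
    return rows


# Same contents as the 'simpletest' file in the session.
simpletest = ["1\t2\t3", "1\t2\t3\t4"]

print(load(simpletest))                       # no schema: all fields kept
print(load(simpletest, schema=["c1", "c2"]))  # schema of arity 2: extras dropped
```

The sketch makes the open question concrete: the truncation happens per tuple at load time, so a later FILTER or STORE never sees fields beyond the declared arity.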
