Posted to user@pig.apache.org by Mridul Muralidharan <mr...@yahoo-inc.com> on 2009/02/09 13:03:54 UTC
Implicit project
Hi,
If I understand correctly, when a schema is specified with load,
only the fields named in that schema are available - that is, there
is an implicit project after the load?
To illustrate,
A = load 'myFile' using myLoader() AS (f1:int, f2:int);
B = FILTER A by myCondition(*);
STORE B into 'myOutput';
Then B will not have fields $2 and above?
If so, is there a way to disable this project (since the input
might have an unpredictable number of fields)?
Does the same apply to udf output too?
That is, AS or the output schema specifies the 'default' schema, which
might be violated for some input tuples - will the extra fields be
projected away from the output tuple? What if the udf output has fewer
fields than the schema declares?
How does pig behave in these cases?
Thanks,
Mridul
Re: Implicit project
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Dmitriy Ryaboy wrote:
> $ cat simpletest
> 1 2 3
> 1 2 3 4
>
> $ ./bin/pig -x local
> grunt> A = LOAD 'simpletest';
> grunt> dump A;
> (1,2,3)
> (1,2,3,4)
> grunt> B = LOAD 'simpletest' AS (c1:int, c2:int);
> grunt> dump B;
> (1,2)
> (1,2)
Thanks, this matches my observation.
What I need is a declarative way to disable the latter behavior
while still specifying a schema.
I have yet to try what happens with udf output, though - that is, when
the schema for the udf output declares an arity of n and the udf
returns a tuple of arity n + k.
Regards,
Mridul
Re: Implicit project
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
$ cat simpletest
1 2 3
1 2 3 4
$ ./bin/pig -x local
grunt> A = LOAD 'simpletest';
grunt> dump A;
(1,2,3)
(1,2,3,4)
grunt> B = LOAD 'simpletest' AS (c1:int, c2:int);
grunt> dump B;
(1,2)
(1,2)
-D
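
For readers who want the observation from the grunt session spelled out, here is a toy model in Python. It is not Pig code and says nothing about how Pig implements the load. The function name load_rows, the whitespace splitting, and the integer conversion are all assumptions made for illustration. It only reproduces what the two dumps show: without a schema every field is kept, and with a two-field schema each tuple is cut down to two fields.

```python
# Toy model of the behaviour seen in the grunt session; NOT Pig code.
# load_rows is a hypothetical helper, not part of any Pig API.

def load_rows(lines, schema=None):
    """Split each line into integer fields; with a schema, keep only declared fields."""
    rows = []
    for line in lines:
        # Assumption: fields are whitespace-separated integers, as in 'simpletest'.
        fields = tuple(int(tok) for tok in line.split())
        if schema is not None:
            # Mimic the observed truncation: fields beyond the schema are dropped.
            # Rows shorter than the schema pass through unchanged here; that is
            # an assumption, since the thread does not show that case.
            fields = fields[:len(schema)]
        rows.append(fields)
    return rows


simpletest = ["1 2 3", "1 2 3 4"]

a = load_rows(simpletest)                       # like: A = LOAD 'simpletest';
b = load_rows(simpletest, schema=("c1", "c2"))  # like: ... AS (c1:int, c2:int);

print(a)  # [(1, 2, 3), (1, 2, 3, 4)]
print(b)  # [(1, 2), (1, 2)]
```

The model deliberately leaves open the udf case (declared arity n, returned arity n + k), because nothing in the thread shows how Pig treats it.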