Posted to user@pig.apache.org by Christopher Maier <ch...@gm.com> on 2015/11/13 19:48:27 UTC

RE: Schema changes based on subquery

Hi,

I haven't received a response on this. Has anyone had a chance to reproduce the error?

Thanks,
Kit

From: Christopher Maier
Sent: Tuesday, October 20, 2015 4:02 PM
To: 'user@pig.apache.org' <us...@pig.apache.org>
Subject: Schema changes based on subquery

Hi,

I am getting the wrong counts from Pig for a certain query. I have simplified the query to what's below, which shows as a failure instead of a wrong count.

Why does the first line of the subquery cause the output schema to revert to be the same as the input schema? This line should not have any impact on the output.

(I've removed some of the extra logging output.)

pig -version
Apache Pig version 0.12.0 (rexported)
compiled Oct 26 2014, 23:43:04

Query
grunt> a = load 'test1.txt' using PigStorage(',') as (A:chararray,B:chararray,C:chararray);
grunt> b = group a by (A,B);
grunt> c = foreach b {
>>     asdf = filter $1 by (1==1);
>>     generate COUNT_STAR($1) as TARGET;
>> };
grunt> d = limit c 10;

Values
grunt> dump a;
(a,b,c)
grunt> dump b;
((a,b),{(a,b,c)})
grunt> dump c;
(1)
grunt> dump d;
(1)

Schema 'describe' at each step looks good
grunt> describe a;
a: {A: chararray,B: chararray,C: chararray}
grunt> describe b;
b: {group: (A: chararray,B: chararray),a: {(A: chararray,B: chararray,C: chararray)}}
grunt> describe c;
c: {TARGET: long}
grunt> describe d;
d: {TARGET: long}

Attempted next step fails
grunt> e = foreach d generate TARGET;
<line 8, column 23> Invalid field projection. Projected field [TARGET] does not exist in schema: A:chararray,B:chararray,C:chararray.

Progress of real schema through query
grunt> z = foreach a generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not exist in schema: A:chararray,B:chararray,C:chararray.
grunt> z = foreach b generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not exist in schema: group:tuple(A:chararray,B:chararray),a:bag{:tuple(A:chararray,B:chararray,C:chararray)}.
grunt> z = foreach c generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not exist in schema: TARGET:long.
grunt> z = foreach d generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not exist in schema: A:chararray,B:chararray,C:chararray.

Alternate query shows no error
grunt> c = foreach b {
>> generate COUNT_STAR($1) as TARGET;
>> };
grunt> d = limit c 10;
grunt> e = foreach d generate TARGET;
grunt> dump e;
(1)

Thanks,
Kit Maier




Re: Schema changes based on subquery

Posted by Debabrata Pani <an...@gmail.com>.
To clarify, it fails for 0.12.1. I did not try Pig versions before that.


Sorry for any confusion.
Regards,
Debabrata Pani

On Sat, Nov 21, 2015 at 9:55 AM, Debabrata Pani <an...@gmail.com>
wrote:

> This seems to be a problem with Apache Pig up to 0.12.1.
>
> I tried the script with Pig 0.13 and it does not throw any errors.

Re: Schema changes based on subquery

Posted by Debabrata Pani <an...@gmail.com>.
This seems to be a problem with Apache Pig up to 0.12.1.

I tried the script with Pig 0.13 and it does not throw any errors.

The script evaluated:

a = load 'test1.txt' using PigStorage(',') as (A:chararray,B:chararray,C:chararray);
b = group a by (A,B);
c = foreach b {
     asdf = filter $1 by 1==1;
     generate COUNT_STAR($1) as TARGET:int;
};

DESCRIBE c;
store c into 'output' USING PigStorage(',');
d = limit c 10;
describe d;
e = foreach d generate TARGET;
DESCRIBE e;
store e into 'output-e' USING PigStorage(','); -- store command necessary for explain to work



Instead of running the entire program, I just ran "explain" on the script.
The explain command:

pig -e 'explain -script testpigschema.pig'

EXPLAIN fails in Pig up to 0.12.1 but goes through for 0.13.0.

Unfortunately, this does not really solve the problem for you, except to
hint that this may be a bug in Apache Pig.

Regards,
Debabrata Pani

On Mon, Nov 16, 2015 at 9:24 AM, Arvind S <ar...@gmail.com> wrote:

> This does not seem to be an issue in Pig 0.15 (tested in local mode only as
> of now).

Re: Schema changes based on subquery

Posted by Arvind S <ar...@gmail.com>.
This does not seem to be an issue in Pig 0.15 (tested in local mode only as of
now).

a = load '/tmp/test/test.txt' using PigStorage(',') as (A:chararray,B:chararray,C:chararray);
b = group a by (A,B);
c = foreach b {
    asdf = filter $1 by (1==1);
    generate COUNT_STAR($1) as TARGET;
};
d = limit c 10;
e = foreach d generate TARGET;
dump e;

end output ...
(1)


*Cheers !!*
Arvind

On Sat, Nov 14, 2015 at 12:18 AM, Christopher Maier <
christopher.maier@gm.com> wrote:

> Hi,
>
> I haven't received a response on this. Has anyone had a chance to
> reproduce the error?
>
> Thanks,
> Kit