Posted to user@pig.apache.org by hc busy <hc...@gmail.com> on 2010/05/06 21:29:26 UTC

Re: ugh! casting migraine

It doesn't surprise me, but the fact that it doesn't scream with an error, or
at least a very loud warning, is annoying. Consider this sequence of changes:

timestamp 1:
describe A;
A: {id: int, bad: (a: int,b: int)}
B = foreach A generate id, FLATTEN(bad) as (a, b);

timestamp 2:
describe A;
A: {id: int, bad: (a: int,b: int, c: chararray)}
B = foreach A generate id, FLATTEN(bad) as (a, b);

timestamp 3:
describe A;
A: {id: int, bad: (a: int,b: int, c: chararray,d: int)}
B = foreach A generate id, FLATTEN(bad) as (a, b, c);


Migraine ensues as multiple developers scramble to figure out why the
script stopped working after their seemingly harmless change.
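To make the behavior described in this thread concrete, here is a toy Python model (not Pig's code) of how the aliases in `FLATTEN(bad) as (...)` bind: left to right, with any extra flattened fields kept as unnamed columns rather than dropped. The stricter variant raises on a count mismatch, which is roughly the error being asked for here. Both function names are hypothetical.

```python
from typing import Any, List, Optional, Tuple

Field = Tuple[Optional[str], Any]  # (alias, or None if unnamed; value)


def flatten_as(values: Tuple[Any, ...], aliases: Tuple[str, ...]) -> List[Field]:
    """Lenient rule, as described in this thread: aliases bind to the
    flattened fields left to right; fields past the last alias stay in
    the output without a name instead of being dropped."""
    named = list(zip(aliases, values))
    unnamed = [(None, v) for v in values[len(aliases):]]
    return named + unnamed


def flatten_as_strict(values: Tuple[Any, ...], aliases: Tuple[str, ...]) -> List[Field]:
    """Hypothetical stricter rule: refuse to run when the alias count
    does not match the number of flattened fields."""
    if len(aliases) != len(values):
        raise ValueError(
            f"FLATTEN produced {len(values)} fields but {len(aliases)} aliases were given"
        )
    return list(zip(aliases, values))


if __name__ == "__main__":
    # Timestamp 2: bad = (a, b, c) aliased as (a, b) -> c survives unnamed.
    print(flatten_as((1, 2, "x"), ("a", "b")))
    # [('a', 1), ('b', 2), (None, 'x')]

    # Timestamp 3: bad = (a, b, c, d) aliased as (a, b, c) -> d survives unnamed.
    print(flatten_as((1, 2, "x", 4), ("a", "b", "c")))
    # [('a', 1), ('b', 2), ('c', 'x'), (None, 4)]

    try:
        flatten_as_strict((1, 2, "x"), ("a", "b"))
    except ValueError as err:
        print("error:", err)
    # error: FLATTEN produced 3 fields but 2 aliases were given
```

Under the lenient rule, timestamps 2 and 3 silently grow an extra column: the AS clause did not change, but the output schema did.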




On Thu, May 6, 2010 at 12:14 AM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> Does it surprise you that "select a as foo, b, d" returns 3 columns?
> You only gave one alias... this works the same way.
>
> It's the opposite that surprises me -- that if you load multi-column
> data and only provide names for the first few columns, you can't
> access the rest by ordinal.
>
> -D
>
> On Wed, May 5, 2010 at 11:24 PM, hc busy <hc...@gmail.com> wrote:
> > Okay, I have to blow off some steam here. Did you know that if
> >
> > describe A;
> > A: {id: int, bad: (a: int,b: int,z: int)}
> >
> > and I do
> >
> > B = foreach A generate id, FLATTEN(bad) as c;
> >
> > it would actually run without error, and that c takes the value of a,
> > and then an anonymous field is created for b? (So b is not dropped by
> > this cast.)
> >
> > I wonder whether the "B =" statement should either generate an error,
> > OR rename a to c and drop column b.
> > The statement:
> >
> > B = foreach A generate id, FLATTEN(bad) as (c,d);
> > describe B;
> > B: {id: int,c: int,d: int}
> >
> > That seems to make more sense than a silent, non-dropping result.
> >
>

Re: ugh! casting migraine

Posted by hc busy <hc...@gmail.com>.

Right, and obviously where heads really start crackin' is when we get to
timestamp 4:


timestamp 4:
describe A;
A: {id: int, bad: (a: int,boo: int,b: int, c: chararray,d: int)}
B = foreach A generate id, FLATTEN(bad) as (a, b, c, d);
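The timestamp 4 case can be modeled the same way. This toy Python sketch (hypothetical names and sample values, not Pig's implementation) contrasts positional aliasing with looking fields up by their schema name: inserting `boo` shifts every positional alias after `a` by one, while the name-based lookup is unaffected. In Pig itself, the name-based route would be to dereference the tuple's fields by name, e.g. `generate id, bad.a as a, bad.b as b`, instead of relying on the positional AS list; check that against your Pig version.

```python
# Schema of `bad` at timestamp 4, plus hypothetical sample values.
SCHEMA = ("a", "boo", "b", "c", "d")
ROW = (1, 99, 2, "x", 4)


def alias_by_position(values, aliases):
    """Bind aliases left to right, as the thread describes for FLATTEN ... AS.
    Trailing fields without an alias are simply omitted from this dict."""
    return dict(zip(aliases, values))


def select_by_name(schema, values, wanted):
    """Look each wanted field up by its schema name, so a column inserted
    elsewhere in the tuple does not shift anything."""
    index = {name: i for i, name in enumerate(schema)}
    return {name: values[index[name]] for name in wanted}


positional = alias_by_position(ROW, ("a", "b", "c", "d"))
by_name = select_by_name(SCHEMA, ROW, ("a", "b", "c", "d"))

print(positional)  # {'a': 1, 'b': 99, 'c': 2, 'd': 'x'}  <- b is really boo
print(by_name)     # {'a': 1, 'b': 2, 'c': 'x', 'd': 4}
```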
