Posted to user@pig.apache.org by Yang <te...@gmail.com> on 2012/06/24 12:40:46 UTC

FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?

My UDF returns a bag of tuples: mybag:bag{mytuple:tuple(x:int, y:int)}

In my Pig script I do:

K = foreach blah generate UDF( xxx);

M = foreach K generate x;


Here Pig 0.8.1 says x cannot be found in the schema, since

describe K

shows:
{ mytuple:tuple(x:int , y:int) }

while 0.10.0 shows:
{x:int, y:int}
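
To make the symptom concrete, here is a small Python model (not Pig itself) of resolving a field name against the two schemas that describe reports. The nested-dict representation and the resolve/can_resolve helpers are assumptions made up for illustration; Pig's real name resolution is more involved.

```python
# Toy model: a schema maps field alias -> type, and a nested tuple is a
# nested dict. All names are hypothetical; this is not Pig's internal
# schema representation.

schema_081 = {"mytuple": {"x": "int", "y": "int"}}   # describe K on 0.8.1
schema_0100 = {"x": "int", "y": "int"}               # describe K on 0.10.0


def resolve(schema, path):
    """Look up a dotted path such as 'x' or 'mytuple.x' in a schema."""
    node = schema
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"{path!r} not found in schema")
        node = node[part]
    return node


def can_resolve(schema, path):
    """Return True if the path resolves, False otherwise."""
    try:
        resolve(schema, path)
        return True
    except KeyError:
        return False


if __name__ == "__main__":
    print(can_resolve(schema_081, "x"))          # x is hidden inside mytuple
    print(can_resolve(schema_081, "mytuple.x"))  # reachable through the tuple
    print(can_resolve(schema_0100, "x"))         # x is a top-level field
```

In this model, `generate x` only works when x is a top-level alias, which matches the 0.10.0 schema but not the 0.8.1 one.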

Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?

Posted by Yang <te...@gmail.com>.
Actually, FLATTEN(FLATTEN(....)) is not syntactically correct, at least in
0.8. Semantically it's not what I wanted either, because FLATTEN works on
bags, while I wanted to project ALL fields of a tuple.

I ended up adding a T:tuple( ) to the AS clause and an explicit
projection after the UDF call.

Thanks
Yang



On Mon, Jun 25, 2012 at 6:45 AM, Yang <te...@gmail.com> wrote:

> thanks Robert, I'll try it

Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?

Posted by Yang <te...@gmail.com>.
thanks Robert, I'll try it
On Jun 25, 2012 3:56 AM, "Norbert Burger" <no...@gmail.com> wrote:

> Yang -- I think you'll get the representation you're looking for by
> applying the FLATTEN a second time.  Each instance of a FLATTEN strips off
> a single layer.
>
> Norbert

Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?

Posted by Norbert Burger <no...@gmail.com>.
Yang -- I think you'll get the representation you're looking for by
applying the FLATTEN a second time.  Each instance of a FLATTEN strips off
a single layer.
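
For illustration only, a Python model (not Pig) of what FLATTEN does to a bag field and to a tuple field, following the behavior described in the Pig documentation. Rows are modeled as Python tuples and bags as lists of tuples; the function names flatten_bag and flatten_tuple are made up here.

```python
def flatten_bag(row, index):
    """Model of FLATTEN applied to the bag at row[index].

    Emits one output row per tuple in the bag, with that tuple's fields
    spliced into the row in place of the bag. An empty bag yields no rows.
    """
    before, bag, after = row[:index], row[index], row[index + 1:]
    return [before + tuple(t) + after for t in bag]


def flatten_tuple(row, index):
    """Model of FLATTEN applied to the tuple at row[index].

    The tuple's fields are promoted to top-level fields of the row;
    the number of rows does not change.
    """
    return row[:index] + tuple(row[index]) + row[index + 1:]


if __name__ == "__main__":
    # A row holding a key and a bag of (x, y) tuples.
    print(flatten_bag(("k", [(1, 2), (3, 4)]), 1))
    # A row holding a key and a single (x, y) tuple.
    print(flatten_tuple(("k", (1, 2)), 1))
```

In this model, flattening a bag both un-nests the bag into rows and promotes each tuple's fields, so whether a second FLATTEN is needed depends on what schema the first one is reported to produce.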

Norbert

On Sun, Jun 24, 2012 at 5:57 PM, Jonathan Coveney <jc...@gmail.com> wrote:

> generate K.(x1), K.(x2), K.(x3) .... , K.(x100); and generate
> K.(x1,...,x100) are actually very different.
>
> The latter is a bag, with columns x1, x2..x100. This is generally what is
> desired.
>
> The former is a bag of column x1, then a bag of column x2, then a bag of
> column x3, etc. Each will be unordered and independent.

Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?

Posted by Jonathan Coveney <jc...@gmail.com>.
generate K.(x1), K.(x2), K.(x3) .... , K.(x100); and generate
K.(x1,...,x100) are actually very different.

The latter is a bag, with columns x1, x2..x100. This is generally what is
desired.

The former is a bag of column x1, then a bag of column x2, then a bag of
column x3, etc. Each will be unordered and independent.
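
A rough Python sketch of the difference, assuming a bag is modeled as a list of tuples and columns are addressed by position. The project helper is hypothetical and only stands in for the K.( ) projection syntax. Note that Python lists keep their order while Pig bags do not, so the model cannot show the "unordered" part.

```python
def project(bag, *cols):
    """Model of bag.(c1, c2, ...): keep the given positions of every tuple."""
    return [tuple(t[c] for c in cols) for t in bag]


def project_together(bag, cols):
    """One bag whose tuples still carry all requested columns side by side."""
    return project(bag, *cols)


def project_separately(bag, cols):
    """One single-column bag per requested column; rows are no longer paired."""
    return tuple(project(bag, c) for c in cols)


if __name__ == "__main__":
    bag = [(1, "a", True), (2, "b", False)]
    print(project_together(bag, [0, 1]))    # a single bag of 2-field tuples
    print(project_separately(bag, [0, 1]))  # two bags of 1-field tuples
```

With separate projections, nothing ties a value in the first bag to a value in the second, which is why the combined form is usually the one you want.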

2012/6/24 yonghu <yo...@gmail.com>

> You can also write like
>
> K1.(x1,x2,...,x100).
>
> regards!
>
> Yong

Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?

Posted by yonghu <yo...@gmail.com>.
You can also write it like this:

K1.(x1,x2,...,x100).

regards!

Yong

On Sun, Jun 24, 2012 at 8:40 PM, Yang <te...@gmail.com> wrote:
> thanks,
>
> but this is a bit more cumbersome: if I have
>
> generate K.(x1), K.(x2), K.(x3) .... , K.(x100);
>
> I'd have to re-write each xn by adding K.( )
>
>
> it would be nice if the schema of K can strip off the surrounding {( )}.
> actually it should,
> since this is after a FLATTEN()
>
>
> Yang

Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?

Posted by Yang <te...@gmail.com>.
thanks,

but this is a bit more cumbersome: if I have

generate K.(x1), K.(x2), K.(x3) .... , K.(x100);

I'd have to re-write each xn by adding K.( )


It would be nice if the schema of K could strip off the surrounding {( )}.
Actually it should, since this is after a FLATTEN().


Yang

On Sun, Jun 24, 2012 at 11:17 AM, yonghu <yo...@gmail.com> wrote:

> So, I think you want to project the x in K. You can write the pig as:
>
> M = foreach K generate K.(x) as X;
>
> Hope this can help you.
>
> Yong

Re: FLATTEN() behavior difference in 0.8.1 and 0.10.0 ?

Posted by yonghu <yo...@gmail.com>.
So, I think you want to project the x in K. You can write the Pig statement as:

M = foreach K generate K.(x) as X;

Hope this can help you.

Yong

On Sun, Jun 24, 2012 at 12:40 PM, Yang <te...@gmail.com> wrote:
> my UDF returns a bag of tuples : mybag:bag{ mytuple: tuple ( x: int, y:int)}
>
> in my pig script:
>
> I do
>
> K = foreach blah generate UDF( xxx);
>
> M = foreach K generate x;
>
>
> here PIG 0.8.1 says x can not be found in schema, since
>
> describe K
>
> shows:
> { mytuple:tuple(x:int , y:int) }
>
> while 0.10.0
>
> shows
> {x:int, y:int}