Posted to user@pig.apache.org by Lauren Blau <la...@digitalreasoning.com> on 2012/08/24 20:28:45 UTC

group schema getting wrong fields?

I'm running pig 0.9.2 and seeing this:

grunt> describe cxels;
cxels: {messageId: chararray,celstart: int,celend: int,notcellabel: chararray,notcelstart: int,notcelend: int}
grunt> gcxels = group cxels by (messageId,celstart,celend);
grunt> describe gcxels;
gcxels: {group: (messageId: chararray,notcelstart: int,notcelend: int),cxels: {(messageId: chararray,celstart: int,celend: int,notcellabel: chararray,notcelstart: int,notcelend: int)}}


Why does the schema for gcxels::group show notcelstart and notcelend when I
gave it celstart and celend as the grouping fields?
Is the field name not being matched correctly?

Thanks,
lauren

Re: group schema getting wrong fields?

Posted by Jonathan Coveney <jc...@gmail.com>.
Yeah, I think this is a known issue with filters and relations. Use the
split for now, but I think trunk has the fix.

Thanks

Re: group schema getting wrong fields?

Posted by Lauren Blau <la...@digitalreasoning.com>.
Actually, if I replace the filters that create the original two relations
with a split, the problem goes away. (I just saw split used in another
message and realized I could use it.)
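
A minimal sketch of that workaround, for readers who want to see its shape. The thread never shows how fcels and fnot were created, so the input path, the load schema, the `annots` relation name, and the `atype` field with its 'cel' value are all assumptions made for illustration. Only the SPLIT statement and the join and group that follow it reflect what the thread describes.

```pig
-- Hypothetical input: the path, the schema, and the atype field are assumptions.
annots = LOAD 'annotations.tsv'
         AS (messageId:chararray, atype:chararray, alabel:chararray,
             astart:int, aend:int);

-- Pattern described in the thread: two filters over the same relation.
-- fcels = FILTER annots BY atype == 'cel';
-- fnot  = FILTER annots BY atype != 'cel';

-- Workaround reported in the thread: one SPLIT producing both relations.
SPLIT annots INTO fcels IF atype == 'cel', fnot IF atype != 'cel';

bigcross = JOIN fcels BY messageId, fnot BY messageId;
cxels = FOREACH bigcross GENERATE
            fcels::messageId AS messageId,
            fcels::astart    AS celstart,
            fcels::aend      AS celend,
            fnot::astart     AS notcelstart,
            fnot::aend       AS notcelend;

gcxels = GROUP cxels BY (messageId, celstart, celend);
-- Check that the group tuple now lists messageId, celstart, celend.
DESCRIBE gcxels;
```

Running DESCRIBE after the change is the quickest way to confirm whether the group key aliases match the fields named in the GROUP statement on a given Pig version.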

Re: group schema getting wrong fields?

Posted by Lauren Blau <la...@digitalreasoning.com>.
fcels and fnot are both filtered from the same original relation.

Re: group schema getting wrong fields?

Posted by Lauren Blau <la...@digitalreasoning.com>.
How much more? Here's how cxels is built:

bigcross = join fcels by (chararray)messageId, fnot by (chararray)messageId;
filt1 = filter bigcross by (int)fcels::astart <= (int)fnot::astart;
filt2 = filter filt1 by (int)fcels::aend >= (int)fnot::aend;

cxels = foreach filt2 generate
    fcels::messageId as messageId:chararray,
    fcels::astart as celstart:int,
    fcels::aend as celend:int,
    fnot::alabel as notcellabel:chararray,
    fnot::astart as notcelstart:int,
    fnot::aend as notcelend:int;


Re: group schema getting wrong fields?

Posted by Jonathan Coveney <jc...@gmail.com>.
Can you post more of your script?
