Posted to user@pig.apache.org by David LaBarbera <da...@localresponse.com> on 2012/10/30 18:22:34 UTC

force schema with TOBAG

I have a cogroup which effectively does a full outer join of two relations. Some of the relations are blank, so I have a FOREACH statement like

grouped = COGROUP relation1 BY x, relation2 BY y;
normalized = FOREACH grouped {
   normal1 = TOBAG('$ID_NULL', 0L);
   value1 = ( IsEmpty(relation1) ? normal1 : relation1 );
   GENERATE relation1, relation2;
}

I get an error on the bincond saying that the left and right schemas don't match. I'm told that TOBAG returns
bag{:tuple(:NULL)}
for the schema, and relation1 is
bag{:tuple(id:chararray,timestamp:long)}

I'm running this on EMR, which has a modified version of 0.9.2. Any thoughts on how to force TOBAG's schema to match relation1's?

David


Re: force schema with TOBAG

Posted by Cheolsoo Park <ch...@cloudera.com>.
Great! Thanks!


Re: force schema with TOBAG

Posted by David LaBarbera <da...@localresponse.com>.
Cheolsoo

That works. Thanks so much for the help.
And congratulations on your new committer status!

David



Re: force schema with TOBAG

Posted by Cheolsoo Park <ch...@cloudera.com>.
Hi David,

How about "TOBAG( TOTUPLE( $ID_NULL, 0L ) )"? The "( )" is just
syntactic sugar for "TOTUPLE( )" that was introduced in 0.10. (Sorry,
I forgot that "( )" doesn't work in 0.9.)

Thanks,
Cheolsoo
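
Put together with the original script, the TOTUPLE workaround might look like the sketch below. The LOAD paths, the grouping key, and passing the parameter with -param are assumptions for illustration; the thread does not show how relation1 and relation2 are defined.

```pig
-- Run with, for example: pig -param ID_NULL=none script.pig
-- Hypothetical inputs; the schema follows the one reported for relation1.
relation1 = LOAD 'input1' AS (id:chararray, timestamp:long);
relation2 = LOAD 'input2' AS (id:chararray, timestamp:long);

grouped = COGROUP relation1 BY id, relation2 BY id;

-- Substitute a one-tuple placeholder bag when relation1 has no rows for a key.
-- TOTUPLE is spelled out so the expression does not rely on the "( )" shorthand.
normalized = FOREACH grouped GENERATE
    group,
    (IsEmpty(relation1) ? TOBAG(TOTUPLE('$ID_NULL', 0L)) : relation1) AS value1,
    relation2;
```

Note that this sketch emits the bincond result (value1) rather than relation1 itself, so the placeholder actually reaches the output.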


Re: force schema with TOBAG

Posted by David LaBarbera <da...@localresponse.com>.
Cheolsoo

Thank you for the response. This works in 0.10, but not on 0.9.2-amzn. I get an error message saying there's an unexpected symbol at or near $ID_NULL. (ID_NULL is being replaced with its value; I just thought showing the parameter name would be clearer here.)

David



Re: force schema with TOBAG

Posted by Cheolsoo Park <ch...@cloudera.com>.
Hi David,

Try *adding parentheses* inside the TOBAG:

normal1 = TOBAG( ('$ID_NULL', 0L) );

or

value1 = ( IsEmpty(relation1) ? TOBAG( ('$ID_NULL', 0L) ) : relation1 );

The reason is that TOBAG('$ID_NULL', 0L) means { ('$ID_NULL'), (0) }, a bag
of two single-field tuples. I believe what you want is { ('$ID_NULL', 0) },
a bag holding one two-field tuple, given the schema of relation1.

Thanks,
Cheolsoo
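
A minimal sketch for checking the difference described above. The input file 'one.txt', the aliases, and the literal 'none' are hypothetical and not from the thread; TOTUPLE is used instead of the "( )" shorthand so the script does not depend on the newer tuple syntax.

```pig
-- Hypothetical one-line input; only needed so FOREACH has a row to work on.
a = LOAD 'one.txt' AS (x:chararray);

-- Two arguments: TOBAG wraps each one in its own single-field tuple.
two_tuples = FOREACH a GENERATE TOBAG('none', 0L) AS b;

-- One tuple argument: the bag holds a single two-field tuple.
one_tuple = FOREACH a GENERATE TOBAG(TOTUPLE('none', 0L)) AS b;

-- Compare the reported bag schemas.
DESCRIBE two_tuples;
DESCRIBE one_tuple;
```

If the second DESCRIBE reports a bag whose tuple has a chararray field and a long field, that bag should line up with relation1's schema in the bincond.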
