Posted to user@pig.apache.org by Mridul Muralidharan <mr...@yahoo-inc.com> on 2009/02/09 12:10:15 UTC
Pig 2.0 operators
Hi,
I have the following queries after going through the types func spec.
a) What does MATCHES on two bytearrays mean ? Spec says it is supported
without any comment.
b) Multiplication/Division between bag/tuple and primitives - says it is
not implemented, but what is the expectation when it does get done ?
Apply to individual fields recursively ?
c) What does CONCAT of two bytearrays mean ? Just combining both arrays
into a new larger array through array copies ? (I am assuming this is
what concat of chararray does)
d) For aggregate functions MIN and MAX, can we provide our own
comparator (udf or otherwise) for the chararrays - to define what the
relative ordering is - like using Collators, instead of always assuming
lexicographical ordering (I assume this is what it uses by default ) ?
e) In the argument construction in function section - is the semantic
change applicable only to arithmetic operations ? Only to aggregate udfs
? Or to all udfs ?
What happens in this case :
employee = LOAD 'employee' AS (name, salary, bonus_multiplier);
grouped = GROUP employee BY name;
total_compensation = FOREACH grouped {
    T1 = employee.salary;
    T2 = employee.bonus_multiplier;
    GENERATE group, myUDF(T1 * T2); -- error ?
}
Similarly, for GENERATE group, myUDF(T1, T2) above ?
Thanks,
Mridul
Re: Pig 2.0 operators
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Sure.
I am still going through the 50-odd UDFs and the Pig scripts we have to
see what is involved in porting them.
If there are no immediate suggestions/comments for the questions I raised,
I will send out a more comprehensive list, with those included, later on.
Regards,
Mridul
Olga Natkovich wrote:
> It would be good to have one list with all the questions that
> documentation did not clarify for you. I am hoping it addressed more
> than just NULL issues.
>
> Olga
>
>> -----Original Message-----
>> From: Mridul Muralidharan [mailto:mridulm@yahoo-inc.com]
>> Sent: Monday, February 09, 2009 1:48 PM
>> To: pig-user@hadoop.apache.org
>> Subject: Re: Pig 2.0 operators
>>
>>
>> All questions below and in other mails where there were no
>> responses (from me or others ?).
>>
>> Thanks,
>> Mridul
>>
>> Olga Natkovich wrote:
>>> Could you please summarize the list of questions that you feel are not
>>> adequately covered in the document so we can address them.
>>>
>>> Thanks,
>>>
>>> Olga
>>>
>>>> -----Original Message-----
>>>> From: Mridul Muralidharan [mailto:mridulm@yahoo-inc.com]
>>>> Sent: Monday, February 09, 2009 12:23 PM
>>>> To: pig-user@hadoop.apache.org
>>>> Subject: Re: Pig 2.0 operators
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> To answer some of my questions below for general audience, based on
>>>> doc Olga mentioned -
>>>> http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm
>>>> (someone should update spec with this, way more informative
>>>> !) ... could not find something which explained the others though.
>>>>
>>>>
>>>> Regards,
>>>> Mridul
>>>>
>>>>
>>>> Mridul Muralidharan wrote:
>>>>> Hi,
>>>>>
>>>>> Have following queries while going through types func spec.
>>>>>
>>>>>
>>>>> a) What does MATCHES on two bytearrays mean ? Spec says it is
>>>>> supported without any comment.
>>>> Though not explicitly specified, my feeling is that it is getting
>>>> cast to chararray.
>>>>
>>>>
>>>>> b) Multiplication/Division between bag/tuple and primitives - says it is
>>>>> not implemented, but what is the expectation when it does get done ?
>>>>> Apply to individual fields recursively ?
>>>>>
>>>>> c) What does CONCAT of two bytearrays mean ? Just combining both arrays
>>>>> into a new larger array through array copies ? (I am assuming this is
>>>>> what concat of chararray does)
>>>> New array with concat'ed contents from prev two bytearrays ... imo,
>>>> use with caution since it is a crude concat on binary blobs.
>>>>
>>>>> d) For aggregate functions MIN and MAX, can we provide our own
>>>>> comparator (udf or otherwise) for the chararrays - to define what the
>>>>> relative ordering is - like using Collators, instead of always assuming
>>>>> lexicographical ordering (I assume this is what it uses by default ) ?
>>>>> e) In the argument construction in function section - is the semantic
>>>>> change applicable only to arithmetic operations ? Only to aggregate udfs
>>>>> ? Or to all udfs ?
>>>>>
>>>>> What happens in this case :
>>>>>
>>>>> employee = LOAD 'employee' AS (name, salary, bonus_multiplier);
>>>>> grouped = GROUP employee BY name;
>>>>> total_compensation = FOREACH grouped {
>>>>>     T1 = employee.salary;
>>>>>     T2 = employee.bonus_multiplier;
>>>>>     GENERATE group, myUDF(T1 * T2); -- error ?
>>>>> }
>>>>> Similarly, for GENERATE group, myUDF(T1, T2) above ?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Mridul
>>
RE: Pig 2.0 operators
Posted by Olga Natkovich <ol...@yahoo-inc.com>.
It would be good to have one list with all the questions that
documentation did not clarify for you. I am hoping it addressed more
than just NULL issues.
Olga
> -----Original Message-----
> From: Mridul Muralidharan [mailto:mridulm@yahoo-inc.com]
> Sent: Monday, February 09, 2009 1:48 PM
> To: pig-user@hadoop.apache.org
> Subject: Re: Pig 2.0 operators
>
>
> All questions below and in other mails where there were no
> responses (from me or others ?).
>
> Thanks,
> Mridul
>
> Olga Natkovich wrote:
> > Could you please summarize the list of questions that you feel are not
> > adequately covered in the document so we can address them.
> >
> > Thanks,
> >
> > Olga
> >
> >> -----Original Message-----
> >> From: Mridul Muralidharan [mailto:mridulm@yahoo-inc.com]
> >> Sent: Monday, February 09, 2009 12:23 PM
> >> To: pig-user@hadoop.apache.org
> >> Subject: Re: Pig 2.0 operators
> >>
> >>
> >> Hi all,
> >>
> >> To answer some of my questions below for general audience, based on
> >> doc Olga mentioned -
> >> http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm
> >> (someone should update spec with this, way more informative
> >> !) ... could not find something which explained the others though.
> >>
> >>
> >> Regards,
> >> Mridul
> >>
> >>
> >> Mridul Muralidharan wrote:
> >>> Hi,
> >>>
> >>> Have following queries while going through types func spec.
> >>>
> >>>
> >>> a) What does MATCHES on two bytearrays mean ? Spec says it is
> >>> supported without any comment.
> >>
> >> Though not explicitly specified, my feeling is that it is getting
> >> cast to chararray.
> >>
> >>
> >>> b) Multiplication/Division between bag/tuple and primitives - says it is
> >>> not implemented, but what is the expectation when it does get done ?
> >>> Apply to individual fields recursively ?
> >>>
> >>> c) What does CONCAT of two bytearrays mean ? Just combining both arrays
> >>> into a new larger array through array copies ? (I am assuming this is
> >>> what concat of chararray does)
> >> New array with concat'ed contents from prev two bytearrays ... imo,
> >> use with caution since it is a crude concat on binary blobs.
> >>
> >>> d) For aggregate functions MIN and MAX, can we provide our own
> >>> comparator (udf or otherwise) for the chararrays - to define what the
> >>> relative ordering is - like using Collators, instead of always assuming
> >>> lexicographical ordering (I assume this is what it uses by default ) ?
> >>>
> >>> e) In the argument construction in function section - is the semantic
> >>> change applicable only to arithmetic operations ? Only to aggregate udfs
> >>> ? Or to all udfs ?
> >>>
> >>> What happens in this case :
> >>>
> >>> employee = LOAD 'employee' AS (name, salary, bonus_multiplier);
> >>> grouped = GROUP employee BY name;
> >>> total_compensation = FOREACH grouped {
> >>>     T1 = employee.salary;
> >>>     T2 = employee.bonus_multiplier;
> >>>     GENERATE group, myUDF(T1 * T2); -- error ?
> >>> }
> >>> Similarly, for GENERATE group, myUDF(T1, T2) above ?
> >>>
> >>>
> >>>
> >>>
> >>> Thanks,
> >>> Mridul
> >>
>
>
Re: Pig 2.0 operators
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
All questions below and in other mails where there were no responses
(from me or others ?).
Thanks,
Mridul
Olga Natkovich wrote:
> Could you please summarize the list of questions that you feel are not
> adequately covered in the document so we can address them.
>
> Thanks,
>
> Olga
>
>> -----Original Message-----
>> From: Mridul Muralidharan [mailto:mridulm@yahoo-inc.com]
>> Sent: Monday, February 09, 2009 12:23 PM
>> To: pig-user@hadoop.apache.org
>> Subject: Re: Pig 2.0 operators
>>
>>
>> Hi all,
>>
>> To answer some of my questions below for general audience,
>> based on doc Olga mentioned -
>> http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm
>> (someone should update spec with this, way more informative
>> !) ... could not find something which explained the others though.
>>
>>
>> Regards,
>> Mridul
>>
>>
>> Mridul Muralidharan wrote:
>>> Hi,
>>>
>>> Have following queries while going through types func spec.
>>>
>>>
>>> a) What does MATCHES on two bytearrays mean ? Spec says it is
>>> supported without any comment.
>>
>> Though not explicitly specified, my feeling is that it is
>> getting cast to chararray.
>>
>>
>>> b) Multiplication/Division between bag/tuple and primitives - says it is
>>> not implemented, but what is the expectation when it does get done ?
>>> Apply to individual fields recursively ?
>>>
>>> c) What does CONCAT of two bytearrays mean ? Just combining both arrays
>>> into a new larger array through array copies ? (I am assuming this is
>>> what concat of chararray does)
>> New array with concat'ed contents from prev two bytearrays ... imo, use
>> with caution since it is a crude concat on binary blobs.
>>
>>> d) For aggregate functions MIN and MAX, can we provide our own
>>> comparator (udf or otherwise) for the chararrays - to define what the
>>> relative ordering is - like using Collators, instead of always assuming
>>> lexicographical ordering (I assume this is what it uses by default ) ?
>>>
>>> e) In the argument construction in function section - is the semantic
>>> change applicable only to arithmetic operations ? Only to aggregate udfs
>>> ? Or to all udfs ?
>>>
>>> What happens in this case :
>>>
>>> employee = LOAD 'employee' AS (name, salary, bonus_multiplier);
>>> grouped = GROUP employee BY name;
>>> total_compensation = FOREACH grouped {
>>>     T1 = employee.salary;
>>>     T2 = employee.bonus_multiplier;
>>>     GENERATE group, myUDF(T1 * T2); -- error ?
>>> }
>>> Similarly, for GENERATE group, myUDF(T1, T2) above ?
>>>
>>>
>>>
>>>
>>> Thanks,
>>> Mridul
>>
RE: Pig 2.0 operators
Posted by Olga Natkovich <ol...@yahoo-inc.com>.
Could you please summarize the list of questions that you feel are not
adequately covered in the document so we can address them.
Thanks,
Olga
> -----Original Message-----
> From: Mridul Muralidharan [mailto:mridulm@yahoo-inc.com]
> Sent: Monday, February 09, 2009 12:23 PM
> To: pig-user@hadoop.apache.org
> Subject: Re: Pig 2.0 operators
>
>
> Hi all,
>
> To answer some of my questions below for general audience,
> based on doc Olga mentioned -
> http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm
> (someone should update spec with this, way more informative
> !) ... could not find something which explained the others though.
>
>
> Regards,
> Mridul
>
>
> Mridul Muralidharan wrote:
> > Hi,
> >
> > Have following queries while going through types func spec.
> >
> >
> > a) What does MATCHES on two bytearrays mean ? Spec says it is
> > supported without any comment.
>
>
> Though not explicitly specified, my feeling is that it is
> getting cast to chararray.
>
>
> >
> > b) Multiplication/Division between bag/tuple and primitives - says it is
> > not implemented, but what is the expectation when it does get done ?
> > Apply to individual fields recursively ?
> >
> > c) What does CONCAT of two bytearrays mean ? Just combining both arrays
> > into a new larger array through array copies ? (I am assuming this is
> > what concat of chararray does)
>
> New array with concat'ed contents from prev two bytearrays ... imo, use
> with caution since it is a crude concat on binary blobs.
>
> >
> > d) For aggregate functions MIN and MAX, can we provide our own
> > comparator (udf or otherwise) for the chararrays - to define what the
> > relative ordering is - like using Collators, instead of always assuming
> > lexicographical ordering (I assume this is what it uses by default ) ?
> >
> >
> > e) In the argument construction in function section - is the semantic
> > change applicable only to arithmetic operations ? Only to aggregate udfs
> > ? Or to all udfs ?
> >
> > What happens in this case :
> >
> > employee = LOAD 'employee' AS (name, salary, bonus_multiplier);
> > grouped = GROUP employee BY name;
> > total_compensation = FOREACH grouped {
> >     T1 = employee.salary;
> >     T2 = employee.bonus_multiplier;
> >     GENERATE group, myUDF(T1 * T2); -- error ?
> > }
> > Similarly, for GENERATE group, myUDF(T1, T2) above ?
> >
> >
> >
> >
> > Thanks,
> > Mridul
>
>
Re: Pig 2.0 operators
Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Hi all,
To answer some of my questions below for general audience, based on doc
Olga mentioned -
http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm
(someone should update spec with this, way more informative !) ... could
not find something which explained the others though.
Regards,
Mridul
Mridul Muralidharan wrote:
> Hi,
>
> Have following queries while going through types func spec.
>
>
> a) What does MATCHES on two bytearrays mean ? Spec says it is supported
> without any comment.
Though not explicitly specified, my feeling is that it is getting cast
to chararray.
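
To make that guess concrete, here is a minimal Java sketch of what "cast to
chararray, then match" could look like. It assumes the bytes are decoded as
UTF-8 and that the pattern must match the whole value; neither point is
confirmed by the spec. ByteArrayMatch and matches are hypothetical names for
illustration, not Pig APIs.

```java
import java.nio.charset.StandardCharsets;

class ByteArrayMatch {
    // Assumption: both operands are decoded as UTF-8 text, and the second
    // operand is a regular expression that must match the entire first operand.
    static boolean matches(byte[] value, byte[] pattern) {
        String text = new String(value, StandardCharsets.UTF_8);
        String regex = new String(pattern, StandardCharsets.UTF_8);
        return text.matches(regex);
    }

    public static void main(String[] args) {
        byte[] value = "apache".getBytes(StandardCharsets.UTF_8);
        byte[] pattern = "apa.*".getBytes(StandardCharsets.UTF_8);
        System.out.println(matches(value, pattern)); // prints true
    }
}
```

If the bytes are not valid text in the assumed charset, the decode step
silently substitutes replacement characters, so the match result may not be
meaningful.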
>
> b) Multiplication/Division between bag/tuple and primitives - says it is
> not implemented, but what is the expectation when it does get done ?
> Apply to individual fields recursively ?
>
> c) What does CONCAT of two bytearrays mean ? Just combining both arrays
> into a new larger array through array copies ? (I am assuming this is
> what concat of chararray does)
New array with concat'ed contents from prev two bytearrays ... imo, use
with caution since it is a crude concat on binary blobs.
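
As a rough illustration of that behaviour, a plain Java sketch of
concatenation by array copy follows. ByteArrayConcat and concat are
hypothetical names, and this is not taken from the Pig source; it only shows
the "new larger array" idea.

```java
import java.util.Arrays;

class ByteArrayConcat {
    // Allocates a new array sized for both inputs and copies them in order;
    // neither input array is modified.
    static byte[] concat(byte[] first, byte[] second) {
        byte[] result = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, result, first.length, second.length);
        return result;
    }

    public static void main(String[] args) {
        byte[] joined = concat(new byte[] {1, 2}, new byte[] {3});
        System.out.println(Arrays.toString(joined)); // prints [1, 2, 3]
    }
}
```

The caution applies because nothing here looks at what the bytes mean: two
blobs that each carry their own header or length prefix will not form one
valid blob just by being placed end to end.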
>
> d) For aggregate functions MIN and MAX, can we provide our own
> comparator (udf or otherwise) for the chararrays - to define what the
> relative ordering is - like using Collators, instead of always assuming
> lexicographical ordering (I assume this is what it uses by default ) ?
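The doc does not answer this one. To show what I mean by using Collators, a
small Java sketch is below. It assumes a custom aggregate UDF would be free to
choose its own comparator, which is not confirmed anywhere; CollatedMax and
max are hypothetical names, not Pig APIs.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class CollatedMax {
    // Returns the largest string under whatever ordering the caller supplies.
    static String max(List<String> values, Comparator<? super String> order) {
        return Collections.max(values, order);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("apple", "Zebra", "banana");

        // Code-point (lexicographical) order: upper case sorts before lower case.
        System.out.println(max(names, Comparator.<String>naturalOrder())); // prints banana

        // Locale-aware order: letters compare alphabetically first, case second.
        Collator english = Collator.getInstance(Locale.ENGLISH);
        System.out.println(max(names, english)); // prints Zebra
    }
}
```

The two calls disagree on the same input, which is why the default ordering
matters for MIN and MAX on chararrays.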
>
>
> e) In the argument construction in function section - is the semantic
> change applicable only to arithmetic operations ? Only to aggregate udfs
> ? Or to all udfs ?
>
> What happens in this case :
>
> employee = LOAD 'employee' AS (name, salary, bonus_multiplier);
> grouped = GROUP employee BY name;
> total_compensation = FOREACH grouped {
>     T1 = employee.salary;
>     T2 = employee.bonus_multiplier;
>     GENERATE group, myUDF(T1 * T2); -- error ?
> }
> Similarly, for GENERATE group, myUDF(T1, T2) above ?
>
>
>
>
> Thanks,
> Mridul