Posted to user@pig.apache.org by Jameson Li <ho...@gmail.com> on 2011/06/02 15:28:07 UTC

Re: how to operate a map type

Hi,

My Pig code looks like this:
register myudf.jar
a = load 'testurls' as (info:chararray);
b = foreach a generate info,com.company.pig.GetInfoScore($0) as m;
dump b;

The output looks like this:
(65RFPRO800863GPT,[108#0.2])
(6JL6U6EA00863J0J,[352#0.5,25#0.15,108#0.07,26#0.06,4#0.16])
(6B7FF3E300052E97,[25#0.28,405#0.05,4#0.05])
(5498267_31,[108#0.05,25#0.19,12#0.19])

I want to group by the map key and count the info values for each key, like this:
108  3   /* 65RFPRO800863GPT  6JL6U6EA00863J0J  5498267_31 */
352  1   /* 6JL6U6EA00863J0J */
25   3   /* 6JL6U6EA00863J0J  6B7FF3E300052E97  5498267_31 */
26   1   /* 6JL6U6EA00863J0J */
4    2   /* 6JL6U6EA00863J0J  6B7FF3E300052E97 */
405  1   /* 6B7FF3E300052E97 */
12   1   /* 5498267_31 */

I think I have to split the map into one row per key, like this:
(65RFPRO800863GPT, 108, 0.2)
(6JL6U6EA00863J0J, 352, 0.5)
(6JL6U6EA00863J0J, 25, 0.15)
(6JL6U6EA00863J0J, 108, 0.07)
(6JL6U6EA00863J0J, 26, 0.06)
(6JL6U6EA00863J0J, 4, 0.16)
(6B7FF3E300052E97, 25, 0.28)
(6B7FF3E300052E97, 405, 0.05)
(6B7FF3E300052E97, 4, 0.05)
(5498267_31, 108, 0.05)
(5498267_31, 25, 0.19)
(5498267_31, 12, 0.19)

Then it would be easy to group and count.
Am I right?
I have no idea how to split the map into rows as shown above.
Any help would be appreciated.

Thanks.

2011/5/25 Alan Gates <ga...@yahoo-inc.com>

> Can't you mimic dynamic key support with static keys by making your map
> have two static keys 'key' and 'value'?
>
> Alan.
>
>
> On May 24, 2011, at 3:05 AM, Jameson Li wrote:
>
>> OK, OK. I understand that I just have to write UDFs.
>> I have to write the UDFs now, see you later.
>> I still think there should be grammar support for map operations with both
>> static and dynamic keys.
>>
>> Thanks.
>>
>> 2011/5/24 Daniel Dai <da...@earthlink.net>
>>
>>> GetKey(m) already gets the key, so you can filter on the key. For the value,
>>> you may need to put the logic into a UDF.
>>>
>>> Grammar support for maps is based on static keys, e.g. m#'key1'. Your use
>>> case mostly deals with dynamic keys, which you currently have to handle
>>> yourself.
>>>
>>> Daniel
>>>
>>> -----Original Message----- From: Jameson Li
>>> Sent: Monday, May 23, 2011 7:07 PM
>>> To: Daniel Dai
>>> Cc: user@pig.apache.org
>>> Subject: Re: how to operate a map type
>>>
>>>
>>> And how do I filter on a map key or a map value? Is a UDF the only way here too?
>>>
>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>> c = filter b by m.key == 'aaa' or m.value> 0.2;
>>>
>>> How could I write the code?
>>> Is there any other way without writing a UDF?
>>>
>>> I also wonder: if a map type can only be operated on by writing a UDF, why
>>> are there no official functions for the map type?
>>>
>>> Thanks.
>>>
>>> 2011/5/24 Daniel Dai <ji...@yahoo-inc.com>
>>>
>>>> I cannot think of a way without writing a UDF. You can write two UDFs:
>>>>
>>>> * GetKey: takes a map, returns the key of the map
>>>> * GetValues: takes a bag of maps, returns a bag of map values
>>>>
>>>> The script is like:
>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>>> c = foreach b generate GetKey(m) as key, m;
>>>> d = group c by key;
>>>> e = foreach d generate group, SUM(GetValues(c.m));
>>>>
>>>>
>>>> Daniel
>>>>
>>>>
>>>> On 05/23/2011 07:06 AM, Jameson Li wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have the following Pig code:
>>>>>
>>>>> register /home/uu/project/lib/pigudfs.jar
>>>>> ruls = load 'testurl' as (url:chararray);
>>>>>
>>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1);
>>>>>
>>>>> Here, dump b returns:
>>>>> ([4#0.1677963])
>>>>> ([193#0.16985779,81#0.10994483])
>>>>> ([418#0.14138427,9#0.1107544,282#0.18699136])
>>>>>
>>>>> I just want to group by the map key and sum the map values, something like:
>>>>> c = group b by $0#key;
>>>>> d = foreach c generate group,SUM(b.$0#value);
>>>>>
>>>>> How could I write the code?
>>>>>
>>>>> Thanks,
>>>>> Jameson Li.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>

Re: how to operate a map type

Posted by Thejas M Nair <te...@yahoo-inc.com>.
Another alternative is to write a UDF that returns all keys in a map as a bag.
I think this would be a useful addition to piggybank. It would also be useful to have getEntries(Map) and getValues(Map) UDFs in piggybank.
If you choose this option and are in a position to contribute the UDF code, please do so.

Thanks,
Thejas
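
A minimal sketch of the keys-as-a-bag idea, written as a Python (Jython) UDF instead of Java to keep it short. The names get_keys, get_values and map_udfs.py are hypothetical. It assumes a Pig version with Jython UDF support, where a Pig map arrives as a Python dict and a returned list of tuples becomes a bag. The outputSchema stand-in is only there so the file also runs outside Pig.

```python
# Hypothetical file map_udfs.py. Assumed usage from Pig (untested sketch):
#   register 'map_udfs.py' using jython as mapudfs;
#   k = foreach b generate info, flatten(mapudfs.get_keys(m)) as key;
#   g = group k by key;
#   n = foreach g generate group, COUNT(k);

# Pig injects outputSchema when it loads a Jython UDF file; outside Pig we
# define a stand-in that just records the schema string on the function.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.output_schema = schema
            return func
        return wrap


@outputSchema("keys:bag{t:tuple(key:chararray)}")
def get_keys(m):
    """Return every key of map m as a bag of one-field tuples."""
    if m is None:
        return []
    return [(str(k),) for k in m.keys()]


@outputSchema("vals:bag{t:tuple(val:double)}")
def get_values(m):
    """Return every value of map m as a bag of one-field tuples."""
    if m is None:
        return []
    return [(float(v),) for v in m.values()]
```

Since only the keys are needed for counting, flattening the result of get_keys gives one row per (info, key) pair, and a plain group plus COUNT does the rest.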





On 6/2/11 8:55 AM, "Xiaomeng Wan" <sh...@gmail.com> wrote:

Can't your UDF return a bag of tuples with two fields (i.e. key and
value), and then you flatten it?

Shawn


Re: how to operate a map type

Posted by Xiaomeng Wan <sh...@gmail.com>.
Can't your UDF return a bag of tuples with two fields (i.e. key and
value), and then you flatten it?

Shawn
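
A rough sketch of the bag-of-(key, value)-tuples approach, again as a Python (Jython) UDF. The names map_to_bag, count_infos_per_key, SAMPLE and map_udfs.py are made up for illustration, and the map-to-dict and list-to-bag conversions are assumed from Pig's Jython support. count_infos_per_key is only a local stand-in for what flatten, group and COUNT would do in Pig, run on the sample output from the original question.

```python
# Hypothetical file map_udfs.py. Assumed usage from Pig (untested sketch):
#   register 'map_udfs.py' using jython as mapudfs;
#   b = foreach a generate info,
#         flatten(mapudfs.map_to_bag(com.company.pig.GetInfoScore($0))) as (key, value);
#   c = group b by key;
#   d = foreach c generate group, COUNT(b);
from collections import defaultdict

# Stand-in for the decorator that Pig injects for Jython UDFs.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.output_schema = schema
            return func
        return wrap


@outputSchema("entries:bag{t:tuple(key:chararray, value:double)}")
def map_to_bag(m):
    """Turn a map into a bag of (key, value) tuples, one tuple per entry."""
    if m is None:
        return []
    return [(str(k), float(v)) for k, v in m.items()]


def count_infos_per_key(rows):
    """Local simulation of flatten + group by key + COUNT.

    rows is a list of (info, map) pairs, like relation b in the question.
    """
    counts = defaultdict(int)
    for _info, m in rows:
        for key, _value in map_to_bag(m):
            counts[key] += 1
    return dict(counts)


# Sample rows copied from the dump output in the original question.
SAMPLE = [
    ("65RFPRO800863GPT", {"108": 0.2}),
    ("6JL6U6EA00863J0J", {"352": 0.5, "25": 0.15, "108": 0.07, "26": 0.06, "4": 0.16}),
    ("6B7FF3E300052E97", {"25": 0.28, "405": 0.05, "4": 0.05}),
    ("5498267_31", {"108": 0.05, "25": 0.19, "12": 0.19}),
]

if __name__ == "__main__":
    print(count_infos_per_key(SAMPLE))
```

Keeping the value in the tuple costs nothing for the count and leaves the door open for the earlier SUM-per-key question, where the last step would use SUM(b.value) instead of COUNT(b).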
