Posted to user@pig.apache.org by Jameson Li <ho...@gmail.com> on 2011/05/23 16:06:10 UTC

how to operate a map type

Hi all,

I have the following Pig code:

register /home/uu/project/lib/pigudfs.jar
ruls = load 'testurl' as (url:chararray);

b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1);

When I dump b, it returns:
([4#0.1677963])
([193#0.16985779,81#0.10994483])
([418#0.14138427,9#0.1107544,282#0.18699136])

I just want to group by the map key and sum the map values, something like:
c = group b by $0#key;
d = foreach c generate group,SUM(b.$0#value);

How could I write the code?

Thanks,
Jameson Li.

Re: how to operate a map type

Posted by Thejas M Nair <te...@yahoo-inc.com>.
Another alternative is to write a UDF that returns all keys in a map as a bag.
I think this would be a useful addition to piggybank. It would also be useful to have getEntries(Map) and getValues(Map) UDFs in piggybank.
If you choose this option and are in a position to contribute the UDF code, please do so.
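
As a rough illustration of what such helpers would compute, here is a plain-Python sketch. The names map_keys, map_values and map_entries are hypothetical (they are not existing piggybank functions), a Pig map is modelled as a dict, and a Pig bag is modelled as a list of tuples. Wiring these into Pig as real UDFs is not shown.

```python
def map_keys(m):
    """Hypothetical 'keys' helper: one single-field tuple per map key."""
    if m is None:          # a null map is treated as an empty bag here (assumption)
        return []
    return [(k,) for k in m]


def map_values(m):
    """Hypothetical 'values' helper: one single-field tuple per map value."""
    if m is None:
        return []
    return [(v,) for v in m.values()]


def map_entries(m):
    """Hypothetical 'entries' helper: one (key, value) tuple per map entry."""
    if m is None:
        return []
    return [(k, v) for k, v in m.items()]


# One of the maps from the original post, written as a dict.
sample = {"193": 0.16985779, "81": 0.10994483}
```

Returning a bag rather than a single value is what makes the result usable with FLATTEN afterwards.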

Thanks,
Thejas





On 6/2/11 8:55 AM, "Xiaomeng Wan" <sh...@gmail.com> wrote:

can't you udf return a bag of tuple with two fields (ie key and
value), then flatten it?

Shawn

Re: how to operate a map type

Posted by Xiaomeng Wan <sh...@gmail.com>.
Can't your UDF return a bag of tuples with two fields (i.e. key and
value), which you then flatten?
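
A minimal plain-Python sketch of that idea follows. The function name map_to_bag is hypothetical, the flatten function only models what FLATTEN does to a bag column, and the rows are two of the sample records from the question written as dicts.

```python
def map_to_bag(m):
    """Hypothetical UDF body: turn a map into a bag of (key, value) tuples."""
    return [(k, v) for k, v in (m or {}).items()]


def flatten(rows):
    """Model of flattening the bag: one output row per (key, value) tuple."""
    for info, m in rows:
        for key, value in map_to_bag(m):
            yield (info, key, value)


rows = [
    ("65RFPRO800863GPT", {"108": 0.2}),
    ("6B7FF3E300052E97", {"25": 0.28, "405": 0.05, "4": 0.05}),
]
flat = list(flatten(rows))
```

Each input row fans out into as many rows as its map has entries, so the key becomes an ordinary column that can be grouped on.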

Shawn

On Thu, Jun 2, 2011 at 7:28 AM, Jameson Li <ho...@gmail.com> wrote:
> Hi,
>
> my pig code is like this:
> register myudf.jar
> a = load 'testurls' as (info:chararray);
> b = foreach a generate info,com.company.pig.GetInfoScore($0) as m;
> dump b;
>
> The output is like this:
> (65RFPRO800863GPT,[108#0.2])
> (6JL6U6EA00863J0J,[352#0.5,25#0.15,108#0.07,26#0.06,4#0.16])
> (6B7FF3E300052E97,[25#0.28,405#0.05,4#0.05])
> (5498267_31,[108#0.05,25#0.19,12#0.19])
>
> And I want to group by the map key, and count the info, just like the below
> output:
> 108  3        /*65RFPRO800863GPT   6JL6U6EA00863J0J   5498267_31 */
> 352  1        /*6JL6U6EA00863J0J*/
> 25    3        /*6JL6U6EA00863J0J  6B7FF3E300052E97 5498267_31 */
> 26    1        /*6JL6U6EA00863J0J*/
> 4      2        /*6JL6U6EA00863J0J   6B7FF3E300052E97*/
> 405   1       /*6B7FF3E300052E97*/
> 12     1       /*5498267_31*/
>
> I have a think that I have to split the map to many rows just as the below:
> (65RFPRO800863GPT, 108, 0.2)
> (6JL6U6EA00863J0J, 352, 0.5)
> (6JL6U6EA00863J0J, 25, 0.15)
> (6JL6U6EA00863J0J, 108, 0.07)
> (6JL6U6EA00863J0J, 26, 0.06)
> (6JL6U6EA00863J0J, 4, 0.16)
> (6B7FF3E300052E97, 25, 0.28)
> (6B7FF3E300052E97, 405, 0.05)
> (6B7FF3E300052E97, 4, 0.05)
> (5498267_31, 108, 0.05)
> (5498267_31, 25, 0.19)
> (5498267_31, 12, 0.19)
>
> And then it is easy to group and count.
> Am I right?
> I have no idea how to split the map to many rows as the above show.
> Help.
>
> Thanks.

Re: how to operate a map type

Posted by Jameson Li <ho...@gmail.com>.
Hi,

My Pig code is like this:
register myudf.jar
a = load 'testurls' as (info:chararray);
b = foreach a generate info,com.company.pig.GetInfoScore($0) as m;
dump b;

The output is like this:
(65RFPRO800863GPT,[108#0.2])
(6JL6U6EA00863J0J,[352#0.5,25#0.15,108#0.07,26#0.06,4#0.16])
(6B7FF3E300052E97,[25#0.28,405#0.05,4#0.05])
(5498267_31,[108#0.05,25#0.19,12#0.19])

I want to group by the map key and count the info values, like the output
below:
108   3   /* 65RFPRO800863GPT, 6JL6U6EA00863J0J, 5498267_31 */
352   1   /* 6JL6U6EA00863J0J */
25    3   /* 6JL6U6EA00863J0J, 6B7FF3E300052E97, 5498267_31 */
26    1   /* 6JL6U6EA00863J0J */
4     2   /* 6JL6U6EA00863J0J, 6B7FF3E300052E97 */
405   1   /* 6B7FF3E300052E97 */
12    1   /* 5498267_31 */

I think I have to split the map into many rows, like this:
(65RFPRO800863GPT, 108, 0.2)
(6JL6U6EA00863J0J, 352, 0.5)
(6JL6U6EA00863J0J, 25, 0.15)
(6JL6U6EA00863J0J, 108, 0.07)
(6JL6U6EA00863J0J, 26, 0.06)
(6JL6U6EA00863J0J, 4, 0.16)
(6B7FF3E300052E97, 25, 0.28)
(6B7FF3E300052E97, 405, 0.05)
(6B7FF3E300052E97, 4, 0.05)
(5498267_31, 108, 0.05)
(5498267_31, 25, 0.19)
(5498267_31, 12, 0.19)

Then it would be easy to group and count.
Am I right?
I have no idea how to split the map into many rows as shown above.
Please help.
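
To make the intended result concrete, here is a small plain-Python sketch of the split, group and count steps over the four sample records above. It only models the dataflow (maps as dicts, rows as tuples); the function name count_infos_per_key is hypothetical and nothing here is Pig syntax.

```python
from collections import defaultdict

# The four (info, map) records from the dump above, with maps written as dicts.
data = [
    ("65RFPRO800863GPT", {"108": 0.2}),
    ("6JL6U6EA00863J0J", {"352": 0.5, "25": 0.15, "108": 0.07, "26": 0.06, "4": 0.16}),
    ("6B7FF3E300052E97", {"25": 0.28, "405": 0.05, "4": 0.05}),
    ("5498267_31", {"108": 0.05, "25": 0.19, "12": 0.19}),
]


def count_infos_per_key(rows):
    """Split each map into (info, key) pairs, group by key, count the infos."""
    infos_by_key = defaultdict(list)
    for info, m in rows:
        for key in m:                      # one "row" per map entry
            infos_by_key[key].append(info)
    return {key: len(infos) for key, infos in infos_by_key.items()}


counts = count_infos_per_key(data)
```

The resulting counts agree with the table in this message (for example, key 108 appears under three info values).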

Thanks.

2011/5/25 Alan Gates <ga...@yahoo-inc.com>

> Can't you mimic dynamic key support with static keys by making your map
> have two static keys 'key' and 'value'?
>
> Alan.

Re: how to operate a map type

Posted by Alan Gates <ga...@yahoo-inc.com>.
Can't you mimic dynamic key support with static keys by making your
map have two static keys 'key' and 'value'?
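
One possible reading of this suggestion, sketched in plain Python: the UDF emits one small map per entry, each with the fixed keys 'key' and 'value', so that lookups such as m#'key' and m#'value' become possible. The function name to_static_key_maps and the choice of one map per entry are assumptions made for illustration.

```python
def to_static_key_maps(m):
    """Rewrite a dynamic-key map as a list of maps with fixed keys."""
    return [{"key": k, "value": v} for k, v in (m or {}).items()]


# One of the maps from the original question, written as a dict.
entries = to_static_key_maps({"418": 0.14138427, "9": 0.1107544, "282": 0.18699136})

# With fixed key names, selecting on key or value needs no knowledge of the
# original dynamic keys; this mirrors lookups on 'key' and 'value'.
selected = [e for e in entries if e["key"] == "9" or e["value"] > 0.15]
```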

Alan.

On May 24, 2011, at 3:05 AM, Jameson Li wrote:

> OK.OK.I know that just write UDFs.
> I have to write UDFs, and see you......
> And I still think there should be grammar support for map operation both
> static key and dynamic key.............
>
> Thanks.

Re: how to operate a map type

Posted by Jameson Li <ho...@gmail.com>.
OK, OK. I understand: I just have to write UDFs.
I still think there should be grammar support for map operations with both
static keys and dynamic keys.

Thanks.

2011/5/24 Daniel Dai <da...@earthlink.net>

> GetKey(m) already get the key, so you can filter the key. For value, you
> may need to put into UDF.
>
> Grammar support for map is based on static key, eg: m#'key1'. Your use case
> is mostly dealing dynamic keys, which you may rely on yourself currently.
>
> Daniel

Re: how to operate a map type

Posted by Daniel Dai <da...@earthlink.net>.
GetKey(m) already gets the key, so you can filter on it. For the value, you
may need to put the logic into a UDF.

Grammar support for maps is based on static keys, e.g. m#'key1'. Your use case
mostly deals with dynamic keys, which you currently have to handle yourself.
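
The difference can be sketched in plain Python, with maps modelled as dicts. lookup_static stands in for a static lookup such as m#'key1', while has_value_above is a hypothetical filter UDF of the kind described above; neither name comes from Pig itself.

```python
def lookup_static(m, key):
    """Model of a static-key lookup: returns None when the key is absent."""
    return m.get(key)


def has_value_above(m, threshold):
    """Hypothetical filter UDF: true if any map value exceeds the threshold."""
    return any(v > threshold for v in m.values())


# The three maps from the original question.
maps = [
    {"4": 0.1677963},
    {"193": 0.16985779, "81": 0.10994483},
    {"418": 0.14138427, "9": 0.1107544, "282": 0.18699136},
]

# Keep only the maps that contain at least one value above 0.17.
kept = [m for m in maps if has_value_above(m, 0.17)]
```

The static lookup needs the key name up front; the value filter has to walk every entry, which is why it ends up in a UDF.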

Daniel

-----Original Message----- 
From: Jameson Li
Sent: Monday, May 23, 2011 7:07 PM
To: Daniel Dai
Cc: user@pig.apache.org
Subject: Re: how to operate a map type

And how to filter a map key or a map value? And also only UDF?

b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
c = filter b by m.key == 'aaa' or m.value> 0.2;

How could I write the code?
Any other way without writing UDF?

And I have a doubt since only writing UDF can operate a map type, why not
have the official functions about the map type?

Thanks.


Re: how to operate a map type

Posted by Jameson Li <ho...@gmail.com>.
And how do I filter on a map key or a map value? Is a UDF the only way for that too?

b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
c = filter b by m.key == 'aaa' or m.value > 0.2;

How could I write this code?
Is there any way to do it without writing a UDF?

I also wonder: if map types can only be handled by writing UDFs, why are there
no official functions for the map type?

Thanks.

2011/5/24 Daniel Dai <ji...@yahoo-inc.com>

> I cannot think of a way without writing UDF. You can write two UDF:
> * GetKey, input a map, output the key of the map
> * GetValues, input a bag of map, output a bag of map values
>
> The script is like:
> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
> c = foreach b generate GetKey(m) as key, m;
> d = group c by key;
> e = foreach d generate group, SUM(GetValues(c.m));
>
>
> Daniel

Re: how to operate a map type

Posted by Daniel Dai <ji...@yahoo-inc.com>.
I cannot think of a way without writing a UDF. You can write two UDFs:
* GetKey: takes a map, returns the key of the map
* GetValues: takes a bag of maps, returns a bag of map values

The script would look like:
b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
c = foreach b generate GetKey(m) as key, m;
d = group c by key;
e = foreach d generate group, SUM(GetValues(c.m));
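
For illustration, the logic of these two UDFs and the group/sum step can be modelled in plain Python. This sketch assumes each map holds exactly one entry (since GetKey returns "the key"); the function names and the sample values are made up for the example and are not real Pig or piggybank code.

```python
from collections import defaultdict


def get_key(m):
    """Model of GetKey: the key of a single-entry map."""
    (key,) = m.keys()   # raises ValueError unless the map has exactly one entry
    return key


def get_values(bag_of_maps):
    """Model of GetValues: collect the values of every map in the bag."""
    return [v for m in bag_of_maps for v in m.values()]


# Illustrative single-entry maps (not taken from the thread).
maps = [{"4": 0.5}, {"9": 0.25}, {"4": 0.25}]

# Equivalent of: d = group c by key;
groups = defaultdict(list)
for m in maps:
    groups[get_key(m)].append(m)

# Equivalent of: e = foreach d generate group, SUM(GetValues(c.m));
sums = {key: sum(get_values(bag)) for key, bag in groups.items()}
```

For maps with several entries, as in the dumped output, the entries would first have to be split into separate rows.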


Daniel

On 05/23/2011 07:06 AM, Jameson Li wrote:
> Hi all,
>
> I have the below pig code:
>
> register /home/uu/project/lib/pigudfs.jar
> ruls = load 'testurl' as (url:chararray);
>
> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1);
>
> here when dump b, it will return:
> ([4#0.1677963])
> ([193#0.16985779,81#0.10994483])
> ([418#0.14138427,9#0.1107544,282#0.18699136])
>
> I just want group by the map key and sum the map value just like:
> c = group b by $0#key;
> d = foreach c generate group,SUM(b.$0#value);
>
> How could I write the code?
>
> Thanks,
> Jameson Li.