Posted to dev@pig.apache.org by "Prashant Kommireddi (JIRA)" <ji...@apache.org> on 2012/10/23 20:37:12 UTC
[jira] [Updated] (PIG-2600) Better Map support
[ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prashant Kommireddi updated PIG-2600:
-------------------------------------
Tags: udf
Release Note:
Pig 0.11+ includes the following UDFs for operating on Maps:
1. VALUESET
2. VALUELIST
3. KEYSET
4. INVERSEMAP
VALUESET
This UDF takes a Map and returns a Bag containing the set of values.
Note that this UDF returns only unique values. For all values, including
duplicates, use VALUELIST instead.
<code>
grunt> cat data
[open#apache,1#2,11#2]
[apache#hadoop,3#4,12#hadoop]
grunt> a = load 'data' as (M:[]);
grunt> b = foreach a generate VALUESET($0);
grunt> dump b;
({(apache),(2)})
({(4),(hadoop)})
</code>
VALUELIST
This UDF takes a Map and returns a Bag containing the values from the map.
Note that the output bag contains all values, not just unique ones.
To obtain only the unique values from the map, use VALUESET instead.
<code>
grunt> cat data
[open#apache,1#2,11#2]
[apache#hadoop,3#4,12#hadoop]
grunt> a = load 'data' as (M:[]);
grunt> b = foreach a generate VALUELIST($0);
grunt> dump b;
({(apache),(2),(2)})
({(4),(hadoop),(hadoop)})
</code>
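The difference between VALUESET and VALUELIST can be sketched in plain Java. This is only an illustration of the semantics described above, using java.util collections; it is not the code from the patch, and the class and method names (MapValueSketch, valueList, valueSet) are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class: plain java.util only, not Pig's implementation.
class MapValueSketch {
    // Like VALUELIST: every value of the map, duplicates kept.
    static List<String> valueList(Map<String, String> m) {
        return new ArrayList<>(m.values());
    }

    // Like VALUESET: each distinct value appears once.
    static Set<String> valueSet(Map<String, String> m) {
        return new LinkedHashSet<>(m.values());
    }

    public static void main(String[] args) {
        // Same content as the first row of 'data' above.
        Map<String, String> m = new LinkedHashMap<>();
        m.put("open", "apache");
        m.put("1", "2");
        m.put("11", "2");
        System.out.println(valueList(m)); // [apache, 2, 2]
        System.out.println(valueSet(m));  // [apache, 2]
    }
}
```

The sketch uses insertion-ordered collections only so that the printed output is predictable; the ordering of a Bag returned by the Pig UDFs should not be relied on.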
KEYSET
This UDF takes a Map and returns a Bag containing the keyset.
<code>
grunt> cat data
[open#apache,1#2,11#2]
[apache#hadoop,3#4,12#hadoop]
grunt> a = load 'data' as (M:[]);
grunt> b = foreach a generate KEYSET($0);
grunt> dump b;
({(open),(1),(11)})
({(3),(apache),(12)})
</code>
INVERSEMAP
This UDF accepts a Map whose values are of any primitive data type,
swaps keys with values, and returns the new inverse Map.
Because the original values may be non-unique, each key of the resulting
Map is the String form of an original value, and it maps to a DataBag of
the original keys that had that value.
Note: 1. UDF accepts a Map with values of primitive data types
2. UDF returns Map<String,DataBag>
<code>
grunt> cat data
[open#apache,1#2,11#2]
[apache#hadoop,3#4,12#hadoop]
grunt> a = load 'data' as (M:[]);
grunt> b = foreach a generate INVERSEMAP($0);
grunt> dump b;
([2#{(1),(11)},apache#{(open)}])
([hadoop#{(apache),(12)},4#{(3)}])
</code>
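The inversion itself is a small grouping step, which can be sketched in plain Java as follows. This is an assumption-laden illustration rather than the UDF's actual code: a List stands in for Pig's DataBag, String.valueOf stands in for however the UDF converts a value to a String key, and the names InverseMapSketch and inverse are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: plain java.util only, not Pig's implementation.
class InverseMapSketch {
    // Swap keys and values; original keys that share a value are grouped
    // under that value's String form (a List stands in for a DataBag).
    static Map<String, List<String>> inverse(Map<String, ?> m) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, ?> e : m.entrySet()) {
            String newKey = String.valueOf(e.getValue());
            out.computeIfAbsent(newKey, k -> new ArrayList<>()).add(e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("open", "apache");
        m.put("1", "2");
        m.put("11", "2");
        System.out.println(inverse(m)); // {apache=[open], 2=[1, 11]}
    }
}
```

Every value in the result is a collection, even when only one original key maps to it, which matches the Map<String,DataBag> return type noted above.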
Olga, adding release notes. Let me know if you need more info.
> Better Map support
> ------------------
>
> Key: PIG-2600
> URL: https://issues.apache.org/jira/browse/PIG-2600
> Project: Pig
> Issue Type: Improvement
> Reporter: Jonathan Coveney
> Assignee: Prashant Kommireddi
> Fix For: 0.11
>
> Attachments: PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, PIG-2600_9.patch, PIG-2600.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that value
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!