Posted to dev@pig.apache.org by "Prashant Kommireddi (JIRA)" <ji...@apache.org> on 2012/10/23 20:37:12 UTC

[jira] [Updated] (PIG-2600) Better Map support

     [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-2600:
-------------------------------------

            Tags: udf
    Release Note: 
Pig 0.11+ includes the following UDFs for operating on Maps:

1. VALUESET
2. VALUELIST
3. KEYSET
4. INVERSEMAP

VALUESET

  This UDF takes a Map and returns a Bag containing the value set. 
  Note that this UDF returns only unique values. For all values, use 
  VALUELIST instead.

  <code>
  grunt> cat data
  [open#apache,1#2,11#2]
  [apache#hadoop,3#4,12#hadoop]
 
  grunt> a = load 'data' as (M:[]);
  grunt> b = foreach a generate VALUESET($0);
  grunt> dump b;
  ({(apache),(2)})
  ({(4),(hadoop)})
  </code>

VALUELIST

  This UDF takes a Map and returns a Bag containing the values from the Map. 
  Note that the output bag contains all values, not just unique ones.
  To obtain only the unique values from the Map, use VALUESET instead. 
 
  <code>
  grunt> cat data
  [open#apache,1#2,11#2]
  [apache#hadoop,3#4,12#hadoop]
 
  grunt> a = load 'data' as (M:[]);
  grunt> b = foreach a generate VALUELIST($0);
  grunt> dump b;
  ({(apache),(2),(2)})
  ({(4),(hadoop),(hadoop)})
  </code>
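
The difference between VALUESET and VALUELIST can be sketched in plain Java. This is only a model of the behaviour described above, not the code of the Pig UDFs: the class name MapValuesSketch and its two helper methods are hypothetical, and java.util collections stand in for Pig's Map and Bag types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; it does not use or reproduce the Pig API.
class MapValuesSketch {
    // VALUELIST-like behaviour: every value is kept, duplicates included.
    static List<Object> valueList(Map<String, Object> m) {
        return new ArrayList<>(m.values());
    }

    // VALUESET-like behaviour: duplicate values collapse to a single entry.
    static Set<Object> valueSet(Map<String, Object> m) {
        return new LinkedHashSet<>(m.values());
    }

    public static void main(String[] args) {
        // Mirrors the first sample row: [open#apache,1#2,11#2]
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("open", "apache");
        row.put("1", "2");
        row.put("11", "2");

        System.out.println(valueList(row)); // prints [apache, 2, 2]
        System.out.println(valueSet(row));  // prints [apache, 2]
    }
}
```

The insertion-ordered collections are chosen here only to make the printed output predictable; the order of entries in a Pig Map or Bag should not be relied upon.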

KEYSET

  This UDF takes a Map and returns a Bag containing the keyset.

  <code>
  grunt> cat data
  [open#apache,1#2,11#2]
  [apache#hadoop,3#4,12#hadoop]
 
  grunt> a = load 'data' as (M:[]);
  grunt> b = foreach a generate KEYSET($0);
  grunt> dump b;
  ({(open),(1),(11)})
  ({(3),(apache),(12)})
  </code>

INVERSEMAP

  This UDF accepts a Map as input with values of any primitive data type. 
  It swaps keys with values and returns the new inverse Map. 
  Because the original values need not be unique, each String key of the 
  resulting Map points to a DataBag of values. That bag is composed of the 
  original keys that had the same value. 
 
  Note: 1. UDF accepts Map with values of primitive data type
        2. UDF returns Map<String,DataBag>
  <code>
  grunt> cat data
  [open#apache,1#2,11#2]
  [apache#hadoop,3#4,12#hadoop]
 
  grunt> a = load 'data' as (M:[]);
  grunt> b = foreach a generate INVERSEMAP($0);
 
  grunt> dump b;
  ([2#{(1),(11)},apache#{(open)}])
  ([hadoop#{(apache),(12)},4#{(3)}])
  </code>
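
The inversion described for INVERSEMAP can be sketched in plain Java as follows. This is a rough model under stated assumptions, not the UDF's actual source: the class InverseMapSketch and the method invert are hypothetical names, a List stands in for Pig's DataBag, and converting each value with String.valueOf is an assumption about how the new String keys are formed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; it does not use or reproduce the Pig API.
class InverseMapSketch {
    // Swap keys and values. Original keys that share a value are grouped
    // together under that value, which becomes the new key.
    static Map<String, List<String>> invert(Map<String, Object> m) {
        Map<String, List<String>> inverse = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : m.entrySet()) {
            // Assumption: the new key is the string form of the original value.
            String newKey = String.valueOf(e.getValue());
            inverse.computeIfAbsent(newKey, k -> new ArrayList<>()).add(e.getKey());
        }
        return inverse;
    }

    public static void main(String[] args) {
        // Input row: [open#apache,1#2,11#2]
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("open", "apache");
        row.put("1", "2");
        row.put("11", "2");

        System.out.println(invert(row)); // prints {apache=[open], 2=[1, 11]}
    }
}
```

Every new key maps to a collection, even when only one original key had that value, which matches the Map<String,DataBag> return type noted above.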

Olga, adding release notes. Let me know if you need more info.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, PIG-2600_9.patch, PIG-2600.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that value
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Strings). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!
