You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Jonathan Coveney (Created) (JIRA)" <ji...@apache.org> on 2012/03/16 06:41:49 UTC

[jira] [Created] (PIG-2600) Better Map support

Better Map support
------------------

                 Key: PIG-2600
                 URL: https://issues.apache.org/jira/browse/PIG-2600
             Project: Pig
          Issue Type: Improvement
            Reporter: Jonathan Coveney
             Fix For: 0.11


It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.

- TOBAG should take a Map and output {(key, value)}
- TOMAP should take a Bag in that same form and make a map.
- KEYSET should return the set of keys.
- VALUESET should return the set of values.
- VALUELIST should return the List of values (no deduping).
- INVERSEMAP would return a Map of values => the set of keys that refer to that Key

This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.

I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-2600:
-------------------------------------

    Attachment: PIG-2600_2.patch

Adding a new patch with KEYSET, VALUESET, VALUELIST and INVERSEMAP. Added corresponding test cases to TestBuiltin.java. Also added documentation for all the new classes.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-2600:
-------------------------------------

    Attachment: PIG-2600_3.patch

Attaching a revised patch. Here is the new functionality of INVERSEMAP, you would find the documentation in Class docs

{code}
 grunt> cat 1data
 [open#1,1#2,11#2]
 [apache#2,3#4,12#24]


 grunt> a = load 'data' as (M:[int]);
 grunt> b = foreach a generate INVERSEMAP($0);

 grunt> dump b;
 ([2#(1,11),1#(open)])
 ([2#(apache),24#(12),4#(3)])
{code}

                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235048#comment-13235048 ] 

Prashant Kommireddi commented on PIG-2600:
------------------------------------------

I don't see a clear advantage of using Tuples over Bags in this case, I guess I can change it to be Bags. One of the use-cases could be to find Set Intersections or Unions between the value sets or lists. For example, each Map could be an array of (User, PageId) entries, and one might want to find intersections between different click-through patterns. I don't use too much of Maps in my PigLatin code to be honest so others can pitch in on how they would like this to be used.

Regarding INVERSEMAP, the return type now would be Map<String, DataBag> now. We will be returning a bag of keys a value maps to?
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481704#comment-13481704 ] 

Prashant Kommireddi commented on PIG-2600:
------------------------------------------

Btw, syntax and usage examples are all present in java files for each of the UDFs. Lmk if it needs to go into release notes (and where) and I can put it there.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, PIG-2600_9.patch, PIG-2600.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-2600:
-------------------------------------

    Attachment: PIG-2600.patch

An initial patch for KEYSET, VALUESET, VALUELIST. I wanted to have more eyes look at it and comment. I have a test case for KEYSET and will be working on tests for others soon.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231632#comment-13231632 ] 

Prashant Kommireddi commented on PIG-2600:
------------------------------------------

Taking a look at TOBAG, the existing UDF already converts a map entry to bag. Are we looking for additional functionality here?

                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Jonathan Coveney (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249630#comment-13249630 ] 

Jonathan Coveney commented on PIG-2600:
---------------------------------------

Prashant, thanks for tackling this. Maps have been a second class citizen for too long :) Some comments:

General:
- I personally usually do "if (input == null || input.size() == 0)" just so you don't get the array index error, but that's fine. Same issue as in VALUELIST comes up with that as well.

VALUELIST/SET
- in some cases maps actually have information on the type of their keys. If this is the case, that information should be used to create the appropriate output schema.
- I personally like the approach of making TupleFactory and BagFactory a static final member of classes, but that's hardly Pig dogma :)
- In the case of a null map that is passed in, there are two options. One is to return null as you do, another is to return an empty bag... the importance is important because a null value won't get thrown out on a flatten, but a null bag will. I think I'd return a null bag, but would be open either way. It's just something we have to be conscious of.

KEYSET/LIST
{code}
+        //Inner schema for keys would always be type chararray
+        return new Schema(new Schema.FieldSchema(null, DataType.BAG)); 
{code}
You should build the Schema so that it takes into account the fact that you know it will be chararrays.

VALUESET
- You have an interesting comment about how you could iterate over the collection, and 

from HashSet, here's the constructor code:
{code}
public HashSet(Collection<? extends E> c) {
    map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}
{code}

And addAll does what you're predict. The point of this being that you can probably save in the net amount of logic by iterating over the values directly as you suggest and checking on the HashSet if it already exists. There wouldn't be much of a change compared to what's going on under the covers anyway, as it's not like they do anything special to predict the size it's going to be...you'll just be doing the resizes as you iterate over values manually instead of all at once.

INVERSEMAP
- in doInverse, I don't think you really need to do the chain of isntanceofs. You're really just checking that it's not a DataBag or Tuple, right? you could do something like
{code}
if (!(o instanceof Tuple || o instanceof DataBag))
  newKey = o.toString();
{code}
The casting is unecessary. If it is the proper type, then toString will give you what you want irrespective.
- For updating hashmaps in cases like this, I'm a fan of the following to avoid having to do containsKey and then a get right afterwards
{code}
DataBag bag = (DataBag)inverseMap.get(newKey);
if (bag==null) {
    //logic
} else {
    bad.add( ... //more logic)
}
{code}

Thanks again for the legwork on this.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-2600:
-------------------------------------

    Attachment: PIG-2600_7.patch

Jon, addressed your feedback. Thanks.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Jonathan Coveney (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234499#comment-13234499 ] 

Jonathan Coveney commented on PIG-2600:
---------------------------------------

Hey Prashant, thanks again for your work on this. Took a look at the patch, looks good, though I'm wondering if VALUELIST and VALUESET shouldn't return bags? I guess it depends how someone plans to use it and you could convert easily from one to the other, but yeah... hmm. I think the big question is how it'd be used. The awkward part about having it as a Tuple is that it'd be impossible to do anything with the elements short of having a UDF to read them, whereas if it's a bag, then we know the type of the keys (whatever it is in the Map), and it seems more workable to me. This idea applies to all of the methods, really.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Bill Graham (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231282#comment-13231282 ] 

Bill Graham commented on PIG-2600:
----------------------------------

+1 for overloading. People often try to use these with Maps or Bags, realize they can't and then write a MapToBag UDF.

Do you think INVERSEMAP is something that people need often? I question the general usefulness of that one.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Russell Jurney (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231358#comment-13231358 ] 

Russell Jurney commented on PIG-2600:
-------------------------------------

The other thing we need is to parametize maps with columns.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231360#comment-13231360 ] 

Prashant Kommireddi commented on PIG-2600:
------------------------------------------

INVERSEMAP is a nice to have functionality, not necessarily uber-useful. This could be similar to http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/BiMap.html except the fact that it does not require the restriction that values are unique. Rather we could think of a strategy to determine which value->key pairs the function returns

1. First occurring pair, key pair -> (v1, k1) from (k1,v1)(k2,v1)
2. Last occurring pair -> (v1, k2) from (k1,v1)(k2,v1)
3. All keys corresponding to value -> (v1, (k1,k2))

Input could be tuple of maps or bag of maps.

INVERSEMAP has been useful in a few situations to me, for eg to get field->index map from index->field. I see it being useful in few cases with click-through analysis as well. 

Again, not a super required use-case but good to have.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280360#comment-13280360 ] 

Prashant Kommireddi commented on PIG-2600:
------------------------------------------

Thanks Jon. Did you forget to change the resolution status?
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, PIG-2600_9.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Jonathan Coveney (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233850#comment-13233850 ] 

Jonathan Coveney commented on PIG-2600:
---------------------------------------

Prashant,

Thanks for working on this! As far as Inversemap, I don't know if I like the semantic that if there are multiple values, you just get essentially a random Key. I think you should get a Tuple or Bag of or List whatever of Keys that correspond... the assumption is that values can appear more than once.

Open to debate though :)

Another option on the "inversmap only works on Map<String,String>" front would be that, if given Map<String, T>" it converts T to a string, which I think isn't a bad idea. Maybe that could be a constructor option, and the default is an error.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270990#comment-13270990 ] 

Jonathan Coveney commented on PIG-2600:
---------------------------------------

Man, I feel so nitpicky, though this one isn't so nitpicky: the javadoc formatting isn't correct. As is, when I compile the javadocs, I get this:
{code}
This UDF accepts a Map as input with values of any primitive data type. UDF swaps keys with values and returns the new inverse Map. Note in case original values are non-unique, the resulting Map would contain String Key -> DataBag of values. Here the bag of values is composed of the original keys having the same value. Note: 1. UDF accepts Map with Values of primitive data type 2. UDF returns Map grunt> cat 1data [open#1,1#2,11#2] [apache#2,3#4,12#24] grunt> a = load 'data' as (M:[int]); grunt> b = foreach a generate INVERSEMAP($0); grunt> dump b; ([2#{(1),(11)},apache#{(open)}]) ([hadoop#{(apache),(12)},4#{(3)}])
{code}

Other comments:
- doInverse should return Map<String,DataBag>
- INVERSEMAP should extend extends EvalFunc<Map<String,DataBag>>
- inverseMap should be a Map<String,Databag>

{code}

100             DataBag bag = (DataBag)inverseMap.get(newKey);
101             if(bag == null) {
102                 DataBag dataBag = BAG_FACTORY.newDefaultBag();
103                 dataBag.add(TUPLE_FACTORY.newTuple(entry.getKey()));
104                 inverseMap.put(newKey, dataBag);
105             } else {
106                 bag.add(TUPLE_FACTORY.newTuple(entry.getKey()));
107             }
{code}

You can just reuse the bag reference instead of making a new dataBag

That should be the last of the comments...
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481630#comment-13481630 ] 

Jonathan Coveney commented on PIG-2600:
---------------------------------------

Prashant, do you want to take the honor? :)
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, PIG-2600_9.patch, PIG-2600.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230934#comment-13230934 ] 

Prashant Kommireddi commented on PIG-2600:
------------------------------------------

You are right, we could simply override existing UDFs and not create type-specific ones. 
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-2600:
-------------------------------------

    Attachment: PIG-2600_8.patch

Jon, here it is.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268205#comment-13268205 ] 

Prashant Kommireddi commented on PIG-2600:
------------------------------------------

Thanks for the review Jon.

1. I agree on Exception handling in most cases, and thanks for catching "printStackTrace" in the code, it wasn't my intention to leave it in there :). In general wrapping specific portions of code within try-catch is a good practice, but I prefer not breaking up try blocks into multiple when most lines within the method throw the same exception, and its not a lot of code otherwise. In these UDFs, Schema.getField is used more than once and ofcourse they all throw FrontEndException. 

And looking at builtin UDFs for examples was really not the best idea :). Looks like some refactoring required there.

2. Regarding resizing of HashSet, trying to optimize right now might be a bit premature. My comment about frequent resizing would make sense if the number of distinct elements in Map values was large. A HashSet start with an internal array of size 16, and starts expanding (creating a new array and copying elements over) once a certain threshold is met. With the current approach, HashSet implementation guesses an approximate size of array based on size of the Collection being passed to it in the constructor. "it just adds all of the elements, and resizes dynamically as you add more of them" - this can be a costly operation if you start with 16 elements and the number of distinct values in the map was thousands, millions... "since that first pass, dynamic resizing and all, is going to happen anyway" - it should make sense that the amount of resizing is not the same in the 2 cases? Either way, too early to be thinking about optimization there :)

I will upload a patch soon with the changes, thanks for reviewing again.

                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280331#comment-13280331 ] 

Jonathan Coveney commented on PIG-2600:
---------------------------------------

Prashant: it's in. Thanks for your patience! Looking forward to seeing more contribution :)
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, PIG-2600_9.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230936#comment-13230936 ] 

Prashant Kommireddi commented on PIG-2600:
------------------------------------------

Overload is what I meant, long day!
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Jonathan Coveney (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234503#comment-13234503 ] 

Jonathan Coveney commented on PIG-2600:
---------------------------------------

Woops, hit enter a bit early. With inversemap as well, we know the return type: Map<String,String> (nothing else is really possibly).

For the tests on inversemap, probably worth testing the case of boolean values, int, etc values.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-2600:
-------------------------------------

    Attachment: PIG-2600_6.patch
    
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Jonathan Coveney (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231365#comment-13231365 ] 

Jonathan Coveney commented on PIG-2600:
---------------------------------------

Russell: can you give an example of how you see that looking?

As far as inversemap... most of the stuff we've talked about thus far is pretty easy to implement. I'm trying to think of if there is anything more ambitious?
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-2600:
-------------------------------------

            Tags: udf
    Release Note: 
Pig 0.11+ includes the following UDFs for operating with Map

1. VALUESET
2. VALUELIST
3. KEYSET
4. INVERSEMAP

VALUESET

  This UDF takes a Map and returns a Tuple containing the value set. 
  Note, this UDF returns only unique values. For all values, use 
  VALUELIST instead.

  <code>
  grunt> cat data
  [open#apache,1#2,11#2]
  [apache#hadoop,3#4,12#hadoop]
 
  grunt> a = load 'data' as (M:[]);
  grunt> b = foreach a generate VALUELIST($0);
  ({(apache),(2)})
  ({(4),(hadoop)})
 
  </code>

VALUELIST

 
  This UDF takes a Map and returns a Bag containing the values from map. 
  Note that output tuple contains all values, not just unique ones.
  For obtaining unique values from map, use VALUESET instead. 
 
  <code>
  grunt> cat data
  [open#apache,1#2,11#2]
  [apache#hadoop,3#4,12#hadoop]
 
  grunt> a = load 'data' as (M:[]);
  grunt> b = foreach a generate VALUELIST($0);
  grunt> dump b;
  ({(apache),(2),(2)})
  ({(4),(hadoop),(hadoop)})
  </code>

KEYSET

  This UDF takes a Map and returns a Bag containing the keyset.

  <code>
  grunt> cat data
  [open#apache,1#2,11#2]
  [apache#hadoop,3#4,12#hadoop]
 
  grunt> a = load 'data' as (M:[]);
  grunt> b = foreach a generate KEYSET($0);
  grunt> dump b;
  ({(open),(1),(11)})
  ({(3),(apache),(12)})
  </code>

INVERSEMAP

  This UDF accepts a Map as input with values of any primitive data type. 
  UDF swaps keys with values and returns the new inverse Map. 
  Note in case original values are non-unique, the resulting Map would 
  contain String Key -> DataBag of values. Here the bag of values is composed 
  of the original keys having the same value. 
 
  Note: 1. UDF accepts Map with Values of primitive data type
           2. UDF returns Map<String,DataBag>
  <code>
  grunt> cat 1data
  [open#1,1#2,11#2]
  [apache#2,3#4,12#24]
 
  
  grunt> a = load 'data' as (M:[int]);
  grunt> b = foreach a generate INVERSEMAP($0);
 
  grunt> dump b;
  ([2#{(1),(11)},apache#{(open)}])
  ([hadoop#{(apache),(12)},4#{(3)}])
  </code>

Olga, adding release notes. Let me know if you need more info.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, PIG-2600_9.patch, PIG-2600.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481676#comment-13481676 ] 

Prashant Kommireddi commented on PIG-2600:
------------------------------------------

Sure, I can do that. Where exactly should I be adding this?
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, PIG-2600_9.patch, PIG-2600.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268173#comment-13268173 ] 

Jonathan Coveney commented on PIG-2600:
---------------------------------------

Prashant,

Thanks for your work on this! It's 99.9% of the way there. A more administrative point...you'll need to reupload it with permissions given to apache to use. What follows is fairly nitpicky.
- In valuelist, you have
{code}
+            }
+            else {
{code}
just bump that else up

Less nitpicky:
- You have many cases where you catch (Exception). Try and catch the least general exception possible. Example: In valuelist, you have 
{code}
+        } catch (Exception e) {
+            String msg = "Error while generating " + this.getClass().getSimpleName();
+            throw new RuntimeException(msg, e);
+        }
{code}

(I'm not even quite sure what exception is being thrown?)

Further, instead of throwing a runtime exception, the convention (in exec) is to throw ExecException since exec throws IOException anyway. The builtin functions have many examples of this. But yeah, in general, you want to catch the most specific exception possible in the most specific portion of code possible... I don't like the convention of wrapping an entire method in a try{}catch(Exception). At least, that's how I like to do it and I'm reviewing this ;) So basically: try's should be around the most specific piece of code possible, should catch the most specific exception possible, and should throw ExecException if the code are allows. Throwing a RuntimeException should be a last resort (and is really just jank around the annoying exceptions required by outputSchema).

Similarly, in KEYSET you have
{code}
+        } catch(FrontendException e) {
+            // TODO Auto-generated catch block
+            e.printStackTrace();
+        }
{code}
which I assume was an oversight left by the IDE :) (those things will be the death of us Java programmers...)

- I think we had talked a bit about how to do VALUESET... I still think you could do it in one pass, but I don't think the performance gain will be so massive that you can't do it in two passes. Your comment about frequent resizing isn't really accurate, since if you look at the HashSet constructor, it doesn't do any fancy math...it just adds all of the elements, and resizes dynamically as you add more of them. Thus, you could just add to the HashSet, and if it wasn't previously contained, add it to the Tuple, thus doing it in one pass (since that first pass, dynamic resizing and all, is going to happen anyway). Does that make sense?

Thanks again for your work on this. It is super close.
Jon
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-2600:
-------------------------------------

    Attachment: PIG-2600_4.patch

Jon, adding a new patch as per our previous discussion (changing return type to bags). I have also added test cases for various primitive types being Map values. 
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270917#comment-13270917 ] 

Jonathan Coveney commented on PIG-2600:
---------------------------------------

So much refactoring is necessary in so many places...oy haha. As far as how much code to wrap...yeah, that's just style at this point. Not worth nitpicking over. I do have a couple of last comments...

In VALUELIST (and a couple of other places I think)
{code}
+        try {
+            //Input must be of type Map. This is verified at compile time
+            m = (Map<String, Object>)(input.get(0));
+        } catch (ExecException e) {
+            throw e;
+        }
{code}

the catch shouldn't be necessary anymore, since exec throws IOException and ExecException is an IOException.

More of style point, but in KEYSET:
{code}
+        Iterator<String> iter = keySet.iterator();
+        DataBag bag = BAG_FACTORY.newDefaultBag();
+        //Add keyset elements to bag
+        while(iter.hasNext()) {
+            Tuple t = TUPLE_FACTORY.newTuple(iter.next());
+            bag.add(t);
+        }
{code}

could be rewritten as
{code}
DataBag bag = BAG_FACTORY.newDefaultBag();
for (String s : keySet) {
    Tuple t = TUPLE_FACTORY.newTuple(iter.next());
    bag.add(t);
}
{code}

IMHO using the for construct with iterators is a lot clearer, but it's not a big deal... if there weren't other things I wouldn't have bother mentioning it, but here it is haha.

- You can have the doInverse method throw ExecException, then instead of throwing RuntimeException, throw an ExecException, which you won't have to catch since exec throws IOException. Also, the exception message is a little misleading, as the error isn't that of getting and unknown primitive (primitive implies int etc), but just a wrong type in general.

That's it though. Should be good for committing after that.

Thanks for the work!
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278398#comment-13278398 ] 

Jonathan Coveney commented on PIG-2600:
---------------------------------------

Prashant,

Looks great. Literally one more comment and then I can commit: VALUESET seems to be in 2space form instead of 4space form. My guess is you forgot to change your IDE from the convention your use at work, or something like that. Fix that and then we're good to go :)

Jon
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481596#comment-13481596 ] 

Olga Natkovich commented on PIG-2600:
-------------------------------------

can you please add to release notes the UDFs that were added as well as their syntax and usage examples. This is for inclusion in the documentation, thanks!
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, PIG-2600_9.patch, PIG-2600.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230918#comment-13230918 ] 

Prashant Kommireddi commented on PIG-2600:
------------------------------------------

Jon, for these UDFs, we could name 1 and 2 MAPTOBAG and BAGTOMAP respectively, since TOBAG, TOMAP exist already. In fact, all UDFs here (except INVERSEMAP) could be prefixed with MAP to avoid confusion.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi reassigned PIG-2600:
----------------------------------------

    Assignee: Prashant Kommireddi
    
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233075#comment-13233075 ] 

Prashant Kommireddi commented on PIG-2600:
------------------------------------------

Working on INVERSEMAP, does it make sense to allow this UDF to be used only when values are String too. Since this UDF would involve swapping keys with values, it would be required for the new keys (that were values originally) to be chararray?

Also, if a value is not unique the UDF would simply return the last key for that value (as per hashmap "put" behavior)

{code}
grunt> cat data;
[1#2,3#4,4#2]
[apache#hadoop,doug#hadoop]

grunt> a = load 'data' as (M:[]);
grunt> b = foreach a generate INVERSEMAP($0);
grunt> dump b;
[4#3,2#4]
[hadoop#doug]
{code}
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233976#comment-13233976 ] 

Prashant Kommireddi commented on PIG-2600:
------------------------------------------

Thanks Jon, I agree with both of your statements. The 2nd case Map<String, T> makes sense for primitive types, will upload a new patch soon.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-2600:
-------------------------------------

    Attachment: PIG-2600_5.patch

Jon, I made changes addressing most of your concerns. New patch uploaded.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-2600:
-------------------------------------

    Attachment: PIG-2600_9.patch

My bad, I used Hbase formatter for making edits on VALUESET. Seems like the conventions are different there. Uploading a new patch.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, PIG-2600_9.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Jonathan Coveney (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230928#comment-13230928 ] 

Jonathan Coveney commented on PIG-2600:
---------------------------------------

Hmm, I think there is a compelling argument to do so, but at the same time: do we want a bunch of type specific conversion functions where one function can logically apply to multiple? I mean, TOBAG can work on Tuples as well as Maps...theoretically, the Pig type system coupled with the function should make it unambiguous which one is getting applied, and it'd be nice not to have more UDFs than we need.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>             Fix For: 0.11
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2600) Better Map support

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481738#comment-13481738 ] 

Olga Natkovich commented on PIG-2600:
-------------------------------------

Yes, please, put all the information into the release notes. This way it is much easier to created documentation patch.
                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, PIG-2600_9.patch, PIG-2600.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (PIG-2600) Better Map support

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney resolved PIG-2600.
-----------------------------------

    Resolution: Fixed
    
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, PIG-2600_9.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2600) Better Map support

Posted by "Prashant Kommireddi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-2600:
-------------------------------------

    Patch Info: Patch Available
    
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to that Key
> This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira