Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2011/01/25 21:33:44 UTC

[jira] Created: (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)
-----------------------------------------------------------------------------------------------------

                 Key: PIG-1821
                 URL: https://issues.apache.org/jira/browse/PIG-1821
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.0
            Reporter: Thejas M Nair
             Fix For: 0.9.0


In the code below, if generateKey() returns the same value for two UDFs, they end up sharing the same Properties object.

{code}
private HashMap<Integer, Properties> udfConfs = new HashMap<Integer, Properties>();

    public Properties getUDFProperties(Class c) {
        Integer k = generateKey(c);
        Properties p = udfConfs.get(k);
        if (p == null) {
            p = new Properties();
            udfConfs.put(k, p);
        }
        return p;
    }

    private int generateKey(Class c) {
        return c.getName().hashCode();
    }

    public Properties getUDFProperties(Class c, String[] args) {
        Integer k = generateKey(c, args);
        Properties p = udfConfs.get(k);
        if (p == null) {
            p = new Properties();
            udfConfs.put(k, p);
        }
        return p;
    }

    private int generateKey(Class c, String[] args) {
        int hc = c.getName().hashCode();
        for (int i = 0; i < args.length; i++) {
            hc <<= 1;
            hc ^= args[i].hashCode();
        }
        return hc;
    }

{code}
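
As a concrete illustration of the collision, here is a minimal sketch. It assumes a hypothetical class name ("org.example.MyLoader") and takes the class name as a String instead of a Class so that it runs without any Pig classes; generateKey otherwise mirrors the snippet above. It relies on the fact that "Aa" and "BB" are different strings with the same String.hashCode().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical demo class, not part of Pig.
class KeyCollisionDemo {
    // Same arithmetic as generateKey(Class, String[]) in the report,
    // but taking the class name directly (an assumption for this demo).
    static int generateKey(String className, String[] args) {
        int hc = className.hashCode();
        for (int i = 0; i < args.length; i++) {
            hc <<= 1;
            hc ^= args[i].hashCode();
        }
        return hc;
    }

    public static void main(String[] unused) {
        String udf = "org.example.MyLoader"; // hypothetical UDF class name
        int k1 = generateKey(udf, new String[] {"Aa"});
        int k2 = generateKey(udf, new String[] {"BB"});

        Map<Integer, Properties> udfConfs = new HashMap<Integer, Properties>();
        udfConfs.put(k1, new Properties());

        // The second UDF instance (different args) finds the first one's entry.
        System.out.println("keys equal: " + (k1 == k2));               // keys equal: true
        System.out.println("shared: " + (udfConfs.get(k2) != null));   // shared: true
    }
}
```

Two instances of the same UDF constructed with different arguments would therefore read and write the same Properties object.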


To prevent this, a new class (say X) that holds the class name and args should be created, and HashMap<X, Properties> should be used instead of HashMap<Integer, Properties>. HashMap will then deal with the collisions itself, because it compares keys with equals() and not only by hash code.
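
A minimal sketch of what such a key class could look like follows. The names UdfKey and UdfPropertiesStore are hypothetical, and this is not necessarily what the committed patch does. It assumes the key should be Serializable (the map is serialized along with the UDF context) and that storing the class name as a String is acceptable.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for the class "X" proposed above.
final class UdfKey implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String className;
    private final String[] args;

    UdfKey(String className, String[] args) {
        this.className = className;
        // Defensive copy so later changes to the caller's array cannot alter the key.
        this.args = (args == null) ? new String[0] : args.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UdfKey)) return false;
        UdfKey other = (UdfKey) o;
        return className.equals(other.className) && Arrays.equals(args, other.args);
    }

    @Override
    public int hashCode() {
        // Collisions here are harmless: HashMap falls back to equals().
        return 31 * className.hashCode() + Arrays.hashCode(args);
    }
}

// Hypothetical holder showing the lookup with the new key type.
class UdfPropertiesStore {
    private final Map<UdfKey, Properties> udfConfs = new HashMap<UdfKey, Properties>();

    Properties getUDFProperties(Class<?> c, String[] args) {
        UdfKey k = new UdfKey(c.getName(), args);
        Properties p = udfConfs.get(k);
        if (p == null) {
            p = new Properties();
            udfConfs.put(k, p);
        }
        return p;
    }
}
```

With this key, the args {"Aa"} and {"BB"} still produce equal hash codes, but equals() tells them apart, so each gets its own Properties object.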




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028438#comment-13028438 ] 

Richard Ding commented on PIG-1821:
-----------------------------------

+1


[jira] [Updated] (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1821:
-------------------------------

    Attachment: PIG-1821.2.patch

PIG-1821.2.patch - Changes to incorporate comments from Richard. Passes unit tests and test-patch.



[jira] Commented: (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986707#action_12986707 ] 

Thejas M Nair commented on PIG-1821:
------------------------------------

I meant to say that List<String> can be used as the key, i.e., HashMap<List<String>, Properties>
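
A minimal sketch of the List<String> variant follows, assuming the list holds the class name followed by the constructor args; the names ListKeyStore and makeKey are hypothetical. List.equals() compares element by element, and ArrayList is Serializable, so no custom key class is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical holder, not the actual UDFContext code.
class ListKeyStore {
    private final Map<List<String>, Properties> udfConfs =
            new HashMap<List<String>, Properties>();

    // Key layout (an assumption): [className, arg0, arg1, ...]
    static List<String> makeKey(String className, String[] args) {
        List<String> key = new ArrayList<String>();
        key.add(className);
        if (args != null) {
            key.addAll(Arrays.asList(args));
        }
        return key;
    }

    Properties getUDFProperties(Class<?> c, String[] args) {
        List<String> k = makeKey(c.getName(), args);
        Properties p = udfConfs.get(k);
        if (p == null) {
            p = new Properties();
            udfConfs.put(k, p);
        }
        return p;
    }
}
```

The key list must not be modified after it has been put into the map, since that would change its hash code.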



[jira] [Updated] (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1821:
-------------------------------

    Status: Patch Available  (was: Open)


[jira] Assigned: (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-1821:
-----------------------------------

    Assignee: Thejas M Nair


[jira] [Commented] (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Woody Anderson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030188#comment-13030188 ] 

Woody Anderson commented on PIG-1821:
-------------------------------------

This check-in has caused a classloader failure when using my loader UDF.

I've narrowed it to the check-in that references this bug:

r1099860 | thejas | 2011-05-05 09:10:26 -0700 (Thu, 05 May 2011) | 3 lines
PIG-1821: UDFContext.getUDFProperties does not handle collisions
  in hashcode of udf classname (+ arg hashcodes) (thejas)

I rebuilt my loader against the new pig jar, still fails.

Here's the output of my code (it works if I run using Pig built with the previous revision):
Backend error message during job submission
-------------------------------------------
java.io.IOException: Deserialization error: com.yahoo.ymail.pigfunctions.AsStorage
        at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:55)
        at org.apache.pig.impl.util.UDFContext.deserialize(UDFContext.java:183)
        at org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil.setupUDFContext(MapRedUtil.java:155)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:228)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:185)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
        at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.ClassNotFoundException: com.yahoo.ymail.pigfunctions.AsStorage
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:603)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1574)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1495)
        at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1461)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1311)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
        at java.util.HashMap.readObject(HashMap.java:1029)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
        at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
        ... 10 more



[jira] Commented: (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986682#action_12986682 ] 

Santhosh Srinivasan commented on PIG-1821:
------------------------------------------

Since Pig does not allow function name overloading, can we use fully qualified class names as the key, i.e., HashMap<String, Properties>?



[jira] [Updated] (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1821:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed to 0.9 branch and trunk.



[jira] [Resolved] (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair resolved PIG-1821.
--------------------------------

    Resolution: Fixed

PIG-1821.3.patch - tests passed, patch committed to trunk and 0.9 branch.



[jira] [Updated] (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1821:
-------------------------------

    Attachment: PIG-1821.1.patch


[jira] [Reopened] (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair reopened PIG-1821:
--------------------------------


Reopening to address the issue Woody reported.

> UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1821
>                 URL: https://issues.apache.org/jira/browse/PIG-1821
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.9.0
>
>         Attachments: PIG-1821.1.patch, PIG-1821.2.patch
>
>


[jira] [Commented] (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027676#comment-13027676 ] 

Thejas M Nair commented on PIG-1821:
------------------------------------

PIG-1821.1.patch - unit tests passed (except TestStoreInstances, which is failing in trunk). test-patch failed because of no new unit tests. There are no new unit tests because it is not easy to create a test case to produce the problem this could have caused.


> UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1821
>                 URL: https://issues.apache.org/jira/browse/PIG-1821
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.9.0
>
>         Attachments: PIG-1821.1.patch
>
>


[jira] [Commented] (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030243#comment-13030243 ] 

Daniel Dai commented on PIG-1821:
---------------------------------

+1

> UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1821
>                 URL: https://issues.apache.org/jira/browse/PIG-1821
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.9.0
>
>         Attachments: PIG-1821.1.patch, PIG-1821.2.patch, PIG-1821.3.patch
>
>


[jira] Commented: (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986705#action_12986705 ] 

Thejas M Nair commented on PIG-1821:
------------------------------------

bq. Since Pig does not allow function name overloading, can we use fully qualified class names as the key, i.e., HashMap<String, Properties>?

Yes, but getUDFProperties(Class c, String[] args) also needs to be supported. So List<String> is more appropriate. 
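
A minimal sketch of the List<String> idea, assuming the key is the class name followed by the constructor args. This is an illustration only, not the attached patch; the class name ListKeyedUdfProperties is hypothetical, and treating the single-argument overload as "no args" is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for the relevant part of UDFContext.
class ListKeyedUdfProperties {
    // List.equals() compares element by element, so two keys with colliding
    // hash codes are still told apart by the map.
    private final Map<List<String>, Properties> udfConfs = new HashMap<>();

    Properties getUDFProperties(Class<?> c) {
        // Assumption: no args is treated the same as an empty args array.
        return getUDFProperties(c, new String[0]);
    }

    Properties getUDFProperties(Class<?> c, String[] args) {
        // Key layout: [className, arg0, arg1, ...]
        List<String> key = new ArrayList<>();
        key.add(c.getName());
        key.addAll(Arrays.asList(args));
        return udfConfs.computeIfAbsent(key, k -> new Properties());
    }
}
```

Because the first element is always the class name, a class with no args and a class with args cannot produce the same list.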

> UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1821
>                 URL: https://issues.apache.org/jira/browse/PIG-1821
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>             Fix For: 0.9.0
>
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Updated] (PIG-1821) UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1821:
-------------------------------

    Attachment: PIG-1821.3.patch

PIG-1821.3.patch - fixes problem reported by Woody. Replaced Class in UDFContext key with class name (String).
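
The patch itself is not reproduced in this thread. The following is only a sketch of what a key holding the class name (a String rather than the Class object) plus the args might look like; UdfKey and UdfPropertiesStore are hypothetical names, not the ones used in Pig.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical key type; the real patch may differ.
final class UdfKey {
    private final String className;
    private final String[] args;

    UdfKey(String className, String[] args) {
        this.className = className;
        // Defensive copy; a null args array is treated as "no args" (an assumption).
        this.args = (args == null) ? new String[0] : args.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UdfKey)) return false;
        UdfKey other = (UdfKey) o;
        return className.equals(other.className) && Arrays.equals(args, other.args);
    }

    @Override
    public int hashCode() {
        // Hash codes may still collide; equals() is what keeps entries apart.
        return 31 * className.hashCode() + Arrays.hashCode(args);
    }
}

// Hypothetical stand-in for the relevant part of UDFContext.
class UdfPropertiesStore {
    private final Map<UdfKey, Properties> udfConfs = new HashMap<>();

    Properties getUDFProperties(Class<?> c, String... args) {
        return udfConfs.computeIfAbsent(new UdfKey(c.getName(), args), k -> new Properties());
    }
}
```

Two keys built from different names can still hash to the same value, but the map compares them with equals() and so keeps their Properties separate.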


> UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1821
>                 URL: https://issues.apache.org/jira/browse/PIG-1821
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.9.0
>
>         Attachments: PIG-1821.1.patch, PIG-1821.2.patch, PIG-1821.3.patch
>
>
