Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2011/01/25 21:33:44 UTC
[jira] Created: (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)
-----------------------------------------------------------------------------------------------------
Key: PIG-1821
URL: https://issues.apache.org/jira/browse/PIG-1821
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Fix For: 0.9.0
In the code below, if generateKey() returns the same value for two UDFs, the two UDFs end up sharing the same Properties object.
{code}
private HashMap<Integer, Properties> udfConfs = new HashMap<Integer, Properties>();

public Properties getUDFProperties(Class c) {
    Integer k = generateKey(c);
    Properties p = udfConfs.get(k);
    if (p == null) {
        p = new Properties();
        udfConfs.put(k, p);
    }
    return p;
}

private int generateKey(Class c) {
    return c.getName().hashCode();
}

public Properties getUDFProperties(Class c, String[] args) {
    Integer k = generateKey(c, args);
    Properties p = udfConfs.get(k);
    if (p == null) {
        p = new Properties();
        udfConfs.put(k, p);
    }
    return p;
}

private int generateKey(Class c, String[] args) {
    int hc = c.getName().hashCode();
    for (int i = 0; i < args.length; i++) {
        hc <<= 1;
        hc ^= args[i].hashCode();
    }
    return hc;
}
{code}
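As a concrete illustration of the collision, here is a minimal standalone sketch. It copies the key computation from generateKey(Class, String[]) but takes the class name as a String so that it runs without Pig; the class name com.example.MyLoader and the property name "schema" are made up for the demo. It relies on the fact that the distinct strings "Aa" and "BB" have the same String.hashCode().

```java
import java.util.HashMap;
import java.util.Properties;

class HashKeyCollisionDemo {
    // Same arithmetic as generateKey(Class, String[]) in the issue, but taking
    // the class name directly so the demo needs no UDF classes.
    static int generateKey(String className, String[] args) {
        int hc = className.hashCode();
        for (int i = 0; i < args.length; i++) {
            hc <<= 1;
            hc ^= args[i].hashCode();
        }
        return hc;
    }

    public static void main(String[] unused) {
        // Two instances of the same (hypothetical) UDF with different arguments.
        int k1 = generateKey("com.example.MyLoader", new String[] {"Aa"});
        int k2 = generateKey("com.example.MyLoader", new String[] {"BB"});

        HashMap<Integer, Properties> udfConfs = new HashMap<Integer, Properties>();
        udfConfs.put(k1, new Properties());
        udfConfs.get(k1).setProperty("schema", "set by the first instance");

        // The second instance looks up its own key and gets the first one's object.
        Properties seenBySecond = udfConfs.get(k2);
        System.out.println("keys equal: " + (k1 == k2)); // prints "keys equal: true"
        System.out.println("second instance sees: " + seenBySecond.getProperty("schema"));
    }
}
```

Because the Integer key is all the map ever sees, it has no way to tell the two instances apart once the hash values match.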
To prevent this, a new class (say X) that holds the class name and args should be created, and HashMap<X, Properties> should be used instead of HashMap<Integer, Properties>. HashMap will then deal with the collisions itself, because it compares keys with equals() when their hash codes match.
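A rough sketch of what such a key class could look like follows. This is not the committed patch: the names UdfKeySketch and UdfKey, the field layout, and the null handling are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Properties;

class UdfKeySketch {
    // Hypothetical key class (the "X" of the proposal); name and fields are assumptions.
    static final class UdfKey {
        private final String className;
        private final String[] args;

        UdfKey(Class<?> c, String[] args) {
            this.className = c.getName();
            // Defensive copy so later changes to the caller's array cannot alter the key.
            this.args = (args == null) ? new String[0] : args.clone();
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof UdfKey)) return false;
            UdfKey other = (UdfKey) o;
            return className.equals(other.className) && Arrays.equals(args, other.args);
        }

        @Override
        public int hashCode() {
            // Collisions are still possible here; equals() is what keeps entries apart.
            return 31 * className.hashCode() + Arrays.hashCode(args);
        }
    }

    private final HashMap<UdfKey, Properties> udfConfs = new HashMap<UdfKey, Properties>();

    Properties getUDFProperties(Class<?> c, String[] args) {
        UdfKey k = new UdfKey(c, args);
        Properties p = udfConfs.get(k);
        if (p == null) {
            p = new Properties();
            udfConfs.put(k, p);
        }
        return p;
    }

    public static void main(String[] unused) {
        UdfKeySketch ctx = new UdfKeySketch();
        // String.class stands in for a UDF class; "Aa" and "BB" share a hash code.
        Properties a = ctx.getUDFProperties(String.class, new String[] {"Aa"});
        Properties b = ctx.getUDFProperties(String.class, new String[] {"BB"});
        System.out.println(a != b); // true: separate Properties despite equal hash codes
        System.out.println(a == ctx.getUDFProperties(String.class, new String[] {"Aa"})); // true
    }
}
```

The two keys in main still hash to the same value, but they land in separate map entries because equals() compares the actual class name and arguments.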
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] [Commented] (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028438#comment-13028438 ]
Richard Ding commented on PIG-1821:
-----------------------------------
+1
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair updated PIG-1821:
-------------------------------
Attachment: PIG-1821.2.patch
PIG-1821.2.patch - Changes to incorporate comments from Richard. Passes unit tests and test-patch.
[jira] Commented: (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986707#action_12986707 ]
Thejas M Nair commented on PIG-1821:
------------------------------------
I meant to say that a List<String> can be used as the key, i.e. HashMap<List<String>, Properties>.
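A minimal sketch of the List<String> variant, assuming the key is laid out as [class name, arg0, arg1, ...]. The class name ListKeySketch, the helper makeKey, and that layout are illustrative choices, not taken from any patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Properties;

class ListKeySketch {
    private final HashMap<List<String>, Properties> udfConfs =
            new HashMap<List<String>, Properties>();

    // Builds [className, arg0, arg1, ...]; this layout is an assumption of the sketch.
    static List<String> makeKey(Class<?> c, String[] args) {
        List<String> key = new ArrayList<String>();
        key.add(c.getName());
        if (args != null) {
            key.addAll(Arrays.asList(args));
        }
        return key;
    }

    Properties getUDFProperties(Class<?> c, String[] args) {
        List<String> k = makeKey(c, args);
        Properties p = udfConfs.get(k);
        if (p == null) {
            p = new Properties();
            udfConfs.put(k, p);
        }
        return p;
    }

    public static void main(String[] unused) {
        ListKeySketch ctx = new ListKeySketch();
        // String.class stands in for a UDF class.
        Properties a = ctx.getUDFProperties(String.class, new String[] {"Aa"});
        Properties b = ctx.getUDFProperties(String.class, new String[] {"BB"});
        System.out.println(a != b); // true: List.equals compares the elements
    }
}
```

List already defines equals() and hashCode() element by element, so no new class is needed. The key must not be modified after it has been put into the map.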
[jira] [Updated] (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair updated PIG-1821:
-------------------------------
Status: Patch Available (was: Open)
[jira] Assigned: (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich reassigned PIG-1821:
-----------------------------------
Assignee: Thejas M Nair
[jira] [Commented] (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Woody Anderson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030188#comment-13030188 ]
Woody Anderson commented on PIG-1821:
-------------------------------------
This checkin has caused a classloader failure when using my loader UDF.
I've narrowed it down to the checkin that references this bug:
r1099860 | thejas | 2011-05-05 09:10:26 -0700 (Thu, 05 May 2011) | 3 lines
PIG-1821: UDFContext.getUDFProperties does not handle collisions
in hashcode of udf classname (+ arg hashcodes) (thejas)
I rebuilt my loader against the new Pig jar, and it still fails.
Here's the output of my code (it works if I run using Pig built with the previous revision):
Backend error message during job submission
-------------------------------------------
java.io.IOException: Deserialization error: com.yahoo.ymail.pigfunctions.AsStorage
at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:55)
at org.apache.pig.impl.util.UDFContext.deserialize(UDFContext.java:183)
at org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil.setupUDFContext(MapRedUtil.java:155)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setupUdfEnvAndStores(PigOutputFormat.java:228)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.checkOutputSpecs(PigOutputFormat.java:185)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.ClassNotFoundException: com.yahoo.ymail.pigfunctions.AsStorage
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:603)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1574)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1495)
at java.io.ObjectInputStream.readClass(ObjectInputStream.java:1461)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1311)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at java.util.HashMap.readObject(HashMap.java:1029)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:53)
... 10 more
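The trace shows the failure occurring while a HashMap is being deserialized and ObjectInputStream.readClass tries to resolve the UDF class. That suggests (an inference from the trace, not something stated in this thread) that a java.lang.Class reference was part of the serialized map. The sketch below shows one way a map key can avoid that dependency by storing only the class name as a String. NameKey, roundTrip, and the class name com.example.NotOnThisClasspath are hypothetical, and this is not claimed to be the fix that was eventually committed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

class SerializableKeySketch {
    // Hypothetical key that stores the class NAME, not a Class object, so that
    // deserializing it never asks the class loader for the UDF class.
    static final class NameKey implements Serializable {
        private static final long serialVersionUID = 1L;
        final String className;

        NameKey(String className) {
            this.className = className;
        }

        @Override
        public boolean equals(Object o) {
            return (o instanceof NameKey) && className.equals(((NameKey) o).className);
        }

        @Override
        public int hashCode() {
            return className.hashCode();
        }
    }

    // Serializes an object to bytes and reads it back with plain Java serialization.
    static Object roundTrip(Object o) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(o);
        out.close();
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        try {
            return in.readObject();
        } finally {
            in.close();
        }
    }

    public static void main(String[] unused) throws Exception {
        HashMap<NameKey, String> confs = new HashMap<NameKey, String>();
        // The named class does not exist here; only its name travels in the stream.
        confs.put(new NameKey("com.example.NotOnThisClasspath"), "props");
        Object copy = roundTrip(confs);
        System.out.println(confs.equals(copy)); // true: the map survives the round trip
    }
}
```

A key holding a Class<?> field would instead require the reading side to load that class, which fails with ClassNotFoundException wherever the UDF jar is not visible to the deserializing class loader.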
[jira] Commented: (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Santhosh Srinivasan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986682#action_12986682 ]
Santhosh Srinivasan commented on PIG-1821:
------------------------------------------
Since Pig does not allow function name overloading, can we use fully qualified class names as the key, i.e., HashMap<String, Properties>?
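A minimal sketch of the class-name-keyed map, with ClassNameKeySketch as a made-up name. It covers only the single-argument getUDFProperties(Class) lookup; the two-argument overload in the issue description also distinguishes instances by args, which a key made of the class name alone would not capture.

```java
import java.util.HashMap;
import java.util.Properties;

class ClassNameKeySketch {
    private final HashMap<String, Properties> udfConfs = new HashMap<String, Properties>();

    Properties getUDFProperties(Class<?> c) {
        // The fully qualified name itself is the key; HashMap resolves hash
        // collisions between different names through String.equals().
        String k = c.getName();
        Properties p = udfConfs.get(k);
        if (p == null) {
            p = new Properties();
            udfConfs.put(k, p);
        }
        return p;
    }

    public static void main(String[] unused) {
        ClassNameKeySketch ctx = new ClassNameKeySketch();
        // String.class and Integer.class stand in for two UDF classes.
        Properties a = ctx.getUDFProperties(String.class);
        Properties b = ctx.getUDFProperties(Integer.class);
        System.out.println(a != b);                                  // true
        System.out.println(a == ctx.getUDFProperties(String.class)); // true
    }
}
```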
[jira] [Updated] (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair updated PIG-1821:
-------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Patch committed to 0.9 branch and trunk.
[jira] [Resolved] (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair resolved PIG-1821.
--------------------------------
Resolution: Fixed
PIG-1821.3.patch - tests passed, patch committed to trunk and 0.9 branch.
[jira] [Updated] (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair updated PIG-1821:
-------------------------------
Attachment: PIG-1821.1.patch
[jira] [Reopened] (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair reopened PIG-1821:
--------------------------------
Reopening to address the issue Woody reported.
[jira] [Commented] (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027676#comment-13027676 ]
Thejas M Nair commented on PIG-1821:
------------------------------------
PIG-1821.1.patch - unit tests passed (except TestStoreInstances, which is already failing in trunk). test-patch failed because the patch adds no new unit tests. I have not added any because it is not easy to write a test case that reproduces the problem this bug could cause.
[jira] [Commented] (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030243#comment-13030243 ]
Daniel Dai commented on PIG-1821:
---------------------------------
+1
> UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)
> -----------------------------------------------------------------------------------------------------
>
> Key: PIG-1821
> URL: https://issues.apache.org/jira/browse/PIG-1821
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Thejas M Nair
> Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-1821.1.patch, PIG-1821.2.patch, PIG-1821.3.patch
>
>
[jira] Commented: (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986705#action_12986705 ]
Thejas M Nair commented on PIG-1821:
------------------------------------
bq. Since Pig does not allow function name overloading, can we use fully qualified class names as the key, i.e., HashMap<String, Properties>?
Yes, but getUDFProperties(Class c, String[] args) also needs to be supported. So List<String> is more appropriate.
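A rough sketch of the List<String> idea, to make the difference concrete. It is not taken from any of the attached patches. The class name UdfPropertiesSketch, the legacyKey helper, and the use of a String class name instead of a Class are assumptions made so the example can run on its own. The names "Aa" and "BB" are chosen only because their String hash codes are equal (both 2112), so they collide under the current integer key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for the relevant part of UDFContext.
public class UdfPropertiesSketch {

    // Key is the class name followed by the constructor args. HashMap calls
    // List.equals() when two keys share a hash code, so distinct keys never
    // end up sharing a Properties object.
    private final Map<List<String>, Properties> udfConfs = new HashMap<>();

    public Properties getUDFProperties(String className, String... args) {
        List<String> key = new ArrayList<>();
        key.add(className);
        key.addAll(Arrays.asList(args));
        Properties p = udfConfs.get(key);
        if (p == null) {
            p = new Properties();
            udfConfs.put(key, p);
        }
        return p;
    }

    // Same arithmetic as generateKey() in the issue description, kept only
    // to show that two different names can produce the same integer key.
    public static int legacyKey(String className, String... args) {
        int hc = className.hashCode();
        for (String arg : args) {
            hc <<= 1;
            hc ^= arg.hashCode();
        }
        return hc;
    }

    public static void main(String[] unused) {
        UdfPropertiesSketch ctx = new UdfPropertiesSketch();
        System.out.println("legacy keys equal: "
                + (legacyKey("Aa") == legacyKey("BB")));
        System.out.println("properties shared: "
                + (ctx.getUDFProperties("Aa") == ctx.getUDFProperties("BB")));
    }
}
```

The list key must not be modified after it is put in the map; the sketch builds a fresh list on every call, so that holds here.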
> UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)
> -----------------------------------------------------------------------------------------------------
>
> Key: PIG-1821
> URL: https://issues.apache.org/jira/browse/PIG-1821
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Thejas M Nair
> Fix For: 0.9.0
>
>
[jira] [Updated] (PIG-1821) UDFContext.getUDFProperties does not
handle collisions in hashcode of udf classname (+ arg hashcodes)
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair updated PIG-1821:
-------------------------------
Attachment: PIG-1821.3.patch
PIG-1821.3.patch fixes the problem reported by Woody. It replaces the Class in the UDFContext key with the class name (String).
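For readers without the patch at hand, a key built from the class name (String) and the args could look roughly like the sketch below. This is an illustration only, not the contents of PIG-1821.3.patch. The names UdfKeySketch and UDFKey are hypothetical, and treating "no args" (null) as different from an empty args array is a choice made for this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for the relevant part of UDFContext.
public class UdfKeySketch {

    // Key holding the class name as a String plus the constructor args.
    static final class UDFKey {
        private final String className;
        private final String[] args; // null when the UDF was looked up without args

        UDFKey(String className, String[] args) {
            this.className = className;
            // Defensive copy so later changes to the caller's array
            // cannot alter a key that is already in the map.
            this.args = (args == null) ? null : args.clone();
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof UDFKey)) {
                return false;
            }
            UDFKey other = (UDFKey) o;
            return className.equals(other.className)
                    && Arrays.equals(args, other.args);
        }

        @Override
        public int hashCode() {
            // Hash collisions are still possible; equals() resolves them.
            return 31 * className.hashCode() + Arrays.hashCode(args);
        }
    }

    private final Map<UDFKey, Properties> udfConfs = new HashMap<>();

    public Properties getUDFProperties(Class<?> c) {
        return lookup(new UDFKey(c.getName(), null));
    }

    public Properties getUDFProperties(Class<?> c, String[] args) {
        return lookup(new UDFKey(c.getName(), args));
    }

    private Properties lookup(UDFKey k) {
        Properties p = udfConfs.get(k);
        if (p == null) {
            p = new Properties();
            udfConfs.put(k, p);
        }
        return p;
    }
}
```

With a key like this, HashMap uses equals() to separate entries whose hash codes happen to match, which is the behaviour the issue description asks for.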
> UDFContext.getUDFProperties does not handle collisions in hashcode of udf classname (+ arg hashcodes)
> -----------------------------------------------------------------------------------------------------
>
> Key: PIG-1821
> URL: https://issues.apache.org/jira/browse/PIG-1821
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Thejas M Nair
> Assignee: Thejas M Nair
> Fix For: 0.9.0
>
> Attachments: PIG-1821.1.patch, PIG-1821.2.patch, PIG-1821.3.patch
>
>