
Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2008/09/25 18:37:44 UTC

[jira] Created: (HADOOP-4276) The mapred.*ID classes are inefficient for hashCode and serialization

The mapred.*ID classes are inefficient for hashCode and serialization
---------------------------------------------------------------------

                 Key: HADOOP-4276
                 URL: https://issues.apache.org/jira/browse/HADOOP-4276
             Project: Hadoop Core
          Issue Type: Improvement
            Reporter: Owen O'Malley
             Fix For: 0.20.0


Currently, the ID classes call toString and hash the resulting string rather than computing the hash directly from their fields.

In readFields, the ID classes also create a new instance of the higher-level ID object (via read) rather than reusing the existing object through its own readFields.
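
For illustration, here is a minimal sketch of the hashing difference described above. SketchJobId and its fields are hypothetical stand-ins, not the actual Hadoop classes, and the hash formula is only one reasonable choice, not necessarily the one the patch uses.

```java
// Hypothetical stand-in for a mapred ID class; not the actual Hadoop source.
final class SketchJobId {
    private final String jtIdentifier; // assumed: identifier of the job tracker
    private final int id;              // assumed: job sequence number

    SketchJobId(String jtIdentifier, int id) {
        this.jtIdentifier = jtIdentifier;
        this.id = id;
    }

    @Override
    public String toString() {
        return "job_" + jtIdentifier + "_" + id;
    }

    // Variant described in the issue: allocates a new String on every call.
    int hashViaString() {
        return toString().hashCode();
    }

    // Direct variant: combines the fields without building a String.
    @Override
    public int hashCode() {
        return jtIdentifier.hashCode() * 31 + id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SketchJobId)) {
            return false;
        }
        SketchJobId other = (SketchJobId) o;
        return id == other.id && jtIdentifier.equals(other.jtIdentifier);
    }
}
```

Both variants are consistent with equals; the direct one simply skips the string allocation, which matters when IDs are used heavily as hash map keys.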

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-4276) The mapred.*ID classes are inefficient for hashCode and serialization

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642273#action_12642273 ] 

Chris Douglas commented on HADOOP-4276:
---------------------------------------

+1

> The mapred.*ID classes are inefficient for hashCode and serialization
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-4276
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4276
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.20.0
>
>         Attachments: h4276.patch, h4276.patch
>
>
> Currently the ID classes call toString and hash the resulting string rather than computing a hash directly.
> The ID classes also create new instances of the higher level object in readFields (via read) rather than re-using the object via readFields.


[jira] Commented: (HADOOP-4276) The mapred.*ID classes are inefficient for hashCode and serialization

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642464#action_12642464 ] 

Hudson commented on HADOOP-4276:
--------------------------------

Integrated in Hadoop-trunk #641 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/641/])
    HADOOP-4276. Improve the hashing functions and deserialization of the mapred ID classes. (omalley)



[jira] Updated: (HADOOP-4276) The mapred.*ID classes are inefficient for hashCode and serialization

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-4276:
----------------------------------

    Assignee: Owen O'Malley
      Status: Patch Available  (was: Open)


[jira] Updated: (HADOOP-4276) The mapred.*ID classes are inefficient for hashCode and serialization

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-4276:
----------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I just committed this.


[jira] Updated: (HADOOP-4276) The mapred.*ID classes are inefficient for hashCode and serialization

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-4276:
----------------------------------

    Attachment: h4276.patch

Addressed Chris' comments, except that I left the redundant else in place because it keeps the return paths symmetric. I also used appendTo rather than appendIdTo, since appendIdTo seems more awkward.


[jira] Updated: (HADOOP-4276) The mapred.*ID classes are inefficient for hashCode and serialization

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-4276:
----------------------------------

    Attachment: h4276.patch

This patch:
  1. Removes the string generation during hashing of id objects.
  2. Reuses the id objects during readFields.
  3. Defines a protected field for SEPARATOR and removes UNDERLINE.
  4. Replaces the toStringWOPrefix methods with addId, which reuses the same StringBuilder and is therefore more efficient.
  5. Stores the jtIdentifier as Text so that it doesn't need to be encoded for sending across RPC.
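
Items 2 and 4 can be sketched roughly as follows. This is a simplified illustration under stated assumptions: SketchJob and SketchTask are hypothetical classes built on plain java.io interfaces instead of Hadoop's Writable, the wire format is invented, and the builder method is called appendTo here (the patch itself calls it addId).

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical simplified job ID; the real class implements Hadoop's Writable.
class SketchJob {
    static final char SEPARATOR = '_';
    String jtIdentifier = "";
    int id;

    void write(DataOutput out) throws IOException {
        out.writeUTF(jtIdentifier);
        out.writeInt(id);
    }

    // Overwrites this object's fields in place; no new instance is created.
    void readFields(DataInput in) throws IOException {
        jtIdentifier = in.readUTF();
        id = in.readInt();
    }

    // Appends this ID's components to a caller-supplied builder.
    StringBuilder appendTo(StringBuilder sb) {
        return sb.append(SEPARATOR).append(jtIdentifier)
                 .append(SEPARATOR).append(id);
    }

    @Override
    public String toString() {
        return appendTo(new StringBuilder("job")).toString();
    }
}

// Hypothetical simplified task ID that contains a job ID.
class SketchTask {
    final SketchJob job = new SketchJob(); // created once, reused on every read
    boolean isMap;
    int id;

    void write(DataOutput out) throws IOException {
        job.write(out);
        out.writeBoolean(isMap);
        out.writeInt(id);
    }

    void readFields(DataInput in) throws IOException {
        job.readFields(in); // reuse the nested object instead of allocating one
        isMap = in.readBoolean();
        id = in.readInt();
    }

    // One builder is threaded through both levels; no intermediate Strings.
    StringBuilder appendTo(StringBuilder sb) {
        return job.appendTo(sb)
                  .append(SketchJob.SEPARATOR).append(isMap ? 'm' : 'r')
                  .append(SketchJob.SEPARATOR).append(id);
    }

    @Override
    public String toString() {
        return appendTo(new StringBuilder("task")).toString();
    }
}
```

Because the nested job field is final and filled in place, deserializing many task IDs into the same SketchTask allocates no new ID objects.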


[jira] Commented: (HADOOP-4276) The mapred.*ID classes are inefficient for hashCode and serialization

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641240#action_12641240 ] 

Hadoop QA commented on HADOOP-4276:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12392453/h4276.patch
  against trunk revision 706417.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3501/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3501/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3501/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3501/console

This message is automatically generated.


[jira] Commented: (HADOOP-4276) The mapred.*ID classes are inefficient for hashCode and serialization

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641956#action_12641956 ] 

Chris Douglas commented on HADOOP-4276:
---------------------------------------

This looks good. Just a few suggestions/nits:
* In JobID:
{noformat}
-      .append(jtIdentifier != null ? jtIdentifier : "[^_]*").append(UNDERSCORE)
+      .append(jtIdentifier != null ? jtIdentifier : "[^_]*").append(SEPARATOR)
{noformat}
The regexp "[^_]*" should probably be built from the SEPARATOR constant rather than hard-coding the underscore.
* Where this replaces calls to ID factories with instances created in the constructor (JobProfile, TaskReport, TaskStatus, TaskCompletionEvent, TaskAttemptID, Task, KillTaskAction, KillJobAction, JobStatus), it might make sense to make the instance final.
* In TaskID:
{noformat}
-      else return this.isMap ? -1 : 1;
+      else {
+        return this.isMap ? -1 : 1;
+      }
{noformat}
The else is redundant.
* {{addId}} reads like a mutator. Would {{addIdTo}} or {{appendIdTo}} make more sense?
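
As a rough sketch of the first suggestion, the "any identifier" regexp can be derived from the separator constant so the two cannot drift apart. SketchIdPattern, NOT_SEPARATOR and jobPattern are hypothetical names for illustration, not the Hadoop API, and the sketch assumes SEPARATOR is not a metacharacter inside a character class (such as ']', '^', '-' or a backslash).

```java
// Hypothetical helper; names are illustrative, not the Hadoop API.
final class SketchIdPattern {
    static final char SEPARATOR = '_';
    static final String JOB = "job";

    // Matches any run of non-separator characters; evaluates to "[^_]*".
    // Assumes SEPARATOR needs no escaping inside a character class.
    static final String NOT_SEPARATOR = "[^" + SEPARATOR + "]*";

    private SketchIdPattern() {
    }

    // Builds a regexp for job IDs; a null argument means "match anything".
    static String jobPattern(String jtIdentifier, Integer jobId) {
        StringBuilder sb = new StringBuilder(JOB).append(SEPARATOR);
        sb.append(jtIdentifier != null ? jtIdentifier : NOT_SEPARATOR)
          .append(SEPARATOR);
        sb.append(jobId != null ? String.valueOf(jobId) : "[0-9]*");
        return sb.toString();
    }
}
```

With this shape, changing SEPARATOR updates both the literal separators and the wildcard character class in one place.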
