You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/03/11 19:23:59 UTC

[jira] Created: (HADOOP-7183) WritableComparator.get should not cache comparator objects

WritableComparator.get should not cache comparator objects
----------------------------------------------------------

                 Key: HADOOP-7183
                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 0.22.0
            Reporter: Todd Lipcon
            Priority: Blocker
             Fix For: 0.22.0


HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007180#comment-13007180 ] 

Tom White commented on HADOOP-7183:
-----------------------------------

Before HADOOP-6881 the only way to add a WritableComparator to the cache was by explicitly calling WritableComparator.define(). HADOOP-6881 changed WritableComparator.get() to add a generic WritableComparator to the comparators cache when there was none registered for the class. This patch simply restores the previous behaviour.

Making WritableComparator thread-safe by using a thread-local buffer would be a good enhancement but isn't necessary for this fix.

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-7183:
---------------------------------

    Fix Version/s: 0.21.1
                   0.20.3

The fix should probably also be merged to the 0.20 and 0.21 branches.

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029891#comment-13029891 ] 

Hudson commented on HADOOP-7183:
--------------------------------

Integrated in Hadoop-Common-trunk #680 (See [https://builds.apache.org/hudson/job/Hadoop-Common-trunk/680/])
    HADOOP-7183. WritableComparator.get should not cache comparator objects. Contributed by Tom White


> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch, HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029867#comment-13029867 ] 

Hudson commented on HADOOP-7183:
--------------------------------

Integrated in Hadoop-Common-22-branch #44 (See [https://builds.apache.org/hudson/job/Hadoop-Common-22-branch/44/])
    HADOOP-7183. svn merge -c 1100056 from trunk


> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch, HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007228#comment-13007228 ] 

Owen O'Malley commented on HADOOP-7183:
---------------------------------------

{quote}
6881 changed it to add comparators.put there.
{quote}

Ah, right. Being too efficient. *grin*

{quote}
Making WritableComparator thread-safe by using a thread-local buffer would be a good enhancement but isn't necessary for this fix.
{quote}

This fix is brittle precisely because the expectation is that comparators are stateless. It looks like the shuffle doesn't reuse comparators between threads, but it is much much easier to make them thread-safe than ensure that no one puts in such a use case.

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026063#comment-13026063 ] 

Todd Lipcon commented on HADOOP-7183:
-------------------------------------

+1

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch, HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017995#comment-13017995 ] 

Nigel Daley commented on HADOOP-7183:
-------------------------------------

Tom, any update on this for 0.22?

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025064#comment-13025064 ] 

Hadoop QA commented on HADOOP-7183:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12477363/HADOOP-7183.patch
  against trunk revision 1096522.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/376//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/376//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/376//console

This message is automatically generated.

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch, HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025062#comment-13025062 ] 

Hadoop QA commented on HADOOP-7183:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12477363/HADOOP-7183.patch
  against trunk revision 1096522.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/377//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/377//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/377//console

This message is automatically generated.

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch, HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005802#comment-13005802 ] 

Doug Cutting commented on HADOOP-7183:
--------------------------------------

+1 patch looks good to me too.

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029757#comment-13029757 ] 

Hudson commented on HADOOP-7183:
--------------------------------

Integrated in Hadoop-Common-trunk-Commit #575 (See [https://builds.apache.org/hudson/job/Hadoop-Common-trunk-Commit/575/])
    HADOOP-7183. WritableComparator.get should not cache comparator objects. Contributed by Tom White


> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch, HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Assigned: (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White reassigned HADOOP-7183:
---------------------------------

    Assignee: Tom White

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005797#comment-13005797 ] 

Todd Lipcon commented on HADOOP-7183:
-------------------------------------

+1 pending test results

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-7183:
------------------------------

    Attachment: HADOOP-7183.patch

Removed test.

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch, HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021483#comment-13021483 ] 

Hadoop QA commented on HADOOP-7183:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12473422/HADOOP-7183.patch
  against trunk revision 1094750.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/359//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/359//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/359//console

This message is automatically generated.

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HADOOP-7183:
--------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.20.3)
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I've committed this to trunk, branch 22, and branch 21. Thanks Tom!

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch, HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005803#comment-13005803 ] 

Hadoop QA commented on HADOOP-7183:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12473422/HADOOP-7183.patch
  against trunk revision 1080723.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/308//testReport/
Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/308//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://hudson.apache.org/hudson/job/PreCommit-HADOOP-Build/308//console

This message is automatically generated.

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-7183:
------------------------------

    Attachment: HADOOP-7183.patch

The problem is that WritableComparator has a mutable field - DataInputBuffer buffer - which is only used by WritableComparable implementations that *don't* override the optimized binary compare method. IntWritable, Text, etc override this method, so there is no thread safety issue for these.

The remedy is to only register comparators explicitly, i.e. not the "generic" ones, since they may not be thread-safe. This is actually the behaviour that was in place before HADOOP-6881.

I've also updated the javadoc for WritableComparator.define to clarify that it should only be called for thread-safe classes.


> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007138#comment-13007138 ] 

Owen O'Malley commented on HADOOP-7183:
---------------------------------------

This behavior wasn't changed by HADOOP-6881, so it has been there for a long time.

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-7183:
------------------------------

    Status: Patch Available  (was: Open)

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007153#comment-13007153 ] 

Owen O'Malley commented on HADOOP-7183:
---------------------------------------

I'm not very happy with this fix. The implicit contract is that all comparators in the cache must be thread safe. To special case the WritableComparator case because it isn't thread safe seems like the wrong direction. It seems far less brittle to use a thread local buffer for the WritableComparator.

Have you actually verified that the shuffle's sorts actually do a get per a thread?


> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021625#comment-13021625 ] 

Owen O'Malley commented on HADOOP-7183:
---------------------------------------

The test needs to be removed from the fix, since it is only testing the implementation detail of this workaround fix.

Tests need to reflect the correct semantics of the code, not implementation details that will be removed when someone has time to do the right fix.

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-7183) WritableComparator.get should not cache comparator objects

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007172#comment-13007172 ] 

Todd Lipcon commented on HADOOP-7183:
-------------------------------------

bq. This behavior wasn't changed by HADOOP-6881

It looks to me like it was. Prior to 6881, if nothing was in the cache, it did not "write back" to the cache in WritableComparator.get(). 6881 changed it to add {{comparators.put}} there.

bq. Have you actually verified that the shuffle's sorts actually do a get per a thread?

Not in the latest shuffle code, but it looked that way to me previously.


Using ThreadLocals seems like a reasonable alternate implementation. But we should still make the contract explicit that cached comparators must be threadsafe (yes, it's fairly obvious, but we seem to have missed it ourselves!)

> WritableComparator.get should not cache comparator objects
> ----------------------------------------------------------
>
>                 Key: HADOOP-7183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7183
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Tom White
>            Priority: Blocker
>             Fix For: 0.20.3, 0.21.1, 0.22.0
>
>         Attachments: HADOOP-7183.patch
>
>
> HADOOP-6881 modified WritableComparator.get such that the constructed WritableComparator gets saved back into the static map. This is fine for stateless comparators, but some comparators have per-instance state, and thus this becomes thread-unsafe and causes errors in the shuffle where multiple threads are doing comparisons. An example of a Comparator with per-instance state is WritableComparator itself.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira