You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org> on 2011/05/12 21:33:47 UTC

[jira] [Created] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Jobsplits with random hostnames can make the queue unusable
-----------------------------------------------------------

                 Key: MAPREDUCE-2489
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: jobtracker
            Reporter: Jeffrey Naisbitt
            Assignee: Jeffrey Naisbitt


We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.

I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068438#comment-13068438 ] 

Jeffrey Naisbitt commented on MAPREDUCE-2489:
---------------------------------------------

Not sure why Hudson didn't run, but here are the latest test-patch results for both 0.20s and trunk:
0.20s:
     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.


trunk:

     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     -1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec] 
     [exec]     +1 system test framework.  The patch passed system test framework compile.
     [exec] 

The extra Findbugs warning seems to be unrelated to this patch.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073838#comment-13073838 ] 

Mahadev konar commented on MAPREDUCE-2489:
------------------------------------------

Jeffrey,
 One minor nit, 

 The method:

{code}

  static void verifyHostnames(String[] names) throws UnknownHostException {
{code}

does not seem appropriate for JobInProgress class. It needs to be moved out to some helper class. NetUtils seems more appropriate for this helper method. What do you think?


> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Attachment: MAPREDUCE-2489-mapred-v4.patch
                MAPREDUCE-2489-0.20s-v3.patch

Updated patch for 0.20.205 - removing the portion of the code corresponding to HADOOP-7314 from this patch and placing it in its own patch on that jira.



> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Attachment: MAPREDUCE-2489-mapred-v2.patch

Address the comments Robert Evans made in HADOOP-7314, making urlValidator a static member variable.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070456#comment-13070456 ] 

Jeffrey Naisbitt commented on MAPREDUCE-2489:
---------------------------------------------

Mahadev,
There are two parts to this patch.  The most important is the use of the new "resolveValidHosts" function and throwing the UnknownHostException up the chain.  That part ensures that the hostnames are actually valid and is needed to resolve the problem this Jira was filed for. 

The second part (that you describe above) is meant to only be an early sanity check.  All it is trying to do is ensure that the hostnames "appear" to be valid.  With this, I'm just trying to check the hostname string itself to see that it contains valid characters and such so we can fail early on if the hostname string contains garbage (like in the original case this Jira was filed for).  The URI object seemed the easiest and most standard way I could find to do this.  The URI object creation will fail, or the uri.getHost() method will return null if the string passed in doesn't match their regular expressions for what a URI should look like (which includes checking the hostname portion).  The URI object requires a schema (http://, ftp://, etc.), so I add one on if it doesn't already have one in the string.  Again, this is only really meant to catch the garbage string cases, and it's not meant to be a perfect check that will catch all hostname problems (those are caught by the first part of the patch mentioned above).

Does that make sense?  If you have other questions or suggestions, please let me know :) 
Thanks for looking at it!

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Status: Open  (was: Patch Available)

So, we discovered that the interface change is incompatible.  So, we are removing the incompatible change and the corresponding portion of code relying on that change and implementing the sanity check only.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065433#comment-13065433 ] 

Hadoop QA commented on MAPREDUCE-2489:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12486481/MAPREDUCE-2489-mapred-v4.patch
  against trunk revision 1146517.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The patch appears to cause tar ant target to fail.

    -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these core unit tests:


    -1 contrib tests.  The patch failed contrib unit tests.

    -1 system test framework.  The patch failed system test framework compile.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/467//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/467//console

This message is automatically generated.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033116#comment-13033116 ] 

Jeffrey Naisbitt commented on MAPREDUCE-2489:
---------------------------------------------

Honestly, I'm not sure what caching was enabled at the time.  How would caching have helped in this case though - where we have basically tons of lookups on garbage hostnames?  (none of these strings are repeated)

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081917#comment-13081917 ] 

Jeffrey Naisbitt commented on MAPREDUCE-2489:
---------------------------------------------

The only test failures on trunk are unrelated to this patch and fail with a clean checkout of trunk as well.

trunk test-patch results (since this patch will fail until HADOOP-7499 has been committed):
     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     -1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec] 
     [exec]     -1 system test framework.  The patch failed system test framework compile.


The extra findbugs warning is unrelated to this bug (Incorrect lazy initialization of static field org.apache.hadoop.mapred.TaskLog.localFS in org.apache.hadoop.mapred.TaskLog.writeToIndexFile(String, boolean))

The system test framework is because HADOOP-7499 has not been committed yet.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Attachment: MAPREDUCE-2489-mapred-v5.patch

Patch for trunk addressing Mahadev's comments.  Note: this should not be committed until HADOOP-7499 has been committed to trunk.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054000#comment-13054000 ] 

Hadoop QA commented on MAPREDUCE-2489:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483623/MAPREDUCE-2489-mapred-v3.patch
  against trunk revision 1138301.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The patch appears to cause tar ant target to fail.

    -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these core unit tests:


    -1 contrib tests.  The patch failed contrib unit tests.

    -1 system test framework.  The patch failed system test framework compile.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/413//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/413//console

This message is automatically generated.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Status: Patch Available  (was: Open)

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Attachment: MAPREDUCE-2489-mapred.patch

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Status: Open  (was: Patch Available)

Resubmitting patch to run through hudson

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076195#comment-13076195 ] 

Jeffrey Naisbitt commented on MAPREDUCE-2489:
---------------------------------------------

I agree.  It would definitely fit better in NetUtils.  I'll post new patches soon.


> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Status: Open  (was: Patch Available)

There appear to be some errors in some of the tests when running 'ant test'.  I want to look into this first.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036847#comment-13036847 ] 

Jeffrey Naisbitt commented on MAPREDUCE-2489:
---------------------------------------------

@Allen, apologies for not being clear in the description.

Also, I expect this patch will fail in the Hudson build until HADOOP-7314 has been committed.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049876#comment-13049876 ] 

Hadoop QA commented on MAPREDUCE-2489:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12482679/MAPREDUCE-2489-0.20s.patch
  against trunk revision 1136000.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/397//console

This message is automatically generated.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038008#comment-13038008 ] 

jiraposter@reviews.apache.org commented on MAPREDUCE-2489:
----------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/776/
-----------------------------------------------------------

Review request for hadoop-mapreduce.


Summary
-------

We saw an issue where a custom InputSplit was returning invalid hostnames (non-repeating) for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else.

I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve.

NOTE: This requires the changes in HADOOP-7314


This addresses bug MAPREDUCE-2489.
    https://issues.apache.org/jira/browse/MAPREDUCE-2489


Diffs
-----

  trunk/ivy.xml 1125074 
  trunk/ivy/libraries.properties 1125074 
  trunk/src/contrib/mumak/src/java/org/apache/hadoop/mapred/SimulatorJobTracker.java 1125074 
  trunk/src/java/org/apache/hadoop/mapred/JobInProgress.java 1125074 
  trunk/src/java/org/apache/hadoop/mapred/JobTracker.java 1125074 

Diff: https://reviews.apache.org/r/776/diff


Testing
-------


Thanks,

Jeffrey



> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079540#comment-13079540 ] 

Jeffrey Naisbitt commented on MAPREDUCE-2489:
---------------------------------------------

To clarify, all the tests pass for the 0.20s patch.  I am still looking at the patch for trunk.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033129#comment-13033129 ] 

Allen Wittenauer commented on MAPREDUCE-2489:
---------------------------------------------

Ah, ok, the non-repeating hostnames wasn't clear in the description.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Fix Version/s: 0.23.0
                   0.20.205.0

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Status: Patch Available  (was: Open)

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054002#comment-13054002 ] 

Jeffrey Naisbitt commented on MAPREDUCE-2489:
---------------------------------------------

Again, this patch will fail until HADOOP-7314 has been committed.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated MAPREDUCE-2489:
-------------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I just pushed this to 0.20 security and mapred trunk. Thanks Jeff!

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079472#comment-13079472 ] 

Jeffrey Naisbitt commented on MAPREDUCE-2489:
---------------------------------------------

I meant 'ant test' instead of 'mvn test'.
I was running with ant-1.8.2, which apparently caused the test failures. 
All of the tests pass with ant-1.7.1

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083529#comment-13083529 ] 

Hudson commented on MAPREDUCE-2489:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #728 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/728/])
    MAPREDUCE-2489. Jobsplits with random hostnames can make the queue unusable (jeffrey naisbit via mahadev)

mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156821
Files : 
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java


> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049941#comment-13049941 ] 

Jeffrey Naisbitt commented on MAPREDUCE-2489:
---------------------------------------------

Test-patch results for 20-security patch:
     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Status: Patch Available  (was: Open)

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083809#comment-13083809 ] 

Hudson commented on MAPREDUCE-2489:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #761 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/761/])
    MAPREDUCE-2489. Jobsplits with random hostnames can make the queue unusable (jeffrey naisbit via mahadev)

mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156821
Files : 
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java


> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Status: Open  (was: Patch Available)

Going to try using java.net.URI instead of UrlValidator to get rid of all the extra dependency stuff.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Status: Patch Available  (was: Open)

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080438#comment-13080438 ] 

Mahadev konar commented on MAPREDUCE-2489:
------------------------------------------

+1 for 0.20S patch. I have committed that to the branch. Thanks Jeffrey. I'll wait for the trunk common and mapred patches before closing this jira.



> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076283#comment-13076283 ] 

Jeffrey Naisbitt commented on MAPREDUCE-2489:
---------------------------------------------

20-security branch test-patch results:

     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Attachment: MAPREDUCE-2489-0.20s.patch

Patch for the 20-security branch

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070318#comment-13070318 ] 

Mahadev konar commented on MAPREDUCE-2489:
------------------------------------------

Jeffrey,
 Sorry, I am a little unclear on what the patch is doing. Can you please specify what you are trying to achieve with the patch? The patch seems to create a URI with hostname and checking if its a valid URI or not? How is that verifying if a hostname is valid or not? 


> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Attachment: MAPREDUCE-2489-0.20s-v2.patch

Updated patch for the 0.20-security branch

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053997#comment-13053997 ] 

Jeffrey Naisbitt commented on MAPREDUCE-2489:
---------------------------------------------

test-patch results for the 0.20s patch:
     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 9 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.


> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Attachment: MAPREDUCE-2489-0.20s-v4.patch

Updated patch for the 20-security branch - based on Mahadev's comments.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Attachment: MAPREDUCE-2489-mapred-v3.patch

Updated patch for trunk

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Attachment: MAPREDUCE-2489-0.20s-v5.patch

Updated 20-security patch to fix issues with tests failing.  

test-patch results:
     [exec] +1 overall.  

     [exec] 

     [exec]     +1 @author.  The patch does not contain any @author tags.

     [exec] 

     [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.

     [exec] 

     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.

     [exec] 

     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

     [exec] 

     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

     [exec] 

==========================================
mvn test results:
All the tests pass that currently pass with the clean checkout.




> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Attachment: MAPREDUCE-2489-mapred-v7.patch
                MAPREDUCE-2489-0.20s-v6.patch

Updated patches removing the incompatible change and corresponding code references here.  

All tests pass on branch-0.20-security.

branch-0.20-security test-patch results:
test-patch results:
     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 6 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.
     [exec] 

======================================================
All tests pass in trunk except for those that are currently failing with a clean checkout of trunk as well (TestMRCLI, TestFileSystem, TestMapredSystemDir, TestLocalRunner, TestDBJob, TestDataDrivenDBInputFormat)

test-patch results for trunk:
     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no new tests are needed for this patch.
     [exec]                         Also please list what manual steps were performed to verify this patch.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     -1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec] 
     [exec]     -1 system test framework.  The patch failed system test framework compile.
     [exec] 


The tests were added to the hadoop-common portion of this patch.
The findbug warning is unrelated to this patch.
The system test failure is during the -compile-fault-inject: phase because the hadoop-common changes have not been committed yet.




> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Affects Version/s: 0.23.0
                       0.20.205.0
               Status: Patch Available  (was: Open)

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>         Attachments: MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Jeffrey Naisbitt (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeffrey Naisbitt updated MAPREDUCE-2489:
----------------------------------------

    Attachment: MAPREDUCE-2489-mapred-v6.patch

Updated patch to fix an issue with some tests failing.

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083814#comment-13083814 ] 

Hudson commented on MAPREDUCE-2489:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #752 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/752/])
    MAPREDUCE-2489. Jobsplits with random hostnames can make the queue unusable (jeffrey naisbit via mahadev)

mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1156821
Files : 
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java
* /hadoop/common/trunk/mapreduce/CHANGES.txt
* /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java


> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0, 0.23.0
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch
>
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032738#comment-13032738 ] 

Allen Wittenauer commented on MAPREDUCE-2489:
---------------------------------------------

I'm assuming no nscd or dns cache was running on the JT?

> Jobsplits with random hostnames can make the queue unusable
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-2489
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Jeffrey Naisbitt
>            Assignee: Jeffrey Naisbitt
>
> We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names.  This caused a major slowdown for the JobTracker.  We should prevent invalid InputSplit hostnames from affecting everyone else.
> I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise).  We could also fail the job after a certain number of failures in the resolve.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira