Posted to mapreduce-issues@hadoop.apache.org by "Tom White (Created) (JIRA)" <ji...@apache.org> on 2012/02/28 19:57:46 UTC

[jira] [Created] (MAPREDUCE-3936) Improve counter limits behaviour in 1.x

Improve counter limits behaviour in 1.x
---------------------------------------

                 Key: MAPREDUCE-3936
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3936
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv1
            Reporter: Tom White


The code for enforcing counter limits (from MAPREDUCE-1943) creates a static JobConf instance to load the limits, which may throw an exception if the client limit is set to be lower than the limit on the cluster (perhaps because the cluster limit was raised from the default).
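The failure mode described above can be sketched in a few lines. This is a minimal illustration and not the Hadoop code: `StaticLimitCounters`, `LOCAL_MAX`, and the use of `IllegalStateException` are hypothetical stand-ins, and the limit of 3 is an arbitrary value chosen for the example. The point is only that a limit read once from the local configuration is applied to counters that were produced under a different, higher limit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a counters container whose limit is fixed
// statically from the local (client-side) configuration.
class StaticLimitCounters {
    // Assumption for this sketch: the client config has a low limit.
    static final int LOCAL_MAX = 3;

    private final Map<String, Long> values = new HashMap<>();

    // Every path goes through this check, including the path that merely
    // loads counters already produced on the cluster.
    void set(String name, long value) {
        if (!values.containsKey(name) && values.size() >= LOCAL_MAX) {
            throw new IllegalStateException(
                "Too many counters: " + (values.size() + 1) + " max=" + LOCAL_MAX);
        }
        values.put(name, value);
    }

    int size() {
        return values.size();
    }
}

class StaticLimitDemo {
    // Simulates a client loading counters from a job that ran on a cluster
    // configured with a higher limit than the client.
    static boolean clientCanLoad(int countersFromCluster) {
        StaticLimitCounters counters = new StaticLimitCounters();
        try {
            for (int i = 0; i < countersFromCluster; i++) {
                counters.set("counter-" + i, i);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(clientCanLoad(3)); // true: within the client limit
        System.out.println(clientCanLoad(5)); // false: the job was fine on the cluster, the client rejects it
    }
}
```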

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3936) Clients should not enforce counter limits

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-3936:
---------------------------------

    Attachment: MAPREDUCE-3936.patch

Here's a patch implementing Alejandro's suggestion of enforcing the limits at the time of aggregation in the AM.

I tested on a single-node cluster with mapreduce.job.counters.max set to a lower value in the client config than in the server config. Without the patch the job failed; with the patch it passed.
                
> Clients should not enforce counter limits 
> ------------------------------------------
>
>                 Key: MAPREDUCE-3936
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3936
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: MAPREDUCE-3936.patch, MAPREDUCE-3936.patch
>
>
> The code for enforcing counter limits (from MAPREDUCE-1943) creates a static JobConf instance to load the limits, which may throw an exception if the client limit is set to be lower than the limit on the cluster (perhaps because the cluster limit was raised from the default).


[jira] [Commented] (MAPREDUCE-3936) Clients should not enforce counter limits

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472659#comment-13472659 ] 

Robert Joseph Evans commented on MAPREDUCE-3936:
------------------------------------------------

MAPREDUCE-3061 is only a concept right now. The JIRA was created over a year ago, and the only update since was someone asking for clarification of the requirements, to which no one responded. I don't really want to wait for a JIRA that is likely to be off in the future to fix a very real problem that we have right now.

Additionally, I don't see splitting the history server into two independent parts as something that will solve this problem. It could help, and any changes we make should ideally have this split in mind, but it will not by itself solve the issue. The issue is how much data the history server can cache in memory versus leave in HDFS and reconstruct on demand, and what the granularity of that caching is. Right now the caching happens on a per-job basis, which is far too coarse.

We could fix this by not caching at all: every time a page is loaded, a web service call is made, or an RPC call comes in, we parse the job history log and reconstruct just the data for that request and nothing else. On some very large jobs (50,000+ tasks) I have seen parsing the log take 10 seconds, so this would hurt page load times. Also, what kind of extra load would we be placing on HDFS by doing this every time? It really depends on how heavily used the history server becomes.

The final solution really has to be some middle ground where we can cache a known quantity of data, and then reconstruct everything else on demand as needed.  This is a lot of work, and so in the short term I would prefer to see something that allows the history server to not crash with an OOM, but will still provide most of the needed functionality until something better can be written.

I know that the History Server can easily get OOMs when loading large jobs with lots of tasks, which is a far bigger concern to me than the counters are right now, simply because the AM still tries to enforce the counter limits.
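The middle ground described in this comment (cache a known quantity of data, reconstruct everything else on demand) could look roughly like the following. This is a sketch under assumptions, not history server code: `BoundedHistoryCache` and its loader function are hypothetical, the cache is bounded by entry count rather than by bytes, and the key granularity (per job, per task, per page) is left to the caller.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded cache: keeps at most maxEntries parsed items in
// memory and rebuilds anything else on demand through the loader.
class BoundedHistoryCache<K, V> {
    private final Function<K, V> loader;
    private final Map<K, V> cache;
    private int loads = 0;

    BoundedHistoryCache(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder=true makes the map iterate least-recently-used first,
        // so removeEldestEntry evicts the least-recently-used entry.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached value, or reconstructs it (for example by
    // re-parsing the job history log) and caches the result.
    synchronized V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    synchronized int loadCount() {
        return loads;
    }

    synchronized int size() {
        return cache.size();
    }
}

class BoundedHistoryCacheDemo {
    public static void main(String[] args) {
        // The key format here is purely illustrative.
        BoundedHistoryCache<String, String> cache =
            new BoundedHistoryCache<>(2, key -> "parsed:" + key);
        cache.get("job_1/task_1");
        cache.get("job_1/task_2");
        cache.get("job_1/task_3"); // evicts the least-recently-used entry
        System.out.println(cache.size() + " cached, " + cache.loadCount() + " loads");
    }
}
```

Bounding by entry count keeps the sketch short; bounding by an estimate of memory used would be closer to the stated goal of avoiding OOMs, at the cost of having to size each entry.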
                

[jira] [Commented] (MAPREDUCE-3936) Clients should not enforce counter limits

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468592#comment-13468592 ] 

Tom White commented on MAPREDUCE-3936:
--------------------------------------

Luke, are you saying that we should remove the enforcement of limits in AMs, since the user could choose to do that by modifying the AM themselves?

Perhaps the limits aren't needed in the history server either, since the Counters objects are short-lived (for display on a web page, or to return to the job client) so they will be GC'd very quickly. This is unlike the situation with the JT, where all running jobs' counters are held in memory for the duration of the job.
                

[jira] [Commented] (MAPREDUCE-3936) Clients should not enforce counter limits

Posted by "Robert Joseph Evans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467935#comment-13467935 ] 

Robert Joseph Evans commented on MAPREDUCE-3936:
------------------------------------------------

Under YARN the History Server also reads in the counters, and that is where the counter limits are needed in some form or another. I don't really know what the correct solution is, though. The history server cannot rely on the AM to enforce a limit because the AM is user code, so if a job goes over the counter limits what should the history server do? This matters especially because the client may be requesting data from the history server after the AM exits, so we really want the history server to return results as close as possible to what the AM did. Should the history server refuse to load the job and throw an exception? That would result in the client possibly failing, unless we are sure to update the client to deal with that eventuality (I think this is the current behavior). Should it just not load all of the counters? That would result in odd behavior where the client would get a different view of the results when talking to the history server instead of the AM, but it wouldn't crash.
                

[jira] [Commented] (MAPREDUCE-3936) Clients should not enforce counter limits

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454833#comment-13454833 ] 

Hadoop QA commented on MAPREDUCE-3936:
--------------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12544972/MAPREDUCE-3936.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 1 new or modified test files.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2849//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2849//console

This message is automatically generated.
                

[jira] [Updated] (MAPREDUCE-3936) Clients should not enforce counter limits

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-3936:
---------------------------------

    Attachment: MAPREDUCE-3936.patch

Here's a rough patch which shows the idea.
                

[jira] [Commented] (MAPREDUCE-3936) Clients should not enforce counter limits

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445086#comment-13445086 ] 

Alejandro Abdelnur commented on MAPREDUCE-3936:
-----------------------------------------------

IMO counter limits should be enforced in the aggregator (JT for Hadoop 1, MRAM for Hadoop 2). They could also be enforced at the task level, as Tom is suggesting, as a way to stop things in the case of a counter-happy task, but only the aggregator has a full view.
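To illustrate why only the aggregator has a full view, here is a small sketch. It is not JT or MRAM code: `CounterAggregator` is a hypothetical class, and the limit is applied to the number of distinct counter names across all tasks. Each task may stay under a per-task limit while the union of names still exceeds the job-wide limit, which only the aggregating side can detect.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical aggregator: merges per-task counters and enforces the limit
// on the union of counter names seen across all tasks.
class CounterAggregator {
    private final int maxCounters;
    private final Map<String, Long> totals = new TreeMap<>();

    CounterAggregator(int maxCounters) {
        this.maxCounters = maxCounters;
    }

    // Merges one task's counters. The check runs before any mutation, so a
    // rejected task leaves the running totals untouched.
    void addTaskCounters(Map<String, Long> taskCounters) {
        Set<String> names = new HashSet<>(totals.keySet());
        names.addAll(taskCounters.keySet());
        if (names.size() > maxCounters) {
            throw new IllegalStateException(
                "Too many counters: " + names.size() + " max=" + maxCounters);
        }
        taskCounters.forEach((name, value) -> totals.merge(name, value, Long::sum));
    }

    Map<String, Long> totals() {
        return totals;
    }
}
```

Whether exceeding the limit should fail the job, drop the extra counters, or only be recorded is a policy choice this sketch does not settle; it throws simply to make the check visible.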
                

[jira] [Commented] (MAPREDUCE-3936) Clients should not enforce counter limits

Posted by "Luke Lu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469719#comment-13469719 ] 

Luke Lu commented on MAPREDUCE-3936:
------------------------------------

bq. the generic history server can display the counters with existing behavior (preserving system counters while displaying an error showing that user counter limits have been exceeded)

Sorry, this is not possible, as the generic history server has no internal knowledge of MR counters. It can use a pageable/scroll-on-demand widget to display an arbitrarily large JSON object, though.
                

[jira] [Commented] (MAPREDUCE-3936) Clients should not enforce counter limits

Posted by "Luke Lu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469708#comment-13469708 ] 

Luke Lu commented on MAPREDUCE-3936:
------------------------------------

bq. are you saying that we should remove the enforcement of limits in AMs, since the user could choose to do that by modifying the AM themselves?

Yes and no, as the limits can serve as a less dramatic way of preventing users from shooting themselves in the foot than OOMs of AMs.

bq. Should it just not load all of the counters?

The current behavior in trunk (post 0.23) is actually fairly reasonable: it preserves all the system counters while making the exception queryable.

bq. if a job goes over the counter limits what should the history server do?

IMO, the best way to solve this problem is via MAPREDUCE-3061: the generic history server can display the counters with existing behavior (preserving system counters while displaying an error showing that user counter limits have been exceeded); an application-specific history server can display the counters however it wants with the serialized counter JSON streamed from the generic history server.
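One possible reading of "preserving system counters while making the exception queryable" is sketched below. This is an assumption about the shape of such behavior and not the trunk implementation: `LenientCounters` and all of its methods are hypothetical, and dropping the extra user counters (rather than failing the load) is a choice made only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical counters container: system counters are never limited, user
// counters are capped, and the first violation is kept so callers can ask
// about it instead of having the load fail.
class LenientCounters {
    private final Map<String, Long> system = new LinkedHashMap<>();
    private final Map<String, Long> user = new LinkedHashMap<>();
    private final int maxUserCounters;
    private IllegalStateException violation; // null until the limit is first exceeded

    LenientCounters(int maxUserCounters) {
        this.maxUserCounters = maxUserCounters;
    }

    // System (framework) counters are always kept.
    void addSystem(String name, long value) {
        system.put(name, value);
    }

    // User counters past the limit are dropped and the violation recorded.
    void addUser(String name, long value) {
        if (!user.containsKey(name) && user.size() >= maxUserCounters) {
            if (violation == null) {
                violation = new IllegalStateException(
                    "User counter limit exceeded, max=" + maxUserCounters);
            }
            return;
        }
        user.put(name, value);
    }

    // Queryable error state: null means the limit was never exceeded.
    IllegalStateException violation() {
        return violation;
    }

    int systemCount() {
        return system.size();
    }

    int userCount() {
        return user.size();
    }
}
```

A display layer could then render the system counters as usual and show the message from `violation()` in place of the user counters that were dropped.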


                

[jira] [Updated] (MAPREDUCE-3936) Clients should not enforce counter limits

Posted by "Matt Foley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated MAPREDUCE-3936:
----------------------------------

    Target Version/s: 1.1.1, 2.0.3-alpha  (was: 1.1.0, 2.0.3-alpha)
    

[jira] [Updated] (MAPREDUCE-3936) Clients should not enforce counter limits

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-3936:
---------------------------------

    Target Version/s: 1.1.0, 2.0.3-alpha  (was: 1.1.0)
              Status: Patch Available  (was: Open)
    

[jira] [Updated] (MAPREDUCE-3936) Clients should not enforce counter limits

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-3936:
---------------------------------

    Target Version/s: 1.1.0, 2.1.0-alpha  (was: 1.1.0)
            Assignee: Tom White
             Summary: Clients should not enforce counter limits   (was: Improve counter limits behaviour in 1.x)

This problem affects trunk/branch-2 as well, so changing the summary.

I think this can be fixed by changing the code so that limits are only enforced at source - i.e. when user task code is incrementing counters. In other cases - when counters are deserialized, or when manipulating counters in the client - there is no need to check limits (and indeed a mixed configuration will cause problems as mentioned in the description).
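The idea of enforcing limits only at the source can be sketched as a container with two entry points. This is an illustration under assumptions, not the attached patch: `SourceCheckedCounters`, `increment`, and `load` are hypothetical names. The increment path used by task code checks the limit; the path that takes in already-serialized counters does not, so a lower local limit cannot break a client that is only reading results.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical counters container: the limit is checked only on the
// increment path used by task code, never on the load path.
class SourceCheckedCounters {
    private final Map<String, Long> values = new LinkedHashMap<>();
    private final int maxCounters;

    SourceCheckedCounters(int maxCounters) {
        this.maxCounters = maxCounters;
    }

    // Called by user task code: creating a new counter past the limit fails,
    // while incrementing an existing counter is always allowed.
    void increment(String name, long delta) {
        if (!values.containsKey(name) && values.size() >= maxCounters) {
            throw new IllegalStateException("Too many counters, max=" + maxCounters);
        }
        values.merge(name, delta, Long::sum);
    }

    // Called when counters arrive from elsewhere (deserialization, or
    // manipulation in the client): no limit check is applied.
    void load(Map<String, Long> serialized) {
        values.putAll(serialized);
    }

    int size() {
        return values.size();
    }

    long get(String name) {
        return values.getOrDefault(name, 0L);
    }
}
```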
                

[jira] [Commented] (MAPREDUCE-3936) Clients should not enforce counter limits

Posted by "Luke Lu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467563#comment-13467563 ] 

Luke Lu commented on MAPREDUCE-3936:
------------------------------------

I can see a need for branch-1, which has a fairly different code base. But given that the AM is a per-job instance controlled by the user, I wonder why the patch is necessary (besides setting the limits higher)?
                

[jira] [Updated] (MAPREDUCE-3936) Clients should not enforce counter limits

Posted by "Matt Foley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated MAPREDUCE-3936:
----------------------------------

    Target Version/s: 1.2.0, 2.0.3-alpha  (was: 1.1.1, 2.0.3-alpha)
    