Posted to notifications@accumulo.apache.org by "Billie Rinaldi (JIRA)" <ji...@apache.org> on 2013/04/19 19:39:16 UTC

[jira] [Commented] (ACCUMULO-507) Large amount of ranges can prevent job from kicking off

    [ https://issues.apache.org/jira/browse/ACCUMULO-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636653#comment-13636653 ] 

Billie Rinaldi commented on ACCUMULO-507:
-----------------------------------------

I think this fix was reverted for the wrong reason.  ACCUMULO-826 found that a MapReduce job would fail if the process that started the job was killed.  This was an issue because we were writing the user's password to a file that was deleted on exit.  Every new map task needs to read the password when it starts, so the tasks were trying to read a nonexistent file.  But the ranges don't need to be read by each map task; they are accessed only once, when getSplits is called, which happens before the job is actually submitted.  Thus it shouldn't matter if the file containing the ranges is deleted in the middle of a job.  If the process exits before the job is actually submitted, the job will fail, but that seems OK to me.
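To make the lifecycle argument concrete, here is a minimal pure-Java simulation. It is not Accumulo or Hadoop code: getSplits, runMapTask and submitAndRun are hypothetical stand-ins, and a temp file plays the role of the ranges file. The point it shows is that once the splits have been computed on the client, deleting the ranges file has no effect on the tasks, which only ever see their own split.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class RangesLifecycleSketch {

    // Hypothetical stand-in for getSplits(): the ranges file is read exactly once,
    // on the client, before the job is submitted.
    static List<String> getSplits(Path rangesFile) throws IOException {
        List<String> splits = new ArrayList<>();
        for (String range : Files.readAllLines(rangesFile, StandardCharsets.UTF_8)) {
            if (!range.isEmpty()) {
                splits.add("split[" + range + "]");
            }
        }
        return splits;
    }

    // Hypothetical map task: it receives only its own split and never opens the ranges file.
    static String runMapTask(String split) {
        return "processed " + split;
    }

    // Simulates: write ranges file -> getSplits -> delete file -> run the tasks.
    static List<String> submitAndRun(List<String> ranges) {
        try {
            Path rangesFile = Files.createTempFile("ranges", ".txt");
            List<String> splits;
            try {
                Files.write(rangesFile, ranges, StandardCharsets.UTF_8);
                splits = getSplits(rangesFile);
            } finally {
                // Mimics the submitting process cleaning up (or exiting) right after getSplits.
                Files.deleteIfExists(rangesFile);
            }
            // The file is gone, yet every task still runs because it only needs its split.
            List<String> results = new ArrayList<>();
            for (String split : splits) {
                results.add(runMapTask(split));
            }
            return results;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : submitAndRun(List.of("a-f", "g-m", "n-z"))) {
            System.out.println(line);
        }
    }
}
```

By contrast, the password file from ACCUMULO-826 was read inside each task, which is why deleting it broke running jobs.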

The other issue pointed out in ACCUMULO-826 is valid: the file was being written to the file system, added to the distributed cache, and then read directly from the file system.  The ranges file shouldn't have been added to the distributed cache at all, since the slave nodes never need it.

However, there may be little point in re-applying this fix if mapred.user.jobconf.limit applies to the whole job submit directory.  Using the ranges file method might effectively halve the size of the job submit directory, but you could still hit the limit with enough ranges.  I'll try to verify whether this is true.  Does anyone have opinions about this issue?
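A back-of-the-envelope version of this concern: if every range costs a fixed number of bytes wherever it is stored, halving that cost only doubles the number of ranges that fit, and a large enough range count still exceeds the limit. The sketch assumes, as speculated above, that the limit covers the whole submit directory, and that the limit is 5 MB (5242880 bytes), which I believe is the stock default of mapred.user.jobconf.limit but should be checked against your cluster. The baseline and per-range sizes are made-up illustrative numbers, and maxRanges is a hypothetical helper, not a Hadoop or Accumulo API.

```java
public class JobConfSizeSketch {

    // Assumption: 5 MB, believed to be the stock default of mapred.user.jobconf.limit.
    static final long ASSUMED_LIMIT_BYTES = 5L * 1024 * 1024;

    // Hypothetical helper: how many ranges fit under the limit if everything else in the
    // submit directory takes baselineBytes and each range costs bytesPerRange.
    static long maxRanges(long limitBytes, long baselineBytes, long bytesPerRange) {
        if (bytesPerRange <= 0) {
            throw new IllegalArgumentException("bytesPerRange must be positive");
        }
        long remaining = limitBytes - baselineBytes;
        return remaining <= 0 ? 0 : remaining / bytesPerRange;
    }

    public static void main(String[] args) {
        long baseline = 100_000L; // illustrative size of everything except the ranges
        long perRange = 200L;     // illustrative serialized size of one range

        // Ranges stored at full cost.
        System.out.println(maxRanges(ASSUMED_LIMIT_BYTES, baseline, perRange));     // 25714
        // Ranges stored at half the cost: about twice as many fit, but still a finite number.
        System.out.println(maxRanges(ASSUMED_LIMIT_BYTES, baseline, perRange / 2)); // 51428
    }
}
```

Whatever the real per-range cost turns out to be, the bound stays linear in the number of ranges, so moving the ranges into a file would raise the ceiling without removing it.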
                
> Large amount of ranges can prevent job from kicking off
> -------------------------------------------------------
>
>                 Key: ACCUMULO-507
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-507
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 1.3.5
>            Reporter: John Vines
>            Assignee: Billie Rinaldi
>            Priority: Minor
>              Labels: mapreduce
>
> We use the various ranges a user provides to create splits. Those ranges are read on the client when the job is submitted: they are used to compute all of the splits, and then the job is started. If the configuration is too large, the job will fail to submit (this size is configurable, but that's beside the point). We should look into clearing the ranges out of the jobconf if it's large to prevent this error, since at that point the ranges are no longer needed in the configuration.
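The proposal in the description can be sketched roughly as follows. This is a minimal sketch that uses a plain java.util.Map as a stand-in for the job configuration; the key name example.input.ranges and the methods computeSplits and prepareForSubmit are hypothetical and are not the real Accumulo property or API. It computes the splits first and only then removes the ranges entry, so the configuration that gets submitted no longer carries them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClearRangesSketch {

    // Hypothetical key; not the real Accumulo configuration property name.
    static final String RANGES_KEY = "example.input.ranges";

    // Stand-in for getSplits: turns each comma-separated range into one split.
    static List<String> computeSplits(Map<String, String> conf) {
        List<String> splits = new ArrayList<>();
        String encoded = conf.get(RANGES_KEY);
        if (encoded == null || encoded.isEmpty()) {
            return splits;
        }
        for (String range : encoded.split(",")) {
            splits.add("split[" + range + "]");
        }
        return splits;
    }

    // Compute the splits, then drop the (possibly large) ranges entry before submission.
    static List<String> prepareForSubmit(Map<String, String> conf) {
        List<String> splits = computeSplits(conf);
        conf.remove(RANGES_KEY);
        return splits;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(RANGES_KEY, String.join(",", Arrays.asList("a-f", "g-m", "n-z")));

        List<String> splits = prepareForSubmit(conf);
        System.out.println(splits);                       // [split[a-f], split[g-m], split[n-z]]
        System.out.println(conf.containsKey(RANGES_KEY)); // false
    }
}
```

The ordering matters: the ranges can only be cleared after the splits are computed, which is exactly the window the description points to.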
