Posted to common-dev@hadoop.apache.org by "Jim Brennan (JIRA)" <ji...@apache.org> on 2018/06/20 14:17:00 UTC

[jira] [Created] (HADOOP-15548) Randomize local dirs

Jim Brennan created HADOOP-15548:
------------------------------------

             Summary: Randomize local dirs
                 Key: HADOOP-15548
                 URL: https://issues.apache.org/jira/browse/HADOOP-15548
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Jim Brennan
            Assignee: Jim Brennan


Shuffle LOCAL_DIRS, LOG_DIRS and LOCAL_USER_DIRS when launching a container. Some applications process these lists in exactly the same way in every container (e.g. round-robin), which can cause disks to become unnecessarily overloaded (e.g. one output file written to the first entry specified in the environment variable).
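
As a rough illustration of the shuffle described above, here is a minimal sketch. It assumes the directory list arrives as a comma-separated string; the class and method names are hypothetical and are not taken from the NodeManager code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not the actual NodeManager code.
class LocalDirsShuffleSketch {

    /**
     * Returns the same directories in a random order.
     * Assumption: the value is a comma-separated list of paths.
     */
    static String shuffleDirs(String commaSeparatedDirs, Random rng) {
        List<String> dirs =
            new ArrayList<>(Arrays.asList(commaSeparatedDirs.split(",")));
        Collections.shuffle(dirs, rng);
        return String.join(",", dirs);
    }

    public static void main(String[] args) {
        // Example paths are made up.
        String localDirs = "/disk1/nm-local,/disk2/nm-local,/disk3/nm-local";
        // Each container launch would see a different ordering.
        System.out.println(shuffleDirs(localDirs, new Random()));
    }
}
```

With a per-launch shuffle like this, an application that always writes to the first entry would land on a different disk from one container to the next.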

There are two paths for local dir allocation, depending on whether the size is unknown or known.  The unknown path already uses a random algorithm.  The known path initializes with a random starting point and then proceeds round-robin.  When selecting a dir, it increments the last-used index by one and then checks sequentially until it finds a dir that satisfies the request.  The proposal is to increment by a random value between 1 and num_dirs - 1 instead, and then check sequentially from there.  This should result in a more random selection in all cases.
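
The proposed change to the known-size path can be sketched roughly as follows. This is an assumption-laden illustration, not the actual allocator code: the class name, the `freeBytes` array standing in for per-dir capacity checks, and the `pick` method are all hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of the proposed selection; not the real allocator.
class RandomStepDirPickerSketch {
    private final long[] freeBytes; // assumed free space per dir, for illustration
    private final Random rng;
    private int lastUsed;

    RandomStepDirPickerSketch(long[] freeBytes, Random rng) {
        if (freeBytes.length == 0) {
            throw new IllegalArgumentException("need at least one dir");
        }
        this.freeBytes = freeBytes.clone();
        this.rng = rng;
        // Random starting point, as in the existing known-size path.
        this.lastUsed = rng.nextInt(freeBytes.length);
    }

    /** Returns the index of a dir with at least {@code size} bytes free, or -1. */
    int pick(long size) {
        int n = freeBytes.length;
        // Proposed: advance by a random step in [1, n - 1] instead of by 1.
        // With a single dir there is nowhere else to go, so the step is 0.
        int step = (n > 1) ? 1 + rng.nextInt(n - 1) : 0;
        int start = (lastUsed + step) % n;
        // Check sequentially from the new starting point.
        for (int i = 0; i < n; i++) {
            int candidate = (start + i) % n;
            if (freeBytes[candidate] >= size) {
                lastUsed = candidate;
                return candidate;
            }
        }
        return -1; // no dir satisfies the request
    }
}
```

Because the step is never a multiple of num_dirs, the search never starts at the dir used last time, yet the sequential check still visits every dir once, so a request is only refused when no dir can satisfy it. The sketch does not update free space after a pick.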



