Posted to mapreduce-issues@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2015/05/06 05:37:31 UTC
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size
add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer updated MAPREDUCE-5969:
----------------------------------------
Labels: BB2015-05-TBR (was: )
> Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
> ------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5969
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv1
> Reporter: zhihai xu
> Assignee: zhihai xu
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5969.branch1.1.patch, MAPREDUCE-5969.branch1.patch
>
>
> The size of private non-archive files is added twice in the Distributed Cache directory size calculation. The list of private non-archive files is passed in with the "-files" command-line option. The Distributed Cache directory size is used to check whether the total size of the cache files exceeds the cache size limit; the default limit is 10G.
> I added logging to addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java.
> I used the following command to test:
> hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out
> to add two files to the distributed cache: WordCount.java and wordcount.jar.
> The WordCount.java file size is 2395 bytes and the wordcount.jar file size is 3865 bytes. The total should be 6260 bytes.
> The log shows that these file sizes are added twice:
> once before the download to the local node and a second time after the download, so the total file count becomes 4 instead of 2:
> addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
> addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
> addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
> In the code, for a private non-archive file, the file size is first added in
> getLocalCache:
> {code}
> if (!isArchive) {
>   //for private archives, the lengths come over RPC from the
>   //JobLocalizer since the JobLocalizer is the one who expands
>   //archives and gets the total length
>   lcacheStatus.size = fileStatus.getLen();
>   LOG.info("getLocalCache:" + localizedPath + " size = "
>       + lcacheStatus.size);
>   // Increase the size and sub directory count of the cache
>   // from baseDirSize and baseDirNumberSubDir.
>   baseDirManager.addCacheInfoUpdate(lcacheStatus);
> }
> {code}
> The file size is added a second time in
> setSize:
> {code}
> synchronized (status) {
>   status.size = size;
>   baseDirManager.addCacheInfoUpdate(status);
> }
> {code}
> The fix is not to add the file size again for a private non-archive file after download (downloadCacheObject).
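The double counting described above can be illustrated with a minimal standalone sketch. It is not the TrackerDistributedCacheManager code: the classes CacheSizeSketch, CacheStatus and BaseDirManager, the localize method and the addAgainAfterDownload flag are all hypothetical stand-ins, and only the name addCacheInfoUpdate is borrowed from the report. The sketch uses the two file sizes quoted above (2395 and 3865 bytes); the sizes added after download in the real log differ slightly from these, which the sketch does not try to model.

```java
public class CacheSizeSketch {

    /** Hypothetical stand-in for the per-file cache status object. */
    static final class CacheStatus {
        long size;
    }

    /** Hypothetical stand-in for the per-base-directory accounting. */
    static final class BaseDirManager {
        long totalSize;
        int fileCount;

        // Plays the role of addCacheInfoUpdate in the report:
        // every call adds the file size and bumps the file count.
        void addCacheInfoUpdate(CacheStatus status) {
            totalSize += status.size;
            fileCount++;
        }
    }

    /**
     * Simulates localizing private non-archive files of the given sizes.
     * When addAgainAfterDownload is true, each size is counted a second
     * time after the simulated download, as the report describes.
     */
    static BaseDirManager localize(long[] fileSizes, boolean addAgainAfterDownload) {
        BaseDirManager manager = new BaseDirManager();
        CacheStatus[] statuses = new CacheStatus[fileSizes.length];

        // First step (getLocalCache in the report): the size is known
        // from the file status and is counted before the download.
        for (int i = 0; i < fileSizes.length; i++) {
            statuses[i] = new CacheStatus();
            statuses[i].size = fileSizes[i];
            manager.addCacheInfoUpdate(statuses[i]);
        }

        // Second step (setSize in the report): the size is reported
        // again after the download.
        for (int i = 0; i < fileSizes.length; i++) {
            statuses[i].size = fileSizes[i];
            if (addAgainAfterDownload) {
                manager.addCacheInfoUpdate(statuses[i]);
            }
        }
        return manager;
    }

    public static void main(String[] args) {
        long[] sizes = {2395L, 3865L}; // WordCount.java and wordcount.jar
        BaseDirManager buggy = localize(sizes, true);
        BaseDirManager fixed = localize(sizes, false);
        System.out.println("buggy: size=" + buggy.totalSize + " num=" + buggy.fileCount);
        System.out.println("fixed: size=" + fixed.totalSize + " num=" + fixed.fileCount);
    }
}
```

With the second addition enabled, the sketch counts 4 files and 12520 bytes; with it skipped, it counts 2 files and 6260 bytes, which matches the expected total stated in the description.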