Posted to mapreduce-issues@hadoop.apache.org by "Zhihua Deng (Jira)" <ji...@apache.org> on 2020/03/27 10:04:00 UTC

[jira] [Commented] (MAPREDUCE-7241) FileInputFormat listStatus with less memory footprint

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068493#comment-17068493 ] 

Zhihua Deng commented on MAPREDUCE-7241:
----------------------------------------

[~stevel@apache.org], [~jlowe], [~bkarthikk] Could anyone help review or give some feedback on this small change? The patch has been running for over six months on a cluster of thousands of nodes and works.

> FileInputFormat listStatus with less memory footprint
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-7241
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7241
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: job submission
>    Affects Versions: 2.6.1
>            Reporter: Zhihua Deng
>            Priority: Major
>         Attachments: MAPREDUCE-7241.trunk.02.patch, MAPREDUCE-7241.trunk.patch, filestatus.png
>
>
> This case is sometimes seen in Hive when a user mistakenly issues a query over all partitions. The file statuses cached while listing can accumulate to over 3 GB. After digging into the memory dump, LocatedBlock accounts for about 50% (sometimes over 60%) of the memory retained by LocatedFileStatus, as shown below:
> !filestatus.png!
> Right now we only extract the block location info from LocatedFileStatus; the datanode infos (types) and block tokens are not taken into account. So there is no need to cache LocatedBlock, and the status can be shrunk like this:
>
>     // Keep only plain BlockLocation copies in the cached status.
>     BlockLocation[] blockLocations = dup(stat.getBlockLocations());
>     LocatedFileStatus shrink = new LocatedFileStatus(stat, blockLocations);
>
>     // Copy each entry into a plain BlockLocation.
>     private static BlockLocation[] dup(BlockLocation[] blockLocations) {
>         BlockLocation[] copyLocs = new BlockLocation[blockLocations.length];
>         int i = 0;
>         for (BlockLocation location : blockLocations) {
>             copyLocs[i++] = new BlockLocation(location);
>         }
>         return copyLocs;
>     }
>  
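
A minimal, self-contained sketch of why the copy in the description can reduce retained memory. It does not use the Hadoop classes: `Location`, `HeavyLocation` and `ShrinkSketch` are hypothetical stand-ins, and it assumes that the locations returned while listing are instances of a subclass that also holds a heavy per-block object (modelled here as a byte array). Copying through the base-class copy constructor keeps only the light fields, so the heavy object is no longer reachable from the cached status.

```java
// Hypothetical stand-in for a plain block location: hosts, offset and length only.
class Location {
    private final String[] hosts;
    private final long offset;
    private final long length;

    Location(String[] hosts, long offset, long length) {
        this.hosts = hosts;
        this.offset = offset;
        this.length = length;
    }

    // Copy constructor: takes the base fields only (the hosts array is shared, not cloned).
    Location(Location that) {
        this(that.hosts, that.offset, that.length);
    }

    String[] getHosts() { return hosts; }
    long getOffset() { return offset; }
    long getLength() { return length; }
}

// Hypothetical stand-in for a location that also retains a heavy per-block object.
class HeavyLocation extends Location {
    private final byte[] payload;

    HeavyLocation(String[] hosts, long offset, long length, byte[] payload) {
        super(hosts, offset, length);
        this.payload = payload;
    }

    byte[] getPayload() { return payload; }
}

final class ShrinkSketch {
    private ShrinkSketch() {}

    // Same shape as dup() in the description: one plain copy per input entry.
    static Location[] dup(Location[] locations) {
        Location[] copies = new Location[locations.length];
        int i = 0;
        for (Location location : locations) {
            // The result is a plain Location, so it holds no reference to the payload.
            copies[i++] = new Location(location);
        }
        return copies;
    }
}
```

Calling `ShrinkSketch.dup(...)` on an array of `HeavyLocation` entries returns entries with the same hosts, offset and length, none of which is a `HeavyLocation`. Once the original array is dropped, the payloads become eligible for garbage collection.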



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
