
Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2007/07/06 19:49:04 UTC

[jira] Commented: (HADOOP-1565) DFSScalability: reduce memory usage of namenode

    [ https://issues.apache.org/jira/browse/HADOOP-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12510751 ] 

Raghu Angadi commented on HADOOP-1565:
--------------------------------------

The last time I checked, a couple of months back, the file name String somehow ended up backed by a 128-byte array. Could you double-check? Milind noticed that this might be caused by using substring() to extract the file name from the full path. If that is the case, fixing it could save around 100 bytes per file.
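
To illustrate the substring() concern, here is a minimal sketch. It assumes a JVM of that era, where substring() returned a String sharing the parent's character array, so a short file name could keep the full path's array alive. FileNameCopy and leafName are hypothetical names for illustration, not the actual namenode code.

```java
// Sketch only: FileNameCopy and leafName are hypothetical, not Hadoop code.
final class FileNameCopy {
    /** Returns the last path component, copied so it does not pin the full path. */
    static String leafName(String fullPath) {
        int slash = fullPath.lastIndexOf('/');
        // On older JVMs this was a view over fullPath's character array
        // (same array, different offset and count).
        String view = fullPath.substring(slash + 1);
        // On those JVMs, new String(String) copied only the characters in use,
        // so the result no longer referenced the larger array. On newer JVMs
        // substring() already copies, and this extra step is harmless.
        return new String(view);
    }

    public static void main(String[] args) {
        System.out.println(leafName("/user/data/part-00000")); // prints part-00000
    }
}
```

Whether this accounts for the 128-byte array would still need to be measured on the namenode itself.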


> DFSScalability: reduce memory usage of namenode
> -----------------------------------------------
>
>                 Key: HADOOP-1565
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1565
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> Experiments have demonstrated that a single file/block needs about 300 to 500 bytes of main memory on a 64-bit Namenode. This puts some limitations on the size of the file system that a single namenode can support. Most of this overhead occurs because a block and/or filename is inserted into multiple TreeMaps and/or HashSets.
> Here are a few ideas that can be measured to see if an appreciable reduction of memory usage occurs:
> 1. Change FSDirectory.children from a TreeMap to an array. Do binary search in this array while looking up children. This saves a TreeMap object for every intermediate node in the directory tree.
> 2. Change INode from an inner class to a static nested (or top-level) class. This removes the implicit "parent object" reference held by each INode instance, saving 4 bytes per inode.
> 3. Keep all DatanodeDescriptors in an array. BlocksMap.nodes[] is currently a 64-bit reference to the DatanodeDescriptor object. Instead, it can be a 'short'. This will probably save about 16 bytes per block.
> 4. Change DatanodeDescriptor.blocks from a SortedTreeMap to a HashMap? The CPU cost of block report processing may increase.
> For the record, a TreeMap entry has the following fields:
> 	Object key;
> 	Object value;
> 	Entry left = null;
> 	Entry right = null;
> 	Entry parent;
> 	boolean color = BLACK;
> and a HashMap entry has:
> 	final Object key;
> 	Object value;
> 	final int hash;
> 	Entry next;
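
Idea 1 in the quoted description (children kept in an array, looked up by binary search) could be sketched roughly as follows. ChildArray is a hypothetical stand-in that stores plain names, not the real FSDirectory or INode types; it only shows the lookup and insert mechanics.

```java
import java.util.Arrays;

// Sketch only: ChildArray is hypothetical and stores names, not INode objects.
final class ChildArray {
    // Kept sorted at all times so that binary search is valid.
    private String[] names = new String[0];

    /** Returns true if a child with this name exists. */
    boolean contains(String name) {
        return Arrays.binarySearch(names, name) >= 0;
    }

    /** Inserts name at its sorted position; returns false if already present. */
    boolean add(String name) {
        int pos = Arrays.binarySearch(names, name);
        if (pos >= 0) {
            return false;
        }
        // A negative result encodes the insertion point as -(point) - 1.
        int insertAt = -(pos + 1);
        String[] grown = new String[names.length + 1];
        System.arraycopy(names, 0, grown, 0, insertAt);
        grown[insertAt] = name;
        System.arraycopy(names, insertAt, grown, insertAt + 1, names.length - insertAt);
        names = grown;
        return true;
    }

    int size() {
        return names.length;
    }

    public static void main(String[] args) {
        ChildArray children = new ChildArray();
        children.add("b");
        children.add("a");
        children.add("c");
        System.out.println(children.contains("a") + " " + children.size());
    }
}
```

The trade-off in this sketch is that each insert copies the array, so it costs O(n) per added child in exchange for dropping the per-entry key, value, left, right, parent and color fields that a TreeMap entry carries.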
