Posted to common-user@hadoop.apache.org by Stas Oskin <st...@gmail.com> on 2009/08/31 12:40:38 UTC

Datanode high memory usage

Hi.

I measured the Datanodes' memory usage and noticed they take up to 700 MB of
RAM.

As their main job is to store files to disk, any idea why they take so much
RAM?

Thanks for any information.

Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.

> [ https://issues.apache.org/jira/browse/HADOOP-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
>
Does it have any effect on the issue I have?

It seems from the description that the issue is related to various node
tasks, and not to one in particular.

Regards.

RE: Datanode high memory usage

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Ahh.. luckily enough, I got a message on that JIRA just today.
----------
     [ https://issues.apache.org/jira/browse/HADOOP-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved HADOOP-6168.
--------------------------------------

    Resolution: Duplicate

agreed that this is a dupe.

> HADOOP_HEAPSIZE cannot be done per-server easily
> ------------------------------------------------
>
>                 Key: HADOOP-6168
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6168
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: conf
>    Affects Versions: 0.18.3
>            Reporter: Allen Wittenauer
>
> The hadoop script forces a heap that cannot be easily overridden if one wants to push the same config everywhere.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
-----------

Cheers!
Amogh
-----Original Message-----
From: Stas Oskin [mailto:stas.oskin@gmail.com] 
Sent: Tuesday, September 01, 2009 2:31 PM
To: common-user@hadoop.apache.org
Subject: Re: Datanode high memory usage

Hi.

2009/9/1 Amogh Vasekar <am...@yahoo-inc.com>

> This wont change the daemon configs.
> Hadoop by default allocates 1000MB of memory for each of its daemons, which
> can be controlled by HADOOP_HEAPSIZE, HADOOP_NAMENODE_OPTS,
> HADOOP_TASKTRACKER_OPTS in the hadoop script.
> However, there was a discussion on this sometime back wherein these options
> would be overridden by default 1000MB, not sure if the patch is available.
>
>
So this value is getting overridden later on by a hard-coded value?

Do you know by chance the patch ID?

Regards.

Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.

2009/9/1 Amogh Vasekar <am...@yahoo-inc.com>

> This wont change the daemon configs.
> Hadoop by default allocates 1000MB of memory for each of its daemons, which
> can be controlled by HADOOP_HEAPSIZE, HADOOP_NAMENODE_OPTS,
> HADOOP_TASKTRACKER_OPTS in the hadoop script.
> However, there was a discussion on this sometime back wherein these options
> would be overridden by default 1000MB, not sure if the patch is available.
>
>
So this value is getting overridden later on by a hard-coded value?

Do you know by chance the patch ID?

Regards.

RE: Datanode high memory usage

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
This won't change the daemon configs.
Hadoop by default allocates 1000MB of memory for each of its daemons, which can be controlled by HADOOP_HEAPSIZE, HADOOP_NAMENODE_OPTS, HADOOP_TASKTRACKER_OPTS in the hadoop script.
However, there was a discussion on this some time back wherein these options would be overridden by the default 1000MB; I'm not sure if the patch is available.
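A minimal sketch of what this looks like in conf/hadoop-env.sh (the variable names are the ones the 0.18/0.20-era scripts read; the values are illustrative only, and whether a per-daemon -Xmx actually beats the script's default is exactly the problem discussed above):

# conf/hadoop-env.sh -- illustrative values only
# Default heap, in MB, that the bin/hadoop script gives every daemon it starts
export HADOOP_HEAPSIZE=1000
# Per-daemon options; an -Xmx here is intended to override the default,
# but on some versions the default still wins (see HADOOP-6168 above)
export HADOOP_NAMENODE_OPTS="-Xmx2048m"
export HADOOP_DATANODE_OPTS="-Xmx512m"
export HADOOP_TASKTRACKER_OPTS="-Xmx512m"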

Cheers!
Amogh




-----Original Message-----
From: Stas Oskin [mailto:stas.oskin@gmail.com] 
Sent: Monday, August 31, 2009 10:40 PM
To: common-user@hadoop.apache.org
Subject: Re: Datanode high memory usage

Hi.


> <property>
>  <name>mapred.child.java.opts</name>
>  <value>
>     -Xmx512M
>  </value>
> </property>
>
>
Does this have an effect even if I'm not using any reduce tasks?

Regards.

Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.

> Again, experimentation is needed. All we can say is that the datastructures
> get smaller, but not as small as 32-bit systems, as every instance is
> aligned to 8-byte/64-bit boundaries.
>


Ok, thanks - will check and see if/how it reduces the used memory.

Regards.

Re: Datanode high memory usage

Posted by Steve Loughran <st...@apache.org>.
Stas Oskin wrote:
> Hi.
> 
>> It would be nice if Java 6 had a way of switching compressed pointers on by
>> default -the way JRockit 64 bit did. Right now you have to edit every shell
>> script to start up every program,  hadoop included.  Maybe when jdk7 ships
>> it will do this by default.
>>
> 
> Does it give any memory benefits on Datanode as well, in addition to
> Namenode?

All we have so far is in  https://issues.apache.org/jira/browse/HDFS-559

If you are willing to work out the memory savings in the datanode, with 
different block sizes and file counts, it would be welcome.

> 
>  Also, just how much RAM is gained by this setting?
> 

Again, experimentation is needed. All we can say is that the 
datastructures get smaller, but not as small as 32-bit systems, as every 
instance is aligned to 8-byte/64-bit boundaries.

Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.

>
> It would be nice if Java 6 had a way of switching compressed pointers on by
> default -the way JRockit 64 bit did. Right now you have to edit every shell
> script to start up every program,  hadoop included.  Maybe when jdk7 ships
> it will do this by default.
>

Does it give any memory benefits on Datanode as well, in addition to
Namenode?

 Also, just how much RAM is gained by this setting?

Re: Datanode high memory usage

Posted by Steve Loughran <st...@apache.org>.
Allen Wittenauer wrote:
> On 9/2/09 3:49 AM, "Stas Oskin" <st...@gmail.com> wrote:
>>> It's a Sun JVM setting, not something Hadoop will control.  You'd have to
>>> turn it on in hadoop-env.sh.
>>>
>>>
>> Question is, if Hadoop will include this as standard,  if it indeed has such
>> benefits.

We can't do this, because then if you try to bring up Hadoop on a JVM without 
this option (currently, all OS X JVMs), your Java program will not start.

> 
> Hadoop doesn't have a -standard- here, it has a -default-.  JVM settings are
> one of those things that should just automatically be expected to be
> adjusted on a per installation basis.  It is pretty much impossible to get
> it correct for everyone.  [Thanks Java. :( ]
> 

It would be nice if Java 6 had a way of switching compressed pointers on 
by default - the way JRockit 64-bit did. Right now you have to edit every 
shell script used to start up every program, Hadoop included. Maybe when 
JDK 7 ships it will do this by default.

Re: Datanode high memory usage

Posted by Allen Wittenauer <aw...@linkedin.com>.
On 9/2/09 3:49 AM, "Stas Oskin" <st...@gmail.com> wrote:
>> It's a Sun JVM setting, not something Hadoop will control.  You'd have to
>> turn it on in hadoop-env.sh.
>> 
>> 
> Question is, if Hadoop will include this as standard,  if it indeed has such
> benefits.

Hadoop doesn't have a -standard- here, it has a -default-.  JVM settings are
one of those things that should simply be expected to be adjusted on a
per-installation basis.  It is pretty much impossible to get it correct for
everyone.  [Thanks Java. :( ]


Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.


>>
> It's a Sun JVM setting, not something Hadoop will control.  You'd have to
> turn it on in hadoop-env.sh.
>
>
The question is whether Hadoop will include this as standard, if it indeed has such
benefits.

Regards.

Re: Datanode high memory usage

Posted by Bryan Talbot <bt...@aeriagames.com>.
For info on newer JDK support for compressed oops, see http://java.sun.com/javase/6/webnotes/6u14.html 
  and http://wikis.sun.com/display/HotSpotInternals/CompressedOops


-Bryan




On Sep 1, 2009, at Sep 1, 12:21 PM, Brian Bockelman wrote:

>
> On Sep 1, 2009, at 1:58 PM, Stas Oskin wrote:
>
>> Hi.
>>
>>
>>> With regards to memory, have you tried the compressed pointers JDK  
>>> option
>>> (we saw great benefits on the NN)?  Java is incredibly hard to get a
>>> straight answer from with regards to memory.  You need to perform  
>>> a GC first
>>> manually - the actual usage is the amount it reports used post- 
>>> GC.  You can
>>> get these details by using JMX.
>>>
>>>
>> Will compressed pointers be used as standard in future versions of  
>> Hadoop?
>>
>
> It's a Sun JVM setting, not something Hadoop will control.  You'd  
> have to turn it on in hadoop-env.sh.
>
> Brian


Re: Datanode high memory usage

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Sep 1, 2009, at 1:58 PM, Stas Oskin wrote:

> Hi.
>
>
>> With regards to memory, have you tried the compressed pointers JDK  
>> option
>> (we saw great benefits on the NN)?  Java is incredibly hard to get a
>> straight answer from with regards to memory.  You need to perform a  
>> GC first
>> manually - the actual usage is the amount it reports used post-GC.   
>> You can
>> get these details by using JMX.
>>
>>
> Will compressed pointers be used as standard in future versions of  
> Hadoop?
>

It's a Sun JVM setting, not something Hadoop will control.  You'd have  
to turn it on in hadoop-env.sh.
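For reference, a rough sketch of what turning it on there could look like (the -XX:+UseCompressedOops flag is the standard HotSpot name, it needs a 64-bit 6u14-or-later JVM, and a JVM that doesn't recognise the flag will refuse to start, which is the portability concern raised elsewhere in this thread):

# conf/hadoop-env.sh -- illustrative only; requires a 64-bit HotSpot 6u14+ JVM
export HADOOP_NAMENODE_OPTS="-XX:+UseCompressedOops $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-XX:+UseCompressedOops $HADOOP_DATANODE_OPTS"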

Brian

Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.


> With regards to memory, have you tried the compressed pointers JDK option
> (we saw great benefits on the NN)?  Java is incredibly hard to get a
> straight answer from with regards to memory.  You need to perform a GC first
> manually - the actual usage is the amount it reports used post-GC.  You can
> get these details by using JMX.
>
>
Will compressed pointers be used as standard in future versions of Hadoop?

Regards.

Re: Datanode high memory usage

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Mafish,

If you are getting 1-2m blocks on a single datanode, you'll have many  
other problems - especially with regards to periodic block reports.

With regards to memory, have you tried the compressed pointers JDK  
option (we saw great benefits on the NN)?  Java is incredibly hard to  
get a straight answer from with regards to memory.  You need to  
perform a GC first manually - the actual usage is the amount it  
reports used post-GC.  You can get these details by using JMX.
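One way to get at the JMX numbers, as a sketch only (the flags are the standard Sun JVM remote-JMX properties, the port is an arbitrary example, and disabling auth/ssl like this is only sensible on a test box):

# conf/hadoop-env.sh -- example only; this opens an unauthenticated JMX port
JMX_OPTS="-Dcom.sun.management.jmxremote"
JMX_OPTS="$JMX_OPTS -Dcom.sun.management.jmxremote.port=8006"
JMX_OPTS="$JMX_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
JMX_OPTS="$JMX_OPTS -Dcom.sun.management.jmxremote.ssl=false"
export HADOOP_DATANODE_OPTS="$JMX_OPTS $HADOOP_DATANODE_OPTS"

# then point jconsole at datanode-host:8006, press 'Perform GC' on the
# Memory tab, and read the post-GC heap usage there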

Brian

On Sep 1, 2009, at 4:08 AM, Mafish Liu wrote:

> Both NameNode and DataNode will be affected by number of files  
> greatly.
> In my test, almost 60% memory are used in datanodes while storing 1m
> files, and the value reach 80% with 2m files.
> My test best is with 5 nodes, 1 namenode and 4 datanodes. All nodes
> have 2GB memory and replication is 3.
>
> 2009/9/1 Stas Oskin <st...@gmail.com>:
>> Hi.
>>
>> 2009/9/1 Mafish Liu <ma...@gmail.com>
>>
>>> Did you have many small files in your system?
>>>
>>>
>> Yes, quite plenty.
>>
>> But this should influence the Namenode, and not the Datanode,  
>> correct?
>>
>> Regards.
>>
>
>
>
> -- 
> Mafish@gmail.com


Re: Datanode high memory usage

Posted by Mafish Liu <ma...@gmail.com>.
2009/9/1 Mafish Liu <ma...@gmail.com>:
> Both NameNode and DataNode will be affected by number of files greatly.
> In my test, almost 60% memory are used in datanodes while storing 1m
> files, and the value reach 80% with 2m files.
> My test best is with 5 nodes, 1 namenode and 4 datanodes. All nodes
~~~~test bed
> have 2GB memory and replication is 3.
>
> 2009/9/1 Stas Oskin <st...@gmail.com>:
>> Hi.
>>
>> 2009/9/1 Mafish Liu <ma...@gmail.com>
>>
>>> Did you have many small files in your system?
>>>
>>>
>> Yes, quite plenty.
>>
>> But this should influence the Namenode, and not the Datanode, correct?
>>
>> Regards.
>>
>
>
>
> --
> Mafish@gmail.com
>



-- 
Mafish@gmail.com

Re: Datanode high memory usage

Posted by Mafish Liu <ma...@gmail.com>.
Both NameNode and DataNode are greatly affected by the number of files.
In my test, almost 60% of memory was used on the datanodes while storing 1m
files, and the value reached 80% with 2m files.
My test bed is 5 nodes: 1 namenode and 4 datanodes. All nodes
have 2GB of memory and replication is 3.

2009/9/1 Stas Oskin <st...@gmail.com>:
> Hi.
>
> 2009/9/1 Mafish Liu <ma...@gmail.com>
>
>> Did you have many small files in your system?
>>
>>
> Yes, quite plenty.
>
> But this should influence the Namenode, and not the Datanode, correct?
>
> Regards.
>



-- 
Mafish@gmail.com

Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.

2009/9/1 Mafish Liu <ma...@gmail.com>

> Did you have many small files in your system?
>
>
Yes, quite plenty.

But this should influence the Namenode, and not the Datanode, correct?

Regards.

Re: Datanode high memory usage

Posted by Mafish Liu <ma...@gmail.com>.
Did you have many small files in your system?

2009/9/1 Stas Oskin <st...@gmail.com>:
> Hi.
>
>
>> <property>
>>  <name>mapred.child.java.opts</name>
>>  <value>
>>     -Xmx512M
>>  </value>
>> </property>
>>
>>
> This has effect even if I not using any reduce tasks?
>
> Regards.
>



-- 
Mafish@gmail.com

Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.


> <property>
>  <name>mapred.child.java.opts</name>
>  <value>
>     -Xmx512M
>  </value>
> </property>
>
>
Does this have an effect even if I'm not using any reduce tasks?

Regards.

Re: Datanode high memory usage

Posted by Jim Twensky <ji...@gmail.com>.
The maximum (and minimum) amount of memory to be used by the child task
JVMs launched by the task trackers can be specified inside the configuration
files under conf. For instance, in order to allocate a maximum of 512 MB,
you need to set:

<property>
  <name>mapred.child.java.opts</name>
  <value>
     -Xmx512M
  </value>
</property>

Hope that helps.

Jim

On Mon, Aug 31, 2009 at 9:07 AM, Stas Oskin<st...@gmail.com> wrote:
> Hi.
>
>
> I think what you see is reduce task, because in reduce task, you have three
>> steps:
>> copy , sort, and reudce.  The copy and  sort steps may cost a lot of
>> memory.
>>
>>
>>
> Nope, I just running the Datanode and copying files to HDFS - no reduce
> tasks are running.
>
> How typically large is a standard Datanode?
>
> Regards.
>

Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.


> I think what you see is reduce task, because in reduce task, you have three
> steps:
> copy , sort, and reudce.  The copy and  sort steps may cost a lot of
> memory.
>
>
>
Nope, I'm just running the Datanode and copying files to HDFS - no reduce
tasks are running.

How large is a standard Datanode, typically?

Regards.

Re: Datanode high memory usage

Posted by zhang jianfeng <zj...@gmail.com>.
I think what you see is a reduce task, because in a reduce task you have three
steps:
copy, sort, and reduce.  The copy and sort steps may cost a lot of memory.



2009/8/31 Stas Oskin <st...@gmail.com>

> Hi.
>
>
> > What does 700MB represent for ? total memory usage of OS or only the task
> > process.
> >
> >
> The Datanode task process - I'm running just it to find out how actually
> RAM
> it takes.
>
> Regards.
>

Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.


> What does 700MB represent for ? total memory usage of OS or only the task
> process.
>
>
The Datanode task process - I'm running just it, to find out how much RAM
it actually takes.

Regards.

RE: Datanode high memory usage

Posted by zjffdu <zj...@gmail.com>.
Hi Stas,

What does the 700MB represent? The total memory usage of the OS, or only the task
process?

There are three processes related to Hadoop on a datanode:

1. DataNode daemon for DFS (only one)
2. TaskTracker for MapReduce (only one)
3. Map Task or Reduce Task (several tasks on one machine, depending on your
configuration)

 

-----Original Message-----
From: Stas Oskin [mailto:stas.oskin@gmail.com] 
Sent: August 31, 2009 3:41
To: core-user@hadoop.apache.org
Subject: Datanode high memory usage

Hi.

I measured the Datanodes' memory usage and noticed they take up to 700 MB of
RAM.

As their main job is to store files to disk, any idea why they take so much
RAM?

Thanks for any information.


Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.


>>
> Resident, shared, or virtual?  Unix memory management is not
> straightforward; the worst thing you can do is look at the virtual memory
> size of the java process and assume that's how much RAM it is using.
>
>
I'm using a tool called ps_mem.py to measure the total memory taken. It's usually
correct.

Regards.

Re: Datanode high memory usage

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Sep 1, 2009, at 2:02 PM, Stas Oskin wrote:

> Hi.
>
> What does 'up to 700MB' mean? Is it JVM's virtual memory? resident  
> memory?
>> or java heap in use?
>>
>
> 700 MB is what taken by overall java process.
>

Resident, shared, or virtual?  Unix memory management is not  
straightforward; the worst thing you can do is look at the virtual  
memory size of the java process and assume that's how much RAM it is  
using.

IIRC, the JVM will pre-allocate enough virtual memory to allocate its  
entire heap (so the virtual size will be large), but the heap won't  
necessarily get that large.  The JVM in server mode won't be  
aggressive about GC unless it is pressed for memory - i.e., if you  
give it a 512MB heap, then it will possibly use a good hunk of it  
before doing a GC run.  On top of the heap, the JVM has several other  
memory buffers for compiled code and the like.
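A quick way to see the distinction on a Linux box is something along these lines (assuming one DataNode JVM per host; the main class name is the 0.20-era org.apache.hadoop.hdfs.server.datanode.DataNode, so adjust it for your version):

# VSZ = virtual size, RSS = resident set size, both reported in KB by procps ps
DN_PID=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode)
ps -o pid,vsz,rss,cmd -p "$DN_PID"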

Brian

>
>>
>> How many blocks to you have? For an idle DN, most of the memory is  
>> taken by
>> block info structures. It does not really optimize for it.. May be  
>> about 1k
>> per block is the upper limit.
>>
>
>
> Here are the details from NN ui:
>
> *160537 files and directories, 144118 blocks = 304655 total. Heap  
> Size is
> 155.42 MB / 966.69 MB (16%)*
>
> Thanks.


Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.

> What does 'up to 700MB' mean? Is it JVM's virtual memory? resident memory?
> or java heap in use?
>

700 MB is what's taken by the overall Java process.


>
> How many blocks to you have? For an idle DN, most of the memory is taken by
> block info structures. It does not really optimize for it.. May be about 1k
> per block is the upper limit.
>


Here are the details from the NN UI:

160537 files and directories, 144118 blocks = 304655 total. Heap Size is
155.42 MB / 966.69 MB (16%)
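(For rough scale, taking the ~1k-per-block upper bound mentioned above: 144118 blocks at roughly 1 KB each is only about 140 MB of block metadata, so most of a ~700 MB process footprint would presumably be heap headroom and non-heap JVM overhead rather than the block structures themselves.)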

Thanks.

Re: Datanode high memory usage

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
I think this thread is moving in all possible directions... without
many details on the original problem.

There is no need to speculate on where the memory goes; you can run 'jmap
-histo:live' and 'jmap -heap' to get a much better idea.
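For example, against the DataNode pid (jmap ships with the Sun JDK; the class name used for the pid lookup is the 0.20-era one and is only illustrative):

DN_PID=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode)
jmap -heap $DN_PID          # heap configuration and current usage
jmap -histo:live $DN_PID    # per-class histogram; ':live' forces a full GC first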

What does 'up to 700MB' mean? Is it JVM's virtual memory? resident 
memory? or java heap in use?

How many blocks do you have? For an idle DN, most of the memory is taken 
by block info structures. It does not really optimize for them... maybe 
about 1k per block is the upper limit.

Raghu.

Stas Oskin wrote:
> Hi.
> 
> I measured the Datanode memory usage, and noticed they take up to 700 MB of
> RAM.
> 
> As their main job is to store files to disk, any idea why they take so much
> RAM?
> 
> Thanks for any information.
> 


Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.

The datanode would be using the major part of memory to do following-
> a. Continuously (at regular interval) send heartbeat messages to namenode
> to
> say 'I am live and awake'
> b. In case, any data/file is added to DFS, OR Map Reduce jobs are running,
> datanode would again be talking to namenode or transferring data from its
> local copy to either client or other slave datanodes in the cluster
>
>
These operations don't seem all that memory-intensive; any idea why the DN would
take most of the memory?

Regards.

Re: Datanode high memory usage

Posted by indoos <in...@gmail.com>.
Hi,
The recommended RAM for the namenode, datanode, jobtracker and tasktracker is 1
GB.
The datanode would be using the major part of its memory to do the following:
a. Continuously (at regular intervals) send heartbeat messages to the namenode to
say 'I am live and awake'
b. In case any data/file is added to DFS, or MapReduce jobs are running, the
datanode would again be talking to the namenode or transferring data from its
local copy to either the client or other slave datanodes in the cluster

-Sanjay
 

Stas Oskin-2 wrote:
> 
> Hi.
> 
> I measured the Datanode memory usage, and noticed they take up to 700 MB
> of
> RAM.
> 
> As their main job is to store files to disk, any idea why they take so
> much
> RAM?
> 
> Thanks for any information.
> 
> 

-- 
View this message in context: http://www.nabble.com/Datanode-high-memory-usage-tp25221400p25243059.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.