Posted to common-user@hadoop.apache.org by Mafish Liu <ma...@gmail.com> on 2009/09/01 03:40:53 UTC

Re: Datanode high memory usage

Do you have many small files in your system?

2009/9/1 Stas Oskin <st...@gmail.com>:
> Hi.
>
>
>> <property>
>>  <name>mapred.child.java.opts</name>
>>  <value>
>>     -Xmx512M
>>  </value>
>> </property>
>>
>>
> Does this have an effect even if I am not using any reduce tasks?
>
> Regards.
>



-- 
Mafish@gmail.com

Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.

> Again, experimentation is needed. All we can say is that the datastructures
> get smaller, but not as small as 32-bit systems, as every instance is
> aligned to 8-byte/64-bit boundaries.
>


Ok, thanks - I will check and see whether and how much it reduces memory usage.

Regards.

Re: Datanode high memory usage

Posted by Steve Loughran <st...@apache.org>.
Stas Oskin wrote:
> Hi.
> 
>> It would be nice if Java 6 had a way of switching compressed pointers on by
>> default -the way JRockit 64 bit did. Right now you have to edit every shell
>> script to start up every program,  hadoop included.  Maybe when jdk7 ships
>> it will do this by default.
>>
> 
> Does it give any memory benefits on Datanode as well, in addition to
> Namenode?

All we have so far is in https://issues.apache.org/jira/browse/HDFS-559

If you are willing to work out the memory savings in the datanode, with 
different block sizes and file counts, it would be welcome.

> 
>  Also, just how much RAM is gained by this setting?
> 

Again, experimentation is needed. All we can say is that the 
datastructures get smaller, but not as small as 32-bit systems, as every 
instance is aligned to 8-byte/64-bit boundaries.

Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.

>
> It would be nice if Java 6 had a way of switching compressed pointers on by
> default -the way JRockit 64 bit did. Right now you have to edit every shell
> script to start up every program,  hadoop included.  Maybe when jdk7 ships
> it will do this by default.
>

Does it give any memory benefits on Datanode as well, in addition to
Namenode?

 Also, just how much RAM is gained by this setting?

Re: Datanode high memory usage

Posted by Steve Loughran <st...@apache.org>.
Allen Wittenauer wrote:
> On 9/2/09 3:49 AM, "Stas Oskin" <st...@gmail.com> wrote:
>>> It's a Sun JVM setting, not something Hadoop will control.  You'd have to
>>> turn it on in hadoop-env.sh.
>>>
>>>
>> Question is, if Hadoop will include this as standard,  if it indeed has such
>> benefits.

We can't do this, because if you try to bring up Hadoop on a JVM without
this option (currently, all OS/X JVMs), your Java program will not start.

> 
> Hadoop doesn't have a -standard- here, it has a -default-.  JVM settings are
> one of those things that should just automatically be expected to be
> adjusted on a per installation basis.  It is pretty much impossible to get
> it correct for everyone.  [Thanks Java. :( ]
> 

It would be nice if Java 6 had a way of switching compressed pointers on 
by default -the way JRockit 64 bit did. Right now you have to edit every 
shell script to start up every program,  hadoop included.  Maybe when 
jdk7 ships it will do this by default.

Re: Datanode high memory usage

Posted by Allen Wittenauer <aw...@linkedin.com>.
On 9/2/09 3:49 AM, "Stas Oskin" <st...@gmail.com> wrote:
>> It's a Sun JVM setting, not something Hadoop will control.  You'd have to
>> turn it on in hadoop-env.sh.
>> 
>> 
> Question is, if Hadoop will include this as standard,  if it indeed has such
> benefits.

Hadoop doesn't have a -standard- here, it has a -default-.  JVM settings are
one of those things that should just automatically be expected to be
adjusted on a per installation basis.  It is pretty much impossible to get
it correct for everyone.  [Thanks Java. :( ]


Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.


>>
> It's a Sun JVM setting, not something Hadoop will control.  You'd have to
> turn it on in hadoop-env.sh.
>
>
The question is whether Hadoop will include this as standard, if it indeed
has such benefits.

Regards.

Re: Datanode high memory usage

Posted by Bryan Talbot <bt...@aeriagames.com>.
For info on newer JDK support for compressed oops, see
http://java.sun.com/javase/6/webnotes/6u14.html and
http://wikis.sun.com/display/HotSpotInternals/CompressedOops


-Bryan




On Sep 1, 2009, at 12:21 PM, Brian Bockelman wrote:

>
> On Sep 1, 2009, at 1:58 PM, Stas Oskin wrote:
>
>> Hi.
>>
>>
>>> With regards to memory, have you tried the compressed pointers JDK option
>>> (we saw great benefits on the NN)?  Java is incredibly hard to get a
>>> straight answer from with regards to memory.  You need to perform a GC
>>> first manually - the actual usage is the amount it reports used post-GC.
>>> You can get these details by using JMX.
>>>
>>>
>>>
>>>
>> Will compressed pointers be used as standard in future versions of  
>> Hadoop?
>>
>
> It's a Sun JVM setting, not something Hadoop will control.  You'd  
> have to turn it on in hadoop-env.sh.
>
> Brian


Re: Datanode high memory usage

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Sep 1, 2009, at 1:58 PM, Stas Oskin wrote:

> Hi.
>
>
>> With regards to memory, have you tried the compressed pointers JDK option
>> (we saw great benefits on the NN)?  Java is incredibly hard to get a
>> straight answer from with regards to memory.  You need to perform a GC
>> first manually - the actual usage is the amount it reports used post-GC.
>> You can get these details by using JMX.
>>
>>
> Will compressed pointers be used as standard in future versions of  
> Hadoop?
>

It's a Sun JVM setting, not something Hadoop will control.  You'd have  
to turn it on in hadoop-env.sh.

Brian
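For what it's worth, a minimal hadoop-env.sh sketch for turning the flag on,
assuming a 64-bit Sun JVM of 6u14 or later. The -XX:+UseCompressedOops flag
name is real; whether the *_OPTS variable names below match your hadoop-env.sh,
and whether your JVM accepts the flag, must be checked per installation:

```shell
# Hypothetical hadoop-env.sh fragment: prepend the compressed-oops flag
# to the daemon options so the namenode and datanode JVMs pick it up.
export HADOOP_NAMENODE_OPTS="-XX:+UseCompressedOops $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-XX:+UseCompressedOops $HADOOP_DATANODE_OPTS"
```

Prepending (rather than overwriting) keeps whatever options the stock script
already sets for those daemons.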

Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.


> With regards to memory, have you tried the compressed pointers JDK option
> (we saw great benefits on the NN)?  Java is incredibly hard to get a
> straight answer from with regards to memory.  You need to perform a GC first
> manually - the actual usage is the amount it reports used post-GC.  You can
> get these details by using JMX.
>
>
Will compressed pointers be used as standard in future versions of Hadoop?

Regards.

Re: Datanode high memory usage

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Mafish,

If you are getting 1-2m blocks on a single datanode, you'll have many  
other problems - especially with regards to periodic block reports.

With regards to memory, have you tried the compressed pointers JDK  
option (we saw great benefits on the NN)?  Java is incredibly hard to  
get a straight answer from with regards to memory.  You need to  
perform a GC first manually - the actual usage is the amount it  
reports used post-GC.  You can get these details by using JMX.
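One rough command-line sketch of that measure-after-GC idea, assuming a Sun
JDK 6 on the datanode host; the /DataNode/ pattern is an assumption about how
jps names the process:

```shell
# jmap's -histo:live option forces a full GC before printing the
# live-object histogram, so the trailing "Total" line approximates
# post-GC heap usage without setting up a JMX client.
dn_pid=$(jps 2>/dev/null | awk '/DataNode/ {print $1}')
if [ -n "$dn_pid" ]; then
  jmap -histo:live "$dn_pid" | tail -1
else
  echo "no DataNode JVM found"
fi
```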

Brian

On Sep 1, 2009, at 4:08 AM, Mafish Liu wrote:

> Both the NameNode and the DataNode are greatly affected by the number of
> files.
> In my test, almost 60% of memory was used on the datanodes while storing 1m
> files, and the value reached 80% with 2m files.
> My test bed has 5 nodes, 1 namenode and 4 datanodes. All nodes
> have 2GB of memory and replication is 3.
>
> 2009/9/1 Stas Oskin <st...@gmail.com>:
>> Hi.
>>
>> 2009/9/1 Mafish Liu <ma...@gmail.com>
>>
>>> Did you have many small files in your system?
>>>
>>>
>> Yes, quite plenty.
>>
>> But this should influence the Namenode, and not the Datanode,  
>> correct?
>>
>> Regards.
>>
>
>
>
> -- 
> Mafish@gmail.com


Re: Datanode high memory usage

Posted by Mafish Liu <ma...@gmail.com>.
2009/9/1 Mafish Liu <ma...@gmail.com>:
> Both NameNode and DataNode will be affected by number of files greatly.
> In my test, almost 60% memory are used in datanodes while storing 1m
> files, and the value reach 80% with 2m files.
> My test best is with 5 nodes, 1 namenode and 4 datanodes. All nodes
~~~~test bed
> have 2GB memory and replication is 3.
>
> 2009/9/1 Stas Oskin <st...@gmail.com>:
>> Hi.
>>
>> 2009/9/1 Mafish Liu <ma...@gmail.com>
>>
>>> Did you have many small files in your system?
>>>
>>>
>> Yes, quite plenty.
>>
>> But this should influence the Namenode, and not the Datanode, correct?
>>
>> Regards.
>>
>
>
>
> --
> Mafish@gmail.com
>



-- 
Mafish@gmail.com

Re: Datanode high memory usage

Posted by Mafish Liu <ma...@gmail.com>.
Both the NameNode and the DataNode are greatly affected by the number of files.
In my test, almost 60% of memory was used on the datanodes while storing 1m
files, and the value reached 80% with 2m files.
My test bed has 5 nodes, 1 namenode and 4 datanodes. All nodes
have 2GB of memory and replication is 3.
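A back-of-envelope sketch of what that file count means per datanode, assuming
one block per small file (the common case for files under the block size):

```shell
# 2m files at replication 3, spread over 4 datanodes: each datanode
# ends up tracking on the order of 1.5m block replicas in memory.
FILES=2000000
REPLICATION=3
DATANODES=4
echo $(( FILES * REPLICATION / DATANODES ))   # prints 1500000
```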

2009/9/1 Stas Oskin <st...@gmail.com>:
> Hi.
>
> 2009/9/1 Mafish Liu <ma...@gmail.com>
>
>> Did you have many small files in your system?
>>
>>
> Yes, quite plenty.
>
> But this should influence the Namenode, and not the Datanode, correct?
>
> Regards.
>



-- 
Mafish@gmail.com

Re: Datanode high memory usage

Posted by Stas Oskin <st...@gmail.com>.
Hi.

2009/9/1 Mafish Liu <ma...@gmail.com>

> Did you have many small files in your system?
>
>
Yes, quite a lot.

But this should influence the Namenode, and not the Datanode, correct?

Regards.