Posted to user@hadoop.apache.org by Jonathan Aquilina <ja...@eagleeyet.net> on 2015/02/22 07:22:13 UTC

Re: How can I get the memory usage in Namenode and Datanode?

 

I am rather new to Hadoop, but wouldn't the difference potentially lie in
how the files are split, in terms of size?

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-02-21 21:54, Fang Zhou wrote: 

> Hi All,
> 
> I want to measure the memory usage of the Namenode and Datanodes.
> 
> I tried jmap, jstat, /proc/<pid>/stat, top, ps aux, and the Hadoop web interface to check memory usage.
> The values they report differ, and the usage also changes periodically.
> That is the first thing that confused me.
> 
> I assumed that the more files stored in HDFS, the more memory the Namenode and Datanodes would use.
> I also assumed the Namenode would use more memory than any single Datanode.
> However, some of my results contradict these assumptions.
> For example, I measured the Namenode's memory usage with 6000 files and with 1000 files.
> According to jmap, the 6000-file run used less memory than the 1000-file run.
> I also found that a Datanode used more memory than the Namenode.
> 
> I really don't know how to measure the memory usage of the Namenode and Datanodes.
> 
> Can anyone give me some advice?
> 
> Thanks,
> Tim
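One way to get a single consistent number, rather than reconciling jmap, top, and ps, is to ask the daemon itself: Hadoop daemons expose their JVM metrics over HTTP via the /jmx servlet on the web UI port. A minimal sketch, assuming the NameNode runs locally on the Hadoop 2.x default web UI port 50070 (adjust host and port for your cluster):

```python
# Sketch: read the NameNode's own view of its heap via the built-in
# JMX servlet that Hadoop daemons expose over HTTP (/jmx on the web
# UI port). Host and port here are assumptions; 50070 is the default
# NameNode web UI port in Hadoop 2.x.
import json
from urllib.request import urlopen

NAMENODE_JMX = ("http://localhost:50070/jmx"
                "?qry=Hadoop:service=NameNode,name=JvmMetrics")

def heap_used_mb(jmx_json: str) -> float:
    """Extract MemHeapUsedM (used heap, in MB) from a /jmx response."""
    beans = json.loads(jmx_json)["beans"]
    jvm = next(b for b in beans if b.get("name", "").endswith("JvmMetrics"))
    return jvm["MemHeapUsedM"]

def sample_namenode_heap(url: str = NAMENODE_JMX) -> float:
    """Take one live sample; requires a running NameNode at `url`."""
    with urlopen(url) as resp:
        return heap_used_mb(resp.read())
```

Because this reads the JVM's own counters, the value is comparable across samples in a way that mixing tools is not.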
 

Re: How can I get the memory usage in Namenode and Datanode?

Posted by Fang Zhou <ti...@gmail.com>.
Thank you for sharing.

I appreciate it.

Tim

> On Feb 22, 2015, at 1:23 AM, Jonathan Aquilina <ja...@eagleeyet.net> wrote:
> 
> Hi Tim,
> 
> I am not sure whether this will help improve overall cluster performance for you, but I hope it gives you and others some ideas.
> 
> https://media.amazonwebservices.com/AWS_Amazon_EMR_Best_Practices.pdf
> 
>  
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
> On 2015-02-22 07:57, Tim Chou wrote:
> 
>> Hi Jonathan,
>>  
>> Very useful information. I will look at Ganglia.
>>  
>> However, I do not have administrative privileges on the cluster, so I don't know whether I can install Ganglia.
>>  
>> Thank you for your information.
>>  
>> Best,
>> Tim
>> 
>> 2015-02-22 0:53 GMT-06:00 Jonathan Aquilina <jaquilina@eagleeyet.net <ma...@eagleeyet.net>>:
>> Where I work, we run transient (temporary) clusters on Amazon EMR. The material I read on how things work suggested using Ganglia to monitor memory usage, network usage, and so on. That way, depending on how things are set up (for example, pulling data into the cluster directly from an Amazon S3 bucket), you can verify that the network link stays saturated so data flows constantly.
>> 
>> In short, I am suggesting you look at Ganglia.
>> 
>>  
>> ---
>> Regards,
>> Jonathan Aquilina
>> Founder Eagle Eye T
>> On 2015-02-22 07:42, Fang Zhou wrote:
>> 
>> Hi Jonathan,
>>  
>> Thank you.
>>  
>> The number of files affects the Namenode's memory usage.
>>  
>> I just want to know the Namenode's actual memory usage.
>>  
>> The heap usage changes constantly, so I have no idea which value is the right one.
>>  
>> Thanks,
>> Tim
>> 
>> On Feb 22, 2015, at 12:22 AM, Jonathan Aquilina <jaquilina@eagleeyet.net <ma...@eagleeyet.net>> wrote:
>> 
>> I am rather new to Hadoop, but wouldn't the difference potentially lie in how the files are split, in terms of size?
>> 
>>  
>> ---
>> Regards,
>> Jonathan Aquilina
>> Founder Eagle Eye T
>> On 2015-02-21 21:54, Fang Zhou wrote:
>> 
>> Hi All,
>> 
>> I want to measure the memory usage of the Namenode and Datanodes.
>> 
>> I tried jmap, jstat, /proc/<pid>/stat, top, ps aux, and the Hadoop web interface to check memory usage.
>> The values they report differ, and the usage also changes periodically.
>> That is the first thing that confused me.
>> 
>> I assumed that the more files stored in HDFS, the more memory the Namenode and Datanodes would use.
>> I also assumed the Namenode would use more memory than any single Datanode.
>> However, some of my results contradict these assumptions.
>> For example, I measured the Namenode's memory usage with 6000 files and with 1000 files.
>> According to jmap, the 6000-file run used less memory than the 1000-file run.
>> I also found that a Datanode used more memory than the Namenode.
>> 
>> I really don't know how to measure the memory usage of the Namenode and Datanodes.
>> 
>> Can anyone give me some advice?
>> 
>> Thanks,
>> Tim
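The fluctuating heap numbers Tim describes above are largely the normal garbage-collection sawtooth: used heap climbs between collections and drops after each one, while committed heap and the process RSS stay comparatively flat, which is one reason jmap, jstat, and top disagree. As a rough sketch of turning one `jstat -gc` sample into a single used-heap figure (the column names are the JDK 8 defaults; a JDK's `jstat` must be on the PATH for live sampling):

```python
# Sketch: total used heap (MB) from one `jstat -gc` sample.
# jstat prints per-region columns in KB; summing the "used" columns
# (S0U, S1U, EU, OU) gives the heap in use at that instant.
import subprocess

def used_heap_mb_from_output(jstat_out: str) -> float:
    """Parse the last header/value pair of `jstat -gc` output."""
    header, values = jstat_out.strip().splitlines()[-2:]
    row = dict(zip(header.split(), values.split()))
    kb = sum(float(row[col]) for col in ("S0U", "S1U", "EU", "OU"))
    return kb / 1024.0

def used_heap_mb(pid: int) -> float:
    """Sample a live JVM by pid (e.g. the NameNode's pid from `jps`)."""
    out = subprocess.check_output(["jstat", "-gc", str(pid)], text=True)
    return used_heap_mb_from_output(out)
```

Sampling this repeatedly makes the GC sawtooth visible, rather than looking like random disagreement between tools.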


Re: How can I get the memory usage in Namenode and Datanode?

Posted by Jonathan Aquilina <ja...@eagleeyet.net>.
 

Hi Tim, 

I am not sure whether this will help improve overall cluster performance
for you, but I hope it gives you and others some ideas.

https://media.amazonwebservices.com/AWS_Amazon_EMR_Best_Practices.pdf 

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-02-22 07:57, Tim Chou wrote: 

> Hi Jonathan, 
> 
> Very useful information. I will look at Ganglia.
> 
> However, I do not have administrative privileges on the cluster, so I don't know whether I can install Ganglia.
> 
> Thank you for your information. 
> 
> Best, 
> Tim 
> 
> 2015-02-22 0:53 GMT-06:00 Jonathan Aquilina <ja...@eagleeyet.net>:
> 
> Where I work, we run transient (temporary) clusters on Amazon EMR. The material I read on how things work suggested using Ganglia to monitor memory usage, network usage, and so on. That way, depending on how things are set up (for example, pulling data into the cluster directly from an Amazon S3 bucket), you can verify that the network link stays saturated so data flows constantly.
> 
> In short, I am suggesting you look at Ganglia.
> 
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
> 
> On 2015-02-22 07:42, Fang Zhou wrote:
> 
> Hi Jonathan,
> 
> Thank you.
> 
> The number of files affects the Namenode's memory usage.
> 
> I just want to know the Namenode's actual memory usage.
> 
> The heap usage changes constantly, so I have no idea which value is the right one.
> 
> Thanks, 
> Tim 
> 
> On Feb 22, 2015, at 12:22 AM, Jonathan Aquilina <ja...@eagleeyet.net> wrote: 
> 
> I am rather new to Hadoop, but wouldn't the difference potentially lie in how the files are split, in terms of size?
> 
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
> 
> On 2015-02-21 21:54, Fang Zhou wrote: 
> 
> Hi All,
> 
> I want to measure the memory usage of the Namenode and Datanodes.
> 
> I tried jmap, jstat, /proc/<pid>/stat, top, ps aux, and the Hadoop web interface to check memory usage.
> The values they report differ, and the usage also changes periodically.
> That is the first thing that confused me.
> 
> I assumed that the more files stored in HDFS, the more memory the Namenode and Datanodes would use.
> I also assumed the Namenode would use more memory than any single Datanode.
> However, some of my results contradict these assumptions.
> For example, I measured the Namenode's memory usage with 6000 files and with 1000 files.
> According to jmap, the 6000-file run used less memory than the 1000-file run.
> I also found that a Datanode used more memory than the Namenode.
> 
> I really don't know how to measure the memory usage of the Namenode and Datanodes.
> 
> Can anyone give me some advice?
> 
> Thanks,
> Tim
 

Re: How can I get the memory usage in Namenode and Datanode?

Posted by Tim Chou <ti...@gmail.com>.
Hi Jonathan,

Very useful information. I will look at Ganglia.

However, I do not have administrative privileges on the cluster, so I
don't know whether I can install Ganglia.

Thank you for your information.

Best,
Tim

2015-02-22 0:53 GMT-06:00 Jonathan Aquilina <ja...@eagleeyet.net>:

>  Where I work, we run transient (temporary) clusters on Amazon EMR. The
> material I read on how things work suggested using Ganglia to monitor
> memory usage, network usage, and so on. That way, depending on how things
> are set up (for example, pulling data into the cluster directly from an
> Amazon S3 bucket), you can verify that the network link stays saturated
> so data flows constantly.
>
> In short, I am suggesting you look at Ganglia.
>
>
>
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
>
>  On 2015-02-22 07:42, Fang Zhou wrote:
>
> Hi Jonathan,
>
> Thank you.
>
> The number of files affects the Namenode's memory usage.
>
> I just want to know the Namenode's actual memory usage.
>
> The heap usage changes constantly, so I have no idea which value is the
> right one.
>
> Thanks,
> Tim
>
>  On Feb 22, 2015, at 12:22 AM, Jonathan Aquilina <ja...@eagleeyet.net>
> wrote:
>
>  I am rather new to Hadoop, but wouldn't the difference potentially lie in
> how the files are split, in terms of size?
>
>
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
>
>  On 2015-02-21 21:54, Fang Zhou wrote:
>
> Hi All,
>
> I want to measure the memory usage of the Namenode and Datanodes.
>
> I tried jmap, jstat, /proc/<pid>/stat, top, ps aux, and the Hadoop web interface to check memory usage.
> The values they report differ, and the usage also changes periodically.
> That is the first thing that confused me.
>
> I assumed that the more files stored in HDFS, the more memory the Namenode and Datanodes would use.
> I also assumed the Namenode would use more memory than any single Datanode.
> However, some of my results contradict these assumptions.
> For example, I measured the Namenode's memory usage with 6000 files and with 1000 files.
> According to jmap, the 6000-file run used less memory than the 1000-file run.
> I also found that a Datanode used more memory than the Namenode.
>
> I really don't know how to measure the memory usage of the Namenode and Datanodes.
>
> Can anyone give me some advice?
>
> Thanks,
> Tim
>
>
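Since Tim cannot install Ganglia, it is worth noting that the Hadoop daemons' built-in /jmx HTTP servlet can be polled with no cluster-side installs at all. A minimal polling sketch, assuming a DataNode on the local host at the Hadoop 2.x default web UI port 50075 (host, port, and the 60-second interval are illustrative):

```python
# Sketch: poll a Hadoop daemon's /jmx servlet periodically and log heap
# figures -- no cluster-side install required, unlike Ganglia. The URL
# is an assumption (50075 is the Hadoop 2.x DataNode web UI default).
import json
import time
from urllib.request import urlopen

DATANODE_JMX = "http://localhost:50075/jmx"

def pick_jvm_bean(jmx_json: str) -> dict:
    """Return the JvmMetrics bean out of a raw /jmx response."""
    beans = json.loads(jmx_json)["beans"]
    return next(b for b in beans if b.get("name", "").endswith("JvmMetrics"))

def poll_heap(url: str = DATANODE_JMX, interval_s: int = 60) -> None:
    """Print used vs. committed heap once per interval, until killed."""
    while True:
        with urlopen(url) as resp:
            jvm = pick_jvm_bean(resp.read())
        print(f"{time.strftime('%H:%M:%S')} "
              f"used {jvm['MemHeapUsedM']:.1f} MB / "
              f"committed {jvm['MemHeapCommittedM']:.1f} MB")
        time.sleep(interval_s)
```

Logging both used and committed heap over time separates GC churn (used) from actual heap growth (committed), which is the distinction the earlier measurements were blurring.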

Re: How can I get the memory usage in Namenode and Datanode?

Posted by Tim Chou <ti...@gmail.com>.
Hi Jonathan,

Very useful information. I will look at the ganglia.

However, I do not have the administrative privilege for the cluster. I
don't know if I can install Ganglia in the cluster.

Thank you for your information.

Best,
Tim

2015-02-22 0:53 GMT-06:00 Jonathan Aquilina <ja...@eagleeyet.net>:

>  Where I am working we are working on transient cluster (temporary) using
> Amazon EMR. When I was reading up on how things work they suggested for
> monitoring to use ganglia to monitor memory usage and network usage etc.
> That way depending on how things are setup be it using an amazon s3 bucket
> for example and pulling data directly into the cluster the network link
> will always be saturated to ensure a constant flow of data.
>
> What I am suggesting is potentially looking at ganglia.
>
>
>
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
>
>  On 2015-02-22 07:42, Fang Zhou wrote:
>
> Hi Jonathan,
>
> Thank you.
>
> The number of files impact on the memory usage in Namenode.
>
> I just want to get the real memory usage situation in Namenode.
>
> The memory used in heap always changes so that I have no idea about which
> value is the right one.
>
> Thanks,
> Tim
>
>  On Feb 22, 2015, at 12:22 AM, Jonathan Aquilina <ja...@eagleeyet.net>
> wrote:
>
>  I am rather new to hadoop, but wouldnt the difference be potentially in
> how the files are split in terms of size?
>
>
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
>
>  On 2015-02-21 21:54, Fang Zhou wrote:
>
> Hi All,
>
> I want to test the memory usage on Namenode and Datanode.
>
> I try to use jmap, jstat, proc/pid/stat, top, ps aux, and Hadoop website interface to check the memory.
> The values I get from them are different. I also found that the memory always changes periodically.
> This is the first thing confused me.
>
> I thought the more files stored in Namenode, the more memory usage in Namenode and Datanode.
> I also thought the memory used in Namenode should be larger than the memory used in each Datanode.
> However, some results show my ideas are wrong.
> For example, I test the memory usage of Namenode with 6000 and 1000 files.
> The "6000" memory is less than "1000" memory from jmap's results.
> I also found that the memory usage in Datanode is larger than the memory used in Namenode.
>
> I really don't know how to get the memory usage in Namenode and Datanode.
>
> Can anyone give me some advices?
>
> Thanks,
> Tim
>
>

Re: How can I get the memory usage in Namenode and Datanode?

Posted by Tim Chou <ti...@gmail.com>.
Hi Jonathan,

Very useful information. I will look at the ganglia.

However, I do not have the administrative privilege for the cluster. I
don't know if I can install Ganglia in the cluster.

Thank you for your information.

Best,
Tim

2015-02-22 0:53 GMT-06:00 Jonathan Aquilina <ja...@eagleeyet.net>:

>  Where I am working we are working on transient cluster (temporary) using
> Amazon EMR. When I was reading up on how things work they suggested for
> monitoring to use ganglia to monitor memory usage and network usage etc.
> That way depending on how things are setup be it using an amazon s3 bucket
> for example and pulling data directly into the cluster the network link
> will always be saturated to ensure a constant flow of data.
>
> What I am suggesting is potentially looking at ganglia.
>
>
>
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
>
>  On 2015-02-22 07:42, Fang Zhou wrote:
>
> Hi Jonathan,
>
> Thank you.
>
> The number of files impacts the memory usage of the Namenode.
>
> I just want to get the real memory usage of the Namenode.
>
> The heap usage always changes, so I have no idea which value is the
> right one.
>
> Thanks,
> Tim
>
>  On Feb 22, 2015, at 12:22 AM, Jonathan Aquilina <ja...@eagleeyet.net>
> wrote:
>
>  I am rather new to Hadoop, but wouldn't the difference potentially be in
> how the files are split in terms of size?
>
>
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
>
>  On 2015-02-21 21:54, Fang Zhou wrote:
>
> Hi All,
>
> I want to measure the memory usage of the Namenode and Datanode.
>
> I have tried jmap, jstat, /proc/<pid>/stat, top, ps aux, and the Hadoop web interface to check the memory.
> The values I get from them are all different, and the memory also changes periodically.
> This is the first thing that confused me.
>
> I thought that the more files stored in HDFS, the more memory the Namenode and Datanode would use.
> I also thought the Namenode should use more memory than any single Datanode.
> However, some results show my assumptions are wrong.
> For example, I measured the Namenode's memory usage with 6000 files and with 1000 files.
> In jmap's results, the "6000" figure is smaller than the "1000" figure.
> I also found that a Datanode uses more memory than the Namenode.
>
> I really don't know how to measure the memory usage of the Namenode and Datanode.
>
> Can anyone give me some advice?
>
> Thanks,
> Tim
>
>

Re: How can I get the memory usage in Namenode and Datanode?

Posted by Jonathan Aquilina <ja...@eagleeyet.net>.
 

Where I work we run transient (temporary) clusters on Amazon EMR. When I
was reading up on how things work, the recommendation for monitoring was
to use Ganglia to track memory usage, network usage, and so on. That way,
depending on how things are set up, for example pulling data into the
cluster directly from an Amazon S3 bucket, you can check that the network
link stays saturated to ensure a constant flow of data.


What I am suggesting is potentially looking at Ganglia.

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-02-22 07:42, Fang Zhou wrote: 

> Hi Jonathan, 
> 
> Thank you. 
> 
> The number of files impacts the memory usage of the Namenode. 
> 
> I just want to get the real memory usage of the Namenode. 
> 
> The heap usage always changes, so I have no idea which value is the right one. 
> 
> Thanks, 
> Tim 
> 
> On Feb 22, 2015, at 12:22 AM, Jonathan Aquilina <ja...@eagleeyet.net> wrote: 
> 
> I am rather new to Hadoop, but wouldn't the difference potentially be in how the files are split in terms of size? 
> 
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
> 
> On 2015-02-21 21:54, Fang Zhou wrote: 
> 
> Hi All,
> 
> I want to measure the memory usage of the Namenode and Datanode.
> 
> I have tried jmap, jstat, /proc/<pid>/stat, top, ps aux, and the Hadoop web interface to check the memory.
> The values I get from them are all different, and the memory also changes periodically.
> This is the first thing that confused me.
> 
> I thought that the more files stored in HDFS, the more memory the Namenode and Datanode would use.
> I also thought the Namenode should use more memory than any single Datanode.
> However, some results show my assumptions are wrong.
> For example, I measured the Namenode's memory usage with 6000 files and with 1000 files.
> In jmap's results, the "6000" figure is smaller than the "1000" figure.
> I also found that a Datanode uses more memory than the Namenode.
> 
> I really don't know how to measure the memory usage of the Namenode and Datanode.
> 
> Can anyone give me some advice?
> 
> Thanks,
> Tim
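[Editor's note: Hadoop daemons can push their JVM and RPC metrics to Ganglia through the metrics2 Ganglia sink, which avoids installing anything on the cluster nodes beyond a config change. The fragment below is a sketch of `hadoop-metrics2.properties`, assuming Ganglia 3.1 or later; the gmond host and port are placeholders, not values from the thread.]

```properties
# Sketch only: gmond.example.com:8649 is a placeholder address.
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
namenode.sink.ganglia.servers=gmond.example.com:8649
datanode.sink.ganglia.servers=gmond.example.com:8649
```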
 


Re: How can I get the memory usage in Namenode and Datanode?

Posted by Fang Zhou <ti...@gmail.com>.
Hi Jonathan,

Thank you.

The number of files impacts the memory usage of the Namenode.

I just want to get the real memory usage of the Namenode.

The heap usage always changes, so I have no idea which value is the right one.

Thanks,
Tim

> On Feb 22, 2015, at 12:22 AM, Jonathan Aquilina <ja...@eagleeyet.net> wrote:
> 
> I am rather new to Hadoop, but wouldn't the difference potentially be in how the files are split in terms of size?
> 
>  
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
> On 2015-02-21 21:54, Fang Zhou wrote:
> 
>> Hi All,
>> 
>> I want to measure the memory usage of the Namenode and Datanode.
>> 
>> I have tried jmap, jstat, /proc/<pid>/stat, top, ps aux, and the Hadoop web interface to check the memory.
>> The values I get from them are all different, and the memory also changes periodically.
>> This is the first thing that confused me.
>> 
>> I thought that the more files stored in HDFS, the more memory the Namenode and Datanode would use.
>> I also thought the Namenode should use more memory than any single Datanode.
>> However, some results show my assumptions are wrong.
>> For example, I measured the Namenode's memory usage with 6000 files and with 1000 files.
>> In jmap's results, the "6000" figure is smaller than the "1000" figure.
>> I also found that a Datanode uses more memory than the Namenode.
>> 
>> I really don't know how to measure the memory usage of the Namenode and Datanode.
>> 
>> Can anyone give me some advice?
>> 
>> Thanks,
>> Tim
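[Editor's note: the fluctuation Tim describes is largely garbage-collection churn — the heap grows until a GC runs, so instantaneous readings differ; `jmap -histo:live <pid>` forces a full GC first and gives a more comparable live-set figure. As for scaling with file count, a folklore rule of thumb for HDFS is roughly 150 bytes of NameNode heap per file, directory, or block object. The sketch below illustrates that heuristic; the function name and the 150-byte constant are assumptions for illustration, not measurements from the thread.]

```python
def estimate_namenode_heap_bytes(n_files, n_dirs, n_blocks,
                                 bytes_per_object=150):
    """Back-of-the-envelope NameNode heap estimate: every file,
    directory, and block is an in-heap object of roughly
    `bytes_per_object` bytes. The constant is a rule of thumb,
    not a measured value."""
    return (n_files + n_dirs + n_blocks) * bytes_per_object

# 6000 small files, each fitting in a single block, in one directory:
print(estimate_namenode_heap_bytes(6000, 1, 6000))  # → 1800150
```

On this model more files should always mean more NameNode heap, so a jmap reading that shrinks from 1000 to 6000 files is more plausibly explained by GC timing than by the metadata itself shrinking.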

