Posted to hdfs-user@hadoop.apache.org by Shiyuan Xiao <sh...@ericsson.com> on 2014/09/01 08:19:56 UTC

CPU utilization keeps increasing when using HDFS

Hi

We have written a MapReduce application on Hadoop 2.4 that continuously reads data from HDFS (pseudo-distributed mode on one node).
We found that the application's CPU system time and user time keep increasing while it runs. If we change the application to read the same data from local disk, without changing any other business logic, CPU utilization stays stable, so we concluded that the growth is related to HDFS.
Is this issue really caused by HDFS, and is there a way to fix it?


Thanks a lot!

BR/Shiyuan

RE: CPU utilization keeps increasing when using HDFS

Posted by Shiyuan Xiao <sh...@ericsson.com>.
Yes, the client process used the most CPU.

But could you please explain why the CPU utilization kept increasing? We are sure that the rate of data provisioned into HDFS was stable.

Thanks

BR/Shiyuan

From: Gordon Wang [mailto:gwang@pivotal.io]
Sent: September 1, 2014 15:48
To: user@hadoop.apache.org
Subject: Re: CPU utilization keeps increasing when using HDFS

That is because you are using a single-node pseudo-distributed cluster. When the HDFS client writes data to HDFS, the client computes a checksum for each data chunk and the DataNode verifies it, which costs CPU.
You can monitor the CPU usage of each process. I expect the NameNode's CPU usage is fine, but the client process and the DataNode process probably use most of the CPU.

On Mon, Sep 1, 2014 at 3:09 PM, Shiyuan Xiao <sh...@ericsson.com> wrote:
Because the application is currently running against local disk, I can’t give the “top” command’s output from when it ran with HDFS.

But we used “top” and “pidstat” to check the CPU utilization of our application, and I can confirm that our application’s CPU utilization was increasing while the DataNode, NameNode, ResourceManager, and NodeManager processes stayed stable.
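For reference, the kind of per-process trend we checked can be captured with a small sampling loop (the interval, snapshot count, and output path here are illustrative, not the exact commands we ran):

```shell
# Take three timestamped snapshots of the top CPU consumers, a few
# seconds apart, so a per-process trend becomes visible over time.
for i in 1 2 3; do
  date
  ps -eo pid,pcpu,comm --sort=-pcpu | head -6
  sleep 2
done > cpu-trend.log
wc -l cpu-trend.log
```

In practice, `pidstat -u -p <pid> 5` on the specific Hadoop PIDs gives the same trend with less noise.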


Below is the “top” output while the application is accessing local disk:
[reporting@ms1 ~]$ top

top - 15:04:58 up 33 days, 24 min,  3 users,  load average: 4.05, 4.08, 3.92
Tasks: 361 total,   1 running, 360 sleeping,   0 stopped,   0 zombie
Cpu(s): 34.5%us,  2.3%sy,  0.0%ni, 63.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  66068256k total, 54013596k used, 12054660k free,  3400140k buffers
Swap:  2097144k total,   268376k used,  1828768k free, 41202752k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
33364 reportin  20   0 1628m 745m  17m S 168.2  1.2   0:05.07 /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
33130 reportin  20   0 1078m 246m  18m S 130.7  0.4   0:08.10 /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/d
33439 reportin  20   0 1613m 143m  17m S 108.1  0.2   0:03.26 /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
25690 reportin  20   0 1724m 530m  18m S  8.6  0.8   4:31.44 /usr/java/default/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.di
32879 reportin  20   0 1679m 370m  18m S  6.6  0.6   0:09.13 /usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
32642 reportin  20   0 1662m 372m  18m S  6.0  0.6   0:09.22 /usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
25200 reportin  20   0 1639m 326m  18m S  2.0  0.5   0:42.49 /usr/java/default/bin/java -Dproc_datanode -Xmx1000m -Djava.library.pat
25576 reportin  20   0 1804m 400m  18m S  2.0  0.6   0:53.13 /usr/java/default/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.lo
25058 reportin  20   0 1622m 401m  18m S  1.7  0.6   0:42.88 /usr/java/default/bin/java -Dproc_namenode -Xmx1000m -Djava.library.pat
33262 reportin  20   0 15260 1556 1012 R  0.7  0.0   0:00.04 top
 2984 root      20   0 1227m  14m 1324 S  0.3  0.0  52:00.05 /usr/bin/python /opt/ericsson/nms/litp//bin/landscape_service.py --daem
32019 reportin  20   0 1090m 248m  18m S  0.3  0.4   0:09.21 /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/de
49403 hyperic   20   0 5333m 216m  15m S  0.3  0.3  37:31.53 /usr/java/default/jre//bin/java -Djava.security.auth.login.config=../..
50715 reportin  20   0 5472m 380m  13m S  0.3  0.6   3:57.15 java -Xmx2048m -XX:MaxPermSize=128m -Dlogback.configurationFile=/opt/er
    1 root      20   0 19228 1100  896 S  0.0  0.0  10:53.91 /sbin/init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.02 [kthreadd]
    3 root      RT   0     0    0    0 S  0.0  0.0   0:27.01 [migration/0]


From: Stanley Shi [mailto:sshi@pivotal.io]
Sent: September 1, 2014 14:32
To: user@hadoop.apache.org
Subject: Re: CPU utilization keeps increasing when using HDFS

Would you please give the output of the "top" command, at least to show that the HDFS process really used that much CPU?

On Mon, Sep 1, 2014 at 2:19 PM, Shiyuan Xiao <sh...@ericsson.com> wrote:
Hi

We have written a MapReduce application on Hadoop 2.4 that continuously reads data from HDFS (pseudo-distributed mode on one node).
We found that the application's CPU system time and user time keep increasing while it runs. If we change the application to read the same data from local disk, without changing any other business logic, CPU utilization stays stable, so we concluded that the growth is related to HDFS.
Is this issue really caused by HDFS, and is there a way to fix it?


Thanks a lot!

BR/Shiyuan



--
Regards,
Stanley Shi,



--
Regards
Gordon Wang


Re: CPU utilization keeps increasing when using HDFS

Posted by Gordon Wang <gw...@pivotal.io>.
That is because you are using a single-node pseudo-distributed cluster. When the
HDFS client writes data to HDFS, the client computes a checksum for each data
chunk and the DataNode verifies it, which costs CPU.
You can monitor the CPU usage of each process. I expect the NameNode's
CPU usage is fine, but the client process and the DataNode process probably
use most of the CPU.
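The per-chunk checksumming described above scales with the volume of data written: by default HDFS computes one checksum per 512 bytes (dfs.bytes-per-checksum). A rough, self-contained sketch of that cost model, using zlib's CRC32 in place of the CRC32C that HDFS actually uses:

```python
import zlib

BYTES_PER_CHECKSUM = 512  # HDFS default, dfs.bytes-per-checksum

def chunk_checksums(data):
    """One CRC per 512-byte chunk, mimicking the work the HDFS client
    does on write and the DataNode repeats to verify on receipt.
    (HDFS uses CRC32C; zlib's CRC32 stands in for illustration.)"""
    return [zlib.crc32(data[i:i + BYTES_PER_CHECKSUM])
            for i in range(0, len(data), BYTES_PER_CHECKSUM)]

# The checksum work grows linearly with bytes written, so steady write
# traffic should mean steady, not growing, CPU from this source alone.
sums = chunk_checksums(b"x" * 2048)
print(len(sums))  # 4 chunks of 512 bytes
```

Note the implication: constant traffic predicts constant checksum CPU, so a steadily *increasing* CPU trend would point at something accumulating elsewhere in the client. On the read side, `FileSystem.setVerifyChecksum(false)` can skip verification if that cost matters, at the price of losing corruption detection.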


On Mon, Sep 1, 2014 at 3:09 PM, Shiyuan Xiao <sh...@ericsson.com>
wrote:

>  Because the application is currently running against local disk, I
> can’t give the “top” command’s output from when it ran with HDFS.
>
>
>
> But we used “top” and “pidstat” to check the CPU utilization of our
> application, and I can confirm that our application’s CPU utilization was
> increasing while the DataNode, NameNode, ResourceManager
> and NodeManager processes stayed stable.
>
>
>
>
>
> Below is the “top” output while the application is accessing local
> disk:
>
> [reporting@ms1 ~]$ top
>
>
>
> top - 15:04:58 up 33 days, 24 min,  3 users,  load average: 4.05, 4.08,
> 3.92
>
> Tasks: 361 total,   1 running, 360 sleeping,   0 stopped,   0 zombie
>
> Cpu(s): 34.5%us,  2.3%sy,  0.0%ni, 63.2%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
>
> Mem:  66068256k total, 54013596k used, 12054660k free,  3400140k buffers
>
> Swap:  2097144k total,   268376k used,  1828768k free, 41202752k cached
>
>
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> COMMAND
>
> 33364 reportin  20   0 1628m 745m  17m S 168.2  1.2   0:05.07
> /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
>
> 33130 reportin  20   0 1078m 246m  18m S 130.7  0.4   0:08.10
> /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/d
>
> 33439 reportin  20   0 1613m 143m  17m S 108.1  0.2   0:03.26
> /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
>
> 25690 reportin  20   0 1724m 530m  18m S  8.6  0.8   4:31.44
> /usr/java/default/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.di
>
> 32879 reportin  20   0 1679m 370m  18m S  6.6  0.6   0:09.13
> /usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
>
> 32642 reportin  20   0 1662m 372m  18m S  6.0  0.6   0:09.22
> /usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
>
> 25200 reportin  20   0 1639m 326m  18m S  2.0  0.5   0:42.49
> /usr/java/default/bin/java -Dproc_datanode -Xmx1000m -Djava.library.pat
>
> 25576 reportin  20   0 1804m 400m  18m S  2.0  0.6   0:53.13
> /usr/java/default/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.lo
>
> 25058 reportin  20   0 1622m 401m  18m S  1.7  0.6   0:42.88
> /usr/java/default/bin/java -Dproc_namenode -Xmx1000m -Djava.library.pat
>
> 33262 reportin  20   0 15260 1556 1012 R  0.7  0.0   0:00.04
> top
>
>  2984 root      20   0 1227m  14m 1324 S  0.3  0.0  52:00.05
> /usr/bin/python /opt/ericsson/nms/litp//bin/landscape_service.py --daem
>
> 32019 reportin  20   0 1090m 248m  18m S  0.3  0.4   0:09.21
> /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/de
>
> 49403 hyperic   20   0 5333m 216m  15m S  0.3  0.3  37:31.53
> /usr/java/default/jre//bin/java -Djava.security.auth.login.config=../..
>
> 50715 reportin  20   0 5472m 380m  13m S  0.3  0.6   3:57.15 java
> -Xmx2048m -XX:MaxPermSize=128m -Dlogback.configurationFile=/opt/er
>
>     1 root      20   0 19228 1100  896 S  0.0  0.0  10:53.91
> /sbin/init
>
>     2 root      20   0     0    0    0 S  0.0  0.0   0:00.02
> [kthreadd]
>
>     3 root      RT   0     0    0    0 S  0.0  0.0   0:27.01
> [migration/0]
>
>
>
>
>
> *From:* Stanley Shi [mailto:sshi@pivotal.io]
> *Sent:* September 1, 2014 14:32
> *To:* user@hadoop.apache.org
> *Subject:* Re: CPU utilization keeps increasing when using HDFS
>
>
>
> Would you please give the output of the "top" command, at least to show
> that the HDFS process really used that much CPU?
>
>
>
> On Mon, Sep 1, 2014 at 2:19 PM, Shiyuan Xiao <sh...@ericsson.com>
> wrote:
>
> Hi
>
>
> We have written a MapReduce application on Hadoop 2.4 that continuously
> reads data from HDFS (pseudo-distributed mode on one node). We found
> that the application’s CPU system time and user time keep increasing
> while it runs. If we change the application to read the same data from
> local disk, without changing any other business logic, CPU utilization
> stays stable, so we concluded that the growth is related to HDFS. Is
> this issue really caused by HDFS, and is there a way to fix it?
>
>
>
>
>
> Thanks a lot!
>
>
>
> BR/Shiyuan
>
>
>
>
>
> --
>
> Regards,
>
> *Stanley Shi,*
>
>


-- 
Regards
Gordon Wang

Re: CPU utilization keeps increasing when using HDFS

Posted by Gordon Wang <gw...@pivotal.io>.
Because you are using one node Pseudo cluster. When HDFS client write data
to HDFS, client will compute the data chunk checksum and the datanode will
verify it. It costs cpu shares.
You can monitoring the cpu usages for each process. I guess the NameNode
cpu usage is OK. But the client process and DataNode process might use most
of the cpu shares.


On Mon, Sep 1, 2014 at 3:09 PM, Shiyuan Xiao <sh...@ericsson.com>
wrote:

>  Because we are running the application with accessing local disk now, I
> can’t give the “top” command’s output when running with HDFS.
>
>
>
> But we used “top” and “pidstat”  to check the CPU utilization of our
> application, I  can confirm  the CPU utilization of our application was
> increasing and the CPU utilization of datanode, namenode, resourcemanager
> and NodeManager processes kept stable.
>
>
>
>
>
> Below the “top” command’s output when the application is accessing local
> disk:
>
> [reporting@ms1 ~]$ top
>
>
>
> top - 15:04:58 up 33 days, 24 min,  3 users,  load average: 4.05, 4.08,
> 3.92
>
> Tasks: 361 total,   1 running, 360 sleeping,   0 stopped,   0 zombie
>
> Cpu(s): 34.5%us,  2.3%sy,  0.0%ni, 63.2%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
>
> Mem:  66068256k total, 54013596k used, 12054660k free,  3400140k buffers
>
> Swap:  2097144k total,   268376k used,  1828768k free, 41202752k cached
>
>
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> COMMAND
>
> 33364 reportin  20   0 1628m 745m  17m S 168.2  1.2   0:05.07
> /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
>
> 33130 reportin  20   0 1078m 246m  18m S 130.7  0.4   0:08.10
> /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/d
>
> 33439 reportin  20   0 1613m 143m  17m S 108.1  0.2   0:03.26
> /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
>
> 25690 reportin  20   0 1724m 530m  18m S  8.6  0.8   4:31.44
> /usr/java/default/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.di
>
> 32879 reportin  20   0 1679m 370m  18m S  6.6  0.6   0:09.13
> /usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
>
> 32642 reportin  20   0 1662m 372m  18m S  6.0  0.6   0:09.22
> /usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
>
> 25200 reportin  20   0 1639m 326m  18m S  2.0  0.5   0:42.49
> /usr/java/default/bin/java -Dproc_datanode -Xmx1000m -Djava.library.pat
>
> 25576 reportin  20   0 1804m 400m  18m S  2.0  0.6   0:53.13
> /usr/java/default/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.lo
>
> 25058 reportin  20   0 1622m 401m  18m S  1.7  0.6   0:42.88
> /usr/java/default/bin/java -Dproc_namenode -Xmx1000m -Djava.library.pat
>
> 33262 reportin  20   0 15260 1556 1012 R  0.7  0.0   0:00.04
> top
>
>  2984 root      20   0 1227m  14m 1324 S  0.3  0.0  52:00.05
> /usr/bin/python /opt/ericsson/nms/litp//bin/landscape_service.py --daem
>
> 32019 reportin  20   0 1090m 248m  18m S  0.3  0.4   0:09.21
> /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/de
>
> 49403 hyperic   20   0 5333m 216m  15m S  0.3  0.3  37:31.53
> /usr/java/default/jre//bin/java -Djava.security.auth.login.config=../..
>
> 50715 reportin  20   0 5472m 380m  13m S  0.3  0.6   3:57.15 java
> -Xmx2048m -XX:MaxPermSize=128m -Dlogback.configurationFile=/opt/er
>
>     1 root      20   0 19228 1100  896 S  0.0  0.0  10:53.91
> /sbin/init
>
>     2 root      20   0     0    0    0 S  0.0  0.0   0:00.02
> [kthreadd]
>
>     3 root      RT   0     0    0    0 S  0.0  0.0   0:27.01
> [migration/0]
>
>
>
>
>
> *From:* Stanley Shi [mailto:sshi@pivotal.io]
> *Sent:* 2014年9月1日 14:32
> *To:* user@hadoop.apache.org
> *Subject:* Re: CPU utilization keeps increasing when using HDFS
>
>
>
> Would you please give the output of the "top" command? at least to show
> that the HDFS process did use that much of CPU;
>
>
>
> On Mon, Sep 1, 2014 at 2:19 PM, Shiyuan Xiao <sh...@ericsson.com>
> wrote:
>
> Hi
>
>
> We have written a MapReduce application based on Hadoop 2.4 which keeps
> reading data from HDFS(Pseudo-distributed mode in one node).  And we
> found the CPU system time and user time of the application keeps increasing
> when it is running. If we changed the application to read data from local
> disk without changing any other business logic, the CPU utilization kept
> stable. So we have conclusion that the CPU utilization is related to HDFS. We
> want to know whether this issue is really related to HDFS and is there any
> solution to fix it?
>
>
>
>
>
> Thanks a lot!
>
>
>
> BR/Shiyuan
>
>
>
>
>
> --
>
> Regards,
>
> *Stanley Shi,*
>
>


-- 
Regards
Gordon Wang

RE: CPU utilization keeps increasing when using HDFS

Posted by Shiyuan Xiao <sh...@ericsson.com>.
Because we are running the application against the local disk now, I can’t give the “top” command’s output from when it was running with HDFS.

But we used “top” and “pidstat” to check the CPU utilization of our application. I can confirm that the CPU utilization of our application was increasing, while the CPU utilization of the datanode, namenode, resourcemanager and NodeManager processes kept stable.


Below is the “top” command’s output while the application is accessing the local disk:
[reporting@ms1 ~]$ top

top - 15:04:58 up 33 days, 24 min,  3 users,  load average: 4.05, 4.08, 3.92
Tasks: 361 total,   1 running, 360 sleeping,   0 stopped,   0 zombie
Cpu(s): 34.5%us,  2.3%sy,  0.0%ni, 63.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  66068256k total, 54013596k used, 12054660k free,  3400140k buffers
Swap:  2097144k total,   268376k used,  1828768k free, 41202752k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
33364 reportin  20   0 1628m 745m  17m S 168.2  1.2   0:05.07 /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
33130 reportin  20   0 1078m 246m  18m S 130.7  0.4   0:08.10 /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/d
33439 reportin  20   0 1613m 143m  17m S 108.1  0.2   0:03.26 /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
25690 reportin  20   0 1724m 530m  18m S  8.6  0.8   4:31.44 /usr/java/default/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.di
32879 reportin  20   0 1679m 370m  18m S  6.6  0.6   0:09.13 /usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
32642 reportin  20   0 1662m 372m  18m S  6.0  0.6   0:09.22 /usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
25200 reportin  20   0 1639m 326m  18m S  2.0  0.5   0:42.49 /usr/java/default/bin/java -Dproc_datanode -Xmx1000m -Djava.library.pat
25576 reportin  20   0 1804m 400m  18m S  2.0  0.6   0:53.13 /usr/java/default/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.lo
25058 reportin  20   0 1622m 401m  18m S  1.7  0.6   0:42.88 /usr/java/default/bin/java -Dproc_namenode -Xmx1000m -Djava.library.pat
33262 reportin  20   0 15260 1556 1012 R  0.7  0.0   0:00.04 top
 2984 root      20   0 1227m  14m 1324 S  0.3  0.0  52:00.05 /usr/bin/python /opt/ericsson/nms/litp//bin/landscape_service.py --daem
32019 reportin  20   0 1090m 248m  18m S  0.3  0.4   0:09.21 /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/de
49403 hyperic   20   0 5333m 216m  15m S  0.3  0.3  37:31.53 /usr/java/default/jre//bin/java -Djava.security.auth.login.config=../..
50715 reportin  20   0 5472m 380m  13m S  0.3  0.6   3:57.15 java -Xmx2048m -XX:MaxPermSize=128m -Dlogback.configurationFile=/opt/er
    1 root      20   0 19228 1100  896 S  0.0  0.0  10:53.91 /sbin/init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.02 [kthreadd]
    3 root      RT   0     0    0    0 S  0.0  0.0   0:27.01 [migration/0]


From: Stanley Shi [mailto:sshi@pivotal.io]
Sent: 2014年9月1日 14:32
To: user@hadoop.apache.org
Subject: Re: CPU utilization keeps increasing when using HDFS

Would you please give the output of the "top" command? At least to show that the HDFS processes did use that much CPU.

On Mon, Sep 1, 2014 at 2:19 PM, Shiyuan Xiao <sh...@ericsson.com>> wrote:
Hi

We have written a MapReduce application based on Hadoop 2.4 which keeps reading data from HDFS (pseudo-distributed mode on one node).
And we found that the CPU system time and user time of the application keep increasing while it is running. If we changed the application to read data from the local disk without changing any other business logic, the CPU utilization kept stable. So we concluded that the CPU utilization is related to HDFS.
We want to know whether this issue is really related to HDFS, and whether there is any solution to fix it?

[inline image: chart of the application's CPU utilization over time]

Thanks a lot!

BR/Shiyuan



--
Regards,
Stanley Shi,
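Besides watching the process from outside with top and pidstat, as above, a JVM-based client can track its own cumulative CPU time, which makes it easy to see whether the CPU cost *per unit of work* is growing rather than just the total. A minimal, Hadoop-free sketch (class and method names are illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Sample the current thread's cumulative CPU time around one unit of work.
// Logging this per iteration shows whether each iteration is genuinely
// getting more expensive, independent of what top reports process-wide.
public class CpuSampleSketch {
    static final ThreadMXBean TMX = ManagementFactory.getThreadMXBean();

    // CPU nanoseconds consumed by the current thread while running r.
    // (getCurrentThreadCpuTime returns -1 if measurement is unsupported,
    // in which case the difference is simply 0.)
    public static long cpuNanos(Runnable r) {
        long before = TMX.getCurrentThreadCpuTime();
        r.run();
        return TMX.getCurrentThreadCpuTime() - before;
    }

    public static void main(String[] args) {
        long cost = cpuNanos(() -> {
            long acc = 0;
            for (int i = 0; i < 10_000_000; i++) acc += i; // stand-in workload
            if (acc == 42) System.out.println(acc);        // defeat dead-code elimination
        });
        System.out.println("CPU ns for one iteration: " + cost);
    }
}
```

Calling `cpuNanos` around each HDFS read batch and logging the result over the run would confirm (or rule out) a per-iteration CPU growth trend.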

RE: CPU utilization keeps increasing when using HDFS

Posted by Shiyuan Xiao <sh...@ericsson.com>.
Because we are running the application with accessing local disk now, I can’t give the “top” command’s output when running with HDFS.

But we used “top” and “pidstat”  to check the CPU utilization of our application, I  can confirm  the CPU utilization of our application was increasing and the CPU utilization of datanode, namenode, resourcemanager and NodeManager processes kept stable.


Below the “top” command’s output when the application is accessing local disk:
[reporting@ms1 ~]$ top

top - 15:04:58 up 33 days, 24 min,  3 users,  load average: 4.05, 4.08, 3.92
Tasks: 361 total,   1 running, 360 sleeping,   0 stopped,   0 zombie
Cpu(s): 34.5%us,  2.3%sy,  0.0%ni, 63.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  66068256k total, 54013596k used, 12054660k free,  3400140k buffers
Swap:  2097144k total,   268376k used,  1828768k free, 41202752k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
33364 reportin  20   0 1628m 745m  17m S 168.2  1.2   0:05.07 /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
33130 reportin  20   0 1078m 246m  18m S 130.7  0.4   0:08.10 /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/d
33439 reportin  20   0 1613m 143m  17m S 108.1  0.2   0:03.26 /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
25690 reportin  20   0 1724m 530m  18m S  8.6  0.8   4:31.44 /usr/java/default/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.di
32879 reportin  20   0 1679m 370m  18m S  6.6  0.6   0:09.13 /usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
32642 reportin  20   0 1662m 372m  18m S  6.0  0.6   0:09.22 /usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
25200 reportin  20   0 1639m 326m  18m S  2.0  0.5   0:42.49 /usr/java/default/bin/java -Dproc_datanode -Xmx1000m -Djava.library.pat
25576 reportin  20   0 1804m 400m  18m S  2.0  0.6   0:53.13 /usr/java/default/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.lo
25058 reportin  20   0 1622m 401m  18m S  1.7  0.6   0:42.88 /usr/java/default/bin/java -Dproc_namenode -Xmx1000m -Djava.library.pat
33262 reportin  20   0 15260 1556 1012 R  0.7  0.0   0:00.04 top
 2984 root      20   0 1227m  14m 1324 S  0.3  0.0  52:00.05 /usr/bin/python /opt/ericsson/nms/litp//bin/landscape_service.py --daem
32019 reportin  20   0 1090m 248m  18m S  0.3  0.4   0:09.21 /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/de
49403 hyperic   20   0 5333m 216m  15m S  0.3  0.3  37:31.53 /usr/java/default/jre//bin/java -Djava.security.auth.login.config=../..
50715 reportin  20   0 5472m 380m  13m S  0.3  0.6   3:57.15 java -Xmx2048m -XX:MaxPermSize=128m -Dlogback.configurationFile=/opt/er
    1 root      20   0 19228 1100  896 S  0.0  0.0  10:53.91 /sbin/init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.02 [kthreadd]
    3 root      RT   0     0    0    0 S  0.0  0.0   0:27.01 [migration/0]


From: Stanley Shi [mailto:sshi@pivotal.io]
Sent: 2014年9月1日 14:32
To: user@hadoop.apache.org
Subject: Re: CPU utilization keeps increasing when using HDFS

Would you please give the output of the "top" command? at least to show that the HDFS process did use that much of CPU;

On Mon, Sep 1, 2014 at 2:19 PM, Shiyuan Xiao <sh...@ericsson.com>> wrote:
Hi

We have written a MapReduce application based on Hadoop 2.4 which keeps reading data from HDFS(Pseudo-distributed mode in one node).
And we found the CPU system time and user time of the application keeps increasing when it is running. If we changed the application to read data from local disk without changing any other business logic, the CPU utilization kept stable. So we have conclusion that the CPU utilization is related to HDFS.
We want to know whether this issue is really related to HDFS and is there any solution to fix it?

[cid:image001.png@01CFC5F2.71B29440]

Thanks a lot!

BR/Shiyuan



--
Regards,
Stanley Shi,
[http://www.gopivotal.com/files/media/logos/pivotal-logo-email-signature.png]

RE: CPU utilization keeps increasing when using HDFS

Posted by Shiyuan Xiao <sh...@ericsson.com>.
Because we are running the application with accessing local disk now, I can’t give the “top” command’s output when running with HDFS.

But we used “top” and “pidstat”  to check the CPU utilization of our application, I  can confirm  the CPU utilization of our application was increasing and the CPU utilization of datanode, namenode, resourcemanager and NodeManager processes kept stable.


Below the “top” command’s output when the application is accessing local disk:
[reporting@ms1 ~]$ top

top - 15:04:58 up 33 days, 24 min,  3 users,  load average: 4.05, 4.08, 3.92
Tasks: 361 total,   1 running, 360 sleeping,   0 stopped,   0 zombie
Cpu(s): 34.5%us,  2.3%sy,  0.0%ni, 63.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  66068256k total, 54013596k used, 12054660k free,  3400140k buffers
Swap:  2097144k total,   268376k used,  1828768k free, 41202752k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
33364 reportin  20   0 1628m 745m  17m S 168.2  1.2   0:05.07 /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
33130 reportin  20   0 1078m 246m  18m S 130.7  0.4   0:08.10 /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/d
33439 reportin  20   0 1613m 143m  17m S 108.1  0.2   0:03.26 /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
25690 reportin  20   0 1724m 530m  18m S  8.6  0.8   4:31.44 /usr/java/default/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.di
32879 reportin  20   0 1679m 370m  18m S  6.6  0.6   0:09.13 /usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
32642 reportin  20   0 1662m 372m  18m S  6.0  0.6   0:09.22 /usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
25200 reportin  20   0 1639m 326m  18m S  2.0  0.5   0:42.49 /usr/java/default/bin/java -Dproc_datanode -Xmx1000m -Djava.library.pat
25576 reportin  20   0 1804m 400m  18m S  2.0  0.6   0:53.13 /usr/java/default/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.lo
25058 reportin  20   0 1622m 401m  18m S  1.7  0.6   0:42.88 /usr/java/default/bin/java -Dproc_namenode -Xmx1000m -Djava.library.pat
33262 reportin  20   0 15260 1556 1012 R  0.7  0.0   0:00.04 top
 2984 root      20   0 1227m  14m 1324 S  0.3  0.0  52:00.05 /usr/bin/python /opt/ericsson/nms/litp//bin/landscape_service.py --daem
32019 reportin  20   0 1090m 248m  18m S  0.3  0.4   0:09.21 /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/de
49403 hyperic   20   0 5333m 216m  15m S  0.3  0.3  37:31.53 /usr/java/default/jre//bin/java -Djava.security.auth.login.config=../..
50715 reportin  20   0 5472m 380m  13m S  0.3  0.6   3:57.15 java -Xmx2048m -XX:MaxPermSize=128m -Dlogback.configurationFile=/opt/er
    1 root      20   0 19228 1100  896 S  0.0  0.0  10:53.91 /sbin/init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.02 [kthreadd]
    3 root      RT   0     0    0    0 S  0.0  0.0   0:27.01 [migration/0]


From: Stanley Shi [mailto:sshi@pivotal.io]
Sent: 2014年9月1日 14:32
To: user@hadoop.apache.org
Subject: Re: CPU utilization keeps increasing when using HDFS

Would you please give the output of the "top" command? at least to show that the HDFS process did use that much of CPU;

On Mon, Sep 1, 2014 at 2:19 PM, Shiyuan Xiao <sh...@ericsson.com>> wrote:
Hi

We have written a MapReduce application based on Hadoop 2.4 which keeps reading data from HDFS(Pseudo-distributed mode in one node).
And we found the CPU system time and user time of the application keeps increasing when it is running. If we changed the application to read data from local disk without changing any other business logic, the CPU utilization kept stable. So we have conclusion that the CPU utilization is related to HDFS.
We want to know whether this issue is really related to HDFS and is there any solution to fix it?

[cid:image001.png@01CFC5F2.71B29440]

Thanks a lot!

BR/Shiyuan



--
Regards,
Stanley Shi,
[http://www.gopivotal.com/files/media/logos/pivotal-logo-email-signature.png]

RE: CPU utilization keeps increasing when using HDFS

Posted by Shiyuan Xiao <sh...@ericsson.com>.
Because we are running the application with accessing local disk now, I can’t give the “top” command’s output when running with HDFS.

But we used “top” and “pidstat”  to check the CPU utilization of our application, I  can confirm  the CPU utilization of our application was increasing and the CPU utilization of datanode, namenode, resourcemanager and NodeManager processes kept stable.


Below the “top” command’s output when the application is accessing local disk:
[reporting@ms1 ~]$ top

top - 15:04:58 up 33 days, 24 min,  3 users,  load average: 4.05, 4.08, 3.92
Tasks: 361 total,   1 running, 360 sleeping,   0 stopped,   0 zombie
Cpu(s): 34.5%us,  2.3%sy,  0.0%ni, 63.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  66068256k total, 54013596k used, 12054660k free,  3400140k buffers
Swap:  2097144k total,   268376k used,  1828768k free, 41202752k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
33364 reportin  20   0 1628m 745m  17m S 168.2  1.2   0:05.07 /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
33130 reportin  20   0 1078m 246m  18m S 130.7  0.4   0:08.10 /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/d
33439 reportin  20   0 1613m 143m  17m S 108.1  0.2   0:03.26 /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.me
25690 reportin  20   0 1724m 530m  18m S  8.6  0.8   4:31.44 /usr/java/default/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.di
32879 reportin  20   0 1679m 370m  18m S  6.6  0.6   0:09.13 /usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
32642 reportin  20   0 1662m 372m  18m S  6.0  0.6   0:09.22 /usr/java/default/bin/java -Dlog4j.configuration=container-log4j.proper
25200 reportin  20   0 1639m 326m  18m S  2.0  0.5   0:42.49 /usr/java/default/bin/java -Dproc_datanode -Xmx1000m -Djava.library.pat
25576 reportin  20   0 1804m 400m  18m S  2.0  0.6   0:53.13 /usr/java/default/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.lo
25058 reportin  20   0 1622m 401m  18m S  1.7  0.6   0:42.88 /usr/java/default/bin/java -Dproc_namenode -Xmx1000m -Djava.library.pat
33262 reportin  20   0 15260 1556 1012 R  0.7  0.0   0:00.04 top
 2984 root      20   0 1227m  14m 1324 S  0.3  0.0  52:00.05 /usr/bin/python /opt/ericsson/nms/litp//bin/landscape_service.py --daem
32019 reportin  20   0 1090m 248m  18m S  0.3  0.4   0:09.21 /usr/java/default/bin/java -Xmx1000m -Djava.library.path=/opt/hadoop/de
49403 hyperic   20   0 5333m 216m  15m S  0.3  0.3  37:31.53 /usr/java/default/jre//bin/java -Djava.security.auth.login.config=../..
50715 reportin  20   0 5472m 380m  13m S  0.3  0.6   3:57.15 java -Xmx2048m -XX:MaxPermSize=128m -Dlogback.configurationFile=/opt/er
    1 root      20   0 19228 1100  896 S  0.0  0.0  10:53.91 /sbin/init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.02 [kthreadd]
    3 root      RT   0     0    0    0 S  0.0  0.0   0:27.01 [migration/0]


From: Stanley Shi [mailto:sshi@pivotal.io]
Sent: 2014年9月1日 14:32
To: user@hadoop.apache.org
Subject: Re: CPU utilization keeps increasing when using HDFS

Would you please give the output of the "top" command? at least to show that the HDFS process did use that much of CPU;

On Mon, Sep 1, 2014 at 2:19 PM, Shiyuan Xiao <sh...@ericsson.com>> wrote:
Hi

We have written a MapReduce application based on Hadoop 2.4 which keeps reading data from HDFS(Pseudo-distributed mode in one node).
And we found the CPU system time and user time of the application keeps increasing when it is running. If we changed the application to read data from local disk without changing any other business logic, the CPU utilization kept stable. So we have conclusion that the CPU utilization is related to HDFS.
We want to know whether this issue is really related to HDFS and is there any solution to fix it?

[cid:image001.png@01CFC5F2.71B29440]

Thanks a lot!

BR/Shiyuan



--
Regards,
Stanley Shi,
[http://www.gopivotal.com/files/media/logos/pivotal-logo-email-signature.png]

Re: CPU utilization keeps increasing when using HDFS

Posted by Stanley Shi <ss...@pivotal.io>.
Would you please give the output of the "top" command? At least to show
that the HDFS processes did use that much CPU.


On Mon, Sep 1, 2014 at 2:19 PM, Shiyuan Xiao <sh...@ericsson.com>
wrote:

>  Hi
>
>
> We have written a MapReduce application based on Hadoop 2.4 which keeps
> reading data from HDFS (pseudo-distributed mode on one node). And we
> found that the CPU system time and user time of the application keep increasing
> while it is running. If we changed the application to read data from the local
> disk without changing any other business logic, the CPU utilization kept
> stable. So we concluded that the CPU utilization is related to HDFS. We
> want to know whether this issue is really related to HDFS, and whether there is any
> solution to fix it?
>
>
>
>
>
> Thanks a lot!
>
>
>
> BR/Shiyuan
>



-- 
Regards,
*Stanley Shi,*
