Posted to hdfs-dev@hadoop.apache.org by Xiaohan <yi...@huawei.com> on 2012/10/24 16:44:45 UTC

Ask for help?

In our production environment, we have encountered a performance problem with the NameNode.
We have configured the NameNode's shared storage with BookKeeper. Our Hadoop version is 2.0.1 and our BookKeeper version is 4.1.0.

The problem: after the HDFS system has run for a while (2-3 days), performance decreases dramatically.
The nnbench benchmark from hadoop-mapreduce-client-jobclient-2.0.1-tests.jar gives the following results:

On the first run:
./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.1-tests.jar nnbench -operation create_write -numberOfFiles 10
We get:
12/10/20 20:05:43 INFO hdfs.NNBench:                TPS: Create/Write/Close: 52

Two days later, we get:
12/10/23 18:34:42 INFO hdfs.NNBench:                TPS: Create/Write/Close: 1
// The "Avg exec time (ms): Create/Write/Close:" value is also much larger, possibly more than 1000 ms, so the actual TPS may be even lower than reported because of rounding.

In the NameNode logs, we found the following difference between the two runs:

2012-10-20 20:05:43,249 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: **** Number of syncs: 1347 SyncTimes(ms): 14138 3677

2012-10-22 18:34:42,223 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: **** Number of syncs: 51 SyncTimes(ms): 34553 312

We suspect the problem is in BookKeeper. Has anyone encountered this, or does anyone have a clue about it? Thanks very much.
The environment is strictly controlled and logs can only be copied by hand, so the logs are not very detailed.

Re: Ask for help?

Posted by Ivan Kelly <iv...@apache.org>.
On Wed, Oct 24, 2012 at 02:44:45PM +0000, Xiaohan wrote:
> And the logs in NameNode, we found the difference from each of the times:
> 
> 2012-10-20 20:05:43,249 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: **** Number of syncs: 1347 SyncTimes(ms): 14138 3677
> 
> 2012-10-22 18:34:42,223 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: **** Number of syncs: 51 SyncTimes(ms): 34553 312
> 
> We inspect that it is the problem of Bookkeeper. Anyone ever encounter that or any clue for that? Thanks very much.
> The environment is strictly controlled, and the logs can only be copied by hand. So the logs are not so detailed.
How many bookies are you using? Are any of the bookies displaying disk
errors? What does iostat say on the bookies and on the namenode?

It does look like the editlog is the culprit here. However, it's not
clear that it's BK. If BK is the shared edits, it should be second in
the list of journals. From the sync times, the second journal seems to
be performing fine.

-Ivan
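
The per-journal reading of the sync times in this thread can be checked with
a little arithmetic. The sketch below is an illustration only, not part of
Hadoop or BookKeeper. It assumes that each number after "SyncTimes(ms):" is
the cumulative sync time of one journal, listed in journal order, and that
dividing by "Number of syncs" gives a rough average per sync. The helper name
avg_sync_ms is hypothetical.

```python
import re

# Matches the FSEditLog statistics fragment quoted in this thread.
STATS_RE = re.compile(r"Number of syncs:\s*(\d+)\s+SyncTimes\(ms\):\s*([\d\s]+)")


def avg_sync_ms(log_line):
    """Return the average sync time in ms for each journal on the line.

    Assumption: every value after 'SyncTimes(ms):' is a cumulative total
    for one journal, in the order the journals are configured.
    """
    match = STATS_RE.search(log_line)
    if match is None:
        raise ValueError("no sync statistics found in line")
    syncs = int(match.group(1))
    if syncs == 0:
        raise ValueError("line reports zero syncs")
    totals = [int(value) for value in match.group(2).split()]
    return [total / syncs for total in totals]


if __name__ == "__main__":
    lines = [
        "2012-10-20 20:05:43,249 INFO org.apache.hadoop.hdfs.server.namenode."
        "FSEditLog: **** Number of syncs: 1347 SyncTimes(ms): 14138 3677",
        "2012-10-22 18:34:42,223 INFO org.apache.hadoop.hdfs.server.namenode."
        "FSEditLog: **** Number of syncs: 51 SyncTimes(ms): 34553 312",
    ]
    for line in lines:
        averages = avg_sync_ms(line)
        print(" ".join(f"journal{i + 1}={avg:.1f}ms" for i, avg in enumerate(averages)))
```

Under that assumption, the first journal goes from roughly 10 ms per sync to
several hundred ms per sync between the two log lines, while the second
journal stays in the single-digit range, which is consistent with the
observation that the second journal seems to be performing fine.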