Posted to mapreduce-user@hadoop.apache.org by dl...@comcast.net on 2015/07/31 02:10:39 UTC

SLive Tests

  I have been running some SLive tests over the past couple of days to test
NameNode performance on different filesystems. Our setup includes:

 

* CentOS 6.6
* CDH 5.3.0 with Java 7
* 2 NameNodes in an HA configuration
* Both NameNodes are on the same type of hardware
* Journal Nodes are running on the NameNode servers, and one Journal
  Node running on another node (3 total)
* NameNodes and JournalNodes are writing to different devices on the
  NameNode machines

 

For the tests I used SLive to create a large number of files, which builds a
rather large NameNode data structure. Then I ran some tests varying the # of
mappers and the # of total operations, but always limiting each test to 30
minutes. The tests ran with a 70% read ratio and a mix of the other
operations to get to 100%. When changing filesystems, we saved off the
fsimage and other information, reformatted the name, edits, checkpoints, and
journal directories with the new filesystem, mounted them, put the fsimage
and other information back in their original locations, and restarted the
NameNodes. We made sure to change nothing else during the tests.
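
To make the operation mix concrete, here is a rough Python sketch of how a
70% read ratio can be topped up to 100%. It assumes the remaining 30% is split
evenly across the other operations, which is only an illustration; the
operation names and the operation_mix helper are hypothetical and not taken
from the actual test configuration.

```python
# Hypothetical helper: spread the non-read share evenly over the other ops.
# The even split and the operation names are assumptions for illustration.
def operation_mix(read_pct, other_ops):
    if not 0 <= read_pct <= 100:
        raise ValueError("read_pct must be between 0 and 100")
    if not other_ops:
        raise ValueError("need at least one non-read operation")
    share = (100 - read_pct) / len(other_ops)
    mix = {"read": float(read_pct)}
    for op in other_ops:
        mix[op] = share
    return mix


mix = operation_mix(70, ["create", "delete", "mkdir", "rename", "ls", "append"])
for op, pct in mix.items():
    print(f"{op:8s} {pct:5.1f}%")
```

Whatever split is actually used, the percentages have to add up to 100 for
the runs to be comparable from one filesystem to the next.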

 

We tested ext3, ext4, and xfs filesystems. When we changed the filesystem
type, we changed it to the same type on all 3 machines (2 NN + 1 JN). Using
the total # of operations completed during the 30 minute test, we found that
ext4 seemed to be the best choice. With ext3 we completed about 1% fewer
operations on average, and with xfs about 30% fewer. I was a little
shocked by this, so we ran the xfs tests three times, attempting to tune the
XFS filesystem and mount options, with no success. Thinking about this a
little, XFS has superior performance for parallel writes due to multiple
inode tables. So, some questions:

 

1.       Is XFS known to be slower for single-threaded write performance
with multiple inode tables?

2.       Are the writes to the edits and journals multi-threaded? I know
that they are sync'd, but is it a single writer?

3.       Is using the total # of ops the correct way to use SLive?
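
For reference, this is roughly how the totals were compared. It is a minimal
Python sketch that normalizes total operations to ops/second over the
30-minute window and expresses each filesystem relative to ext4. The totals
below are placeholders chosen only to reproduce the ratios reported above
(about 1% and about 30% fewer); they are not the measured values, and the
function names are made up for this example.

```python
TEST_SECONDS = 30 * 60  # every run was capped at 30 minutes


def ops_per_second(total_ops, seconds=TEST_SECONDS):
    """Average throughput over the whole run."""
    return total_ops / seconds


def percent_fewer(baseline_ops, other_ops):
    """How many percent fewer operations other_ops is than baseline_ops."""
    return 100.0 * (baseline_ops - other_ops) / baseline_ops


# Placeholder totals, NOT measured values: picked to match the reported ratios.
totals = {"ext4": 1_000_000, "ext3": 990_000, "xfs": 700_000}

for fs, ops in totals.items():
    rate = ops_per_second(ops)
    gap = percent_fewer(totals["ext4"], ops)
    print(f"{fs}: {rate:.1f} ops/s, {gap:.1f}% fewer than ext4")
```

Because every run has the same duration, comparing total ops and comparing
ops/second give the same ranking; the normalization only matters if run
lengths differ.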

 

Next, I wanted to test the penalty for running the NameNode in HA. So, we
took the best performing setup (ext4) and changed it to non-HA, with a single
NN on one of the same NN machines. Running the same tests, the single NN
completed far fewer (roughly 10x fewer) total operations in the same amount
of time. This does not seem correct. What is different about the I/O paths
for a single NN (other than the writes to the JNs)? What did I do incorrectly?
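
One way to separate raw filesystem behavior from NameNode behavior in these
comparisons would be to time a single writer doing sync'd appends on each
mount. The Python sketch below does that. It is only a rough proxy and is not
a model of the NameNode's actual edit-log path; the function name, record
count, and record size are arbitrary assumptions.

```python
import os
import tempfile
import time


def synced_append_rate(directory, records=200, record_size=512):
    """Append small records from one thread, fsync after each one,
    and return the number of records written per second."""
    payload = b"x" * record_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(records):
            os.write(fd, payload)
            os.fsync(fd)  # force each record to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return records / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-in target: replace with a directory on the mount under test.
    target = tempfile.gettempdir()
    print(f"{synced_append_rate(target):.0f} synced appends/s in {target}")
```

Running it against a directory on each of the ext3, ext4, and xfs mounts
would show whether the gap appears with a plain single writer, before any
Hadoop code is involved.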

 

Thanks,

Dave