Posted to mapreduce-user@hadoop.apache.org by Emre Sevinc <em...@gmail.com> on 2016/03/01 12:39:44 UTC

How does YARN decide how many containers to create? (Why the dramatic difference between S3a and HDFS?)

Hello,

I'm using the current version of Hadoop and running some TestDFSIO
benchmarks (v. 1.8) to compare the case where the default file system is
HDFS against the case where the default file system is an S3 bucket
(accessed via S3a).
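
For reference, the only thing that changes between runs is the default
file system. A minimal Java sketch of the equivalent programmatic switch
(fs.defaultFS is the current name of the deprecated fs.default.name key;
hadoop-common and hadoop-aws need to be on the classpath):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DefaultFsCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Switch the default file system between runs; the bucket name
        // is the one used in the tests below.
        conf.set("fs.defaultFS", "s3a://emre-hadoop-test-bucket");
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Default FS:         " + fs.getUri());
        System.out.println("Default block size: "
            + fs.getDefaultBlockSize(new Path("/")));
      }
    }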

When reading 100 x 1 MB files with S3a as the default file system, the
maximum number of containers I observe in the YARN Web UI is lower than
with HDFS as the default, and S3a is about 4 times slower.

When reading 1000 x 10 KB files with S3a as the default file system, the
maximum number of containers I observe in the YARN Web UI is at least 10
times lower than with HDFS as the default, and S3a is about 16 times
slower (e.g. around *50 seconds* of test execution time with HDFS as
default versus around *16 minutes* with S3a as default).

The number of Launched Map Tasks is as expected in each case; there is
*no difference* in that respect. But why does YARN create at least 10
times fewer containers (e.g. 117 with HDFS versus 8 with S3a)? How does
YARN decide how many containers to create when the cluster's vcores and
RAM, the job's input splits, and the number of launched map tasks are
all the same, and only the storage back-end is different?
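
As far as I understand (and I may well be wrong here), the MRAppMaster
requests one container per input split, so the container *demand* should
be identical in both cases. A minimal Java sketch to verify the split
count and to see what locality information each split carries
(TextInputFormat is just for illustration, not what TestDFSIO itself
uses; on S3a the reported locations are, as far as I know, placeholder
hostnames rather than real DataNodes):

    import java.util.Arrays;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SplitCount {
      public static void main(String[] args) throws Exception {
        // args[0] is the input directory holding the benchmark files.
        Job job = Job.getInstance(new Configuration());
        FileInputFormat.addInputPath(job, new Path(args[0]));
        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        System.out.println("Input splits (= map containers requested): "
            + splits.size());
        for (InputSplit split : splits) {
          // The hosts printed here are what locality-aware scheduling
          // sees; comparing HDFS vs S3a output may hint at the difference.
          System.out.println(Arrays.toString(split.getLocations()));
        }
      }
    }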

Of course, some performance difference between HDFS and Amazon S3 (via
S3a) is to be expected when running the same TestDFSIO jobs. What I'm
after is understanding how YARN decides the maximum number of containers
it launches during those jobs when only the default file system is
changed, because currently, when the default file system is S3a, YARN
leaves roughly 90% of the parallelism unused (parallelism it does
exploit when the default file system is HDFS).
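
To make sure this is not just a display artifact of the Web UI, the live
container count can also be read straight from the ResourceManager. A
minimal sketch using the YarnClient API (getNumUsedContainers() is, as
far as I know, the same figure the Web UI shows):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.client.api.YarnClient;

    public class RunningContainers {
      public static void main(String[] args) throws Exception {
        // Ask the ResourceManager for its applications and print how
        // many containers each one currently holds.
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(new Configuration());
        yarn.start();
        for (ApplicationReport app : yarn.getApplications()) {
          System.out.println(app.getApplicationId() + " containers="
              + app.getApplicationResourceUsageReport()
                   .getNumUsedContainers());
        }
        yarn.stop();
      }
    }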

The cluster is a 15-node cluster with 1 NameNode, 1 ResourceManager
(YARN), and 13 DataNodes (worker nodes). Each node has 128 GB of RAM and
a 48-core CPU. This is a dedicated testing cluster: during the TestDFSIO
test runs, nothing else runs on the cluster.
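
In other words, the 13 worker nodes alone offer up to 13 x 48 = 624
cores and 13 x 128 GB = 1664 GB of RAM (assuming the NodeManagers are
configured with the full node resources), so even the 117 containers of
the HDFS runs are nowhere near a hardware limit.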

For HDFS, the dfs.blocksize is 256m, and it uses 4 HDDs
(dfs.datanode.data.dir is set to
file:///mnt/hadoopData1,file:///mnt/hadoopData2,file:///mnt/hadoopData3,file:///mnt/hadoopData4).

For S3a, fs.s3a.block.size is set to 268435456 bytes, that is 256m, the
same as the dfs.blocksize used for HDFS above.
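
As a sanity check that the two sizes really match (256 x 1024 x 1024 =
268435456 bytes), both keys can be read back from the effective
configuration; a minimal sketch (the fallback values are only used if
the keys are unset, and are my assumption of the stock defaults):

    import org.apache.hadoop.conf.Configuration;

    public class BlockSizeCheck {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Both keys accept suffixed sizes; "256m" parses to 268435456.
        long hdfsBlock = conf.getLongBytes("dfs.blocksize", 134217728L);
        long s3aBlock  = conf.getLongBytes("fs.s3a.block.size", 33554432L);
        System.out.println("dfs.blocksize     = " + hdfsBlock);
        System.out.println("fs.s3a.block.size = " + s3aBlock);
      }
    }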

The Hadoop tmp directory is on an SSD (hadoop.tmp.dir is set to
/mnt/ssd1/tmp).

The performance difference (HDFS as default versus S3a as default) is
summarized below:

TestDFSIO v. 1.8 (READ)

fs.default.name                # of Files x File Size  Launched Map Tasks  Max # of containers (YARN Web UI)  Test exec time (s)
=============================  ======================  ==================  =================================  ==================
hdfs://hadoop1:9000            100  x  1 MB                           100                                117                  19
hdfs://hadoop1:9000            1000 x 10 KB                          1000                                117                  56
s3a://emre-hadoop-test-bucket  100  x  1 MB                           100                                 60                  78
s3a://emre-hadoop-test-bucket  1000 x 10 KB                          1000                                  8                1012


PS: I have also posted this question to StackOverflow, at
http://stackoverflow.com/q/35721188/236007

--
Emre Sevinç