Posted to hdfs-dev@hadoop.apache.org by "Tony Reix (JIRA)" <ji...@apache.org> on 2014/06/30 11:51:24 UTC

[jira] [Created] (HDFS-6608) FsDatasetCache: hard-coded 4096 value in test is not appropriate for all HW

Tony Reix created HDFS-6608:
-------------------------------

             Summary: FsDatasetCache: hard-coded 4096 value in test is not appropriate for all HW
                 Key: HDFS-6608
                 URL: https://issues.apache.org/jira/browse/HDFS-6608
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: test
    Affects Versions: 3.0.0
         Environment: PPC64 (LE & BE, OpenJDK & IBM JVM, Ubuntu, RHEL 7 & RHEL 6.5)
            Reporter: Tony Reix


The value 4096 is hard-coded in HDFS code (product and tests).
It appears 171 times, including 8 times in product (not tests) code:
hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs : 163
hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs : 4
hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/fs/http : 3
hadoop-hdfs-httpfs/src/main/java/org/apache/hadoop/lib/wsrs : 1

This value deals with different subjects: files, block size, page size, etc.
4096 (as block size and page size) is appropriate for many systems, but not for PPC64, for which it is 65536.

Looking at the HDFS product (not test) code, it seems (not 100% sure) that the code is OK (it does not use hard-coded page/block sizes), for example:

 this.maxBytes = dataset.datanode.getDnConf().getMaxLockedMemory();

However, someone should check this in depth.

However, at the test level, the value 4096 is used in many places, and it is very hard to tell whether it depends on the HW architecture or not.

In the TestFsDatasetCache#testPageRounder test, the HW value is sometimes obtained from the system:
 private static final long PAGE_SIZE = NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
 private static final long BLOCK_SIZE = PAGE_SIZE;
but there are several places where 4096 is used even though it should depend on the HW value (sketched just below).
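As a hedged illustration of the kind of replacement meant here (the configuration key is only an example, not necessarily one of the places counted above; PAGE_SIZE is the constant quoted just above):

 // Illustrative only: instead of a hard-coded literal such as
 //   conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, 4096);
 // derive the value from the machine running the test:
 conf.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, PAGE_SIZE);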

The test configures the locked-memory limit with:
 conf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY, CACHE_CAPACITY);
with:
 // Most Linux installs allow a default of 64KB locked memory
 private static final long CACHE_CAPACITY = 64 * 1024;
However, for PPC64, this value should be much larger.
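A minimal sketch of one possible fix, sizing the capacity from the detected page size instead of a fixed 64 KB (the factor of 8 pages is only an example, not a value the test mandates):

 // Detect the real OS page size (4096 on x86_64, 65536 on PPC64).
 private static final long PAGE_SIZE =
     NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize();
 // Express the capacity in pages, so PPC64 automatically gets a larger limit.
 private static final long CACHE_CAPACITY = 8 * PAGE_SIZE;
 conf.setLong(DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY, CACHE_CAPACITY);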

The TestFsDatasetCache#testPageRounder test aims to cache 5 pages of size 512. However, the OS page size is 65536 on PPC64 and 4096 on x86_64. Thus, the method in charge of reserving blocks in the HDFS cache reserves in 4096-byte steps on x86_64 and in 65536-byte steps on PPC64, with a hard-coded limit: maxBytes = 65536 bytes.

5 * 4096 = 20480 : OK
5 * 65536 = 327680 : KO : the test ends with a timeout, since the limit is exceeded at the very beginning and the test keeps waiting.
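The same arithmetic as a small self-contained check (figures taken from above):

 // Each 512-byte page cached by testPageRounder is rounded up to a full OS page,
 // so caching 5 pages reserves 5 * pageSize bytes against maxBytes = 64 KiB.
 long maxBytes = 64 * 1024;
 System.out.println(5 * 4096L  <= maxBytes);   // true  : 20480 bytes fit
 System.out.println(5 * 65536L <= maxBytes);   // false : 327680 bytes exceed the limit, so the test hangs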

As a conclusion, there are several issues to fix:
 - instead of using the hard-coded value 4096 in many places, the (mainly test) code should use Java constants built from HW values (like NativeIO.POSIX.getCacheManipulator().getOperatingSystemPageSize())
 - several distinct constants must be used, since 4096 covers different subjects, including some that do not depend on the HW
 - the test must be improved to handle cases where the limit is exceeded at the very beginning (see the sketch after this list)
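As a hedged sketch of the last point (NUM_PAGES is an illustrative constant for the 5 pages mentioned above; assumeTrue is plain JUnit), the test could check the capacity up front instead of timing out:

 import static org.junit.Assume.assumeTrue;

 // Before caching, verify that the configured capacity can hold the pages we are
 // about to reserve; otherwise skip the test (or enlarge CACHE_CAPACITY) instead
 // of waiting forever for space that will never become available.
 long needed = NUM_PAGES * PAGE_SIZE;
 assumeTrue("cache capacity " + CACHE_CAPACITY + " is too small for " + needed + " bytes",
            needed <= CACHE_CAPACITY);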



--
This message was sent by Atlassian JIRA
(v6.2#6252)