Posted to hdfs-dev@hadoop.apache.org by "Mania Abdi (Jira)" <ji...@apache.org> on 2020/03/04 03:13:00 UTC

[jira] [Created] (HDFS-15206) HDFS synchronous reads from local file system

Mania Abdi created HDFS-15206:
---------------------------------

             Summary: HDFS synchronous reads from local file system
                 Key: HDFS-15206
                 URL: https://issues.apache.org/jira/browse/HDFS-15206
             Project: Hadoop HDFS
          Issue Type: Improvement
         Environment: !Screenshot from 2020-03-03 22-07-26.png!
            Reporter: Mania Abdi
         Attachments: Screenshot from 2020-03-03 22-07-26.png

Hello everyone,

 I ran a simple benchmark that runs ``` hadoop fs -get /file1.txt ```, where file1.txt is 1 MB in size, and captured the workflow of requests using XTrace. While evaluating the workflow trace, I noticed that the datanode issues 64 KB synchronous read requests to the local file system to read the data, sends the data back, and waits for completion. I walked through the HDFS code to verify this pattern, and it matched. I have two suggestions. (1) Since the HDFS block size is usually 128 MB, we could use mmap via the FileChannel class to load the file into memory and let the file system prefetch and read ahead in the background, instead of synchronously reading from disk. (2) We could use asynchronous read operations against the datanode's local disk. Is there a reason behind the synchronous reads from the file system?
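To make the three access patterns concrete, here is a minimal standalone sketch using only the JDK. It is not HDFS code: the class name `LocalReadSketch`, its methods, and the `consume` helper (a stand-in for sending data back to the client) are all hypothetical, and the 64 KB chunk size is taken from the trace described above. Whether mmap or asynchronous reads actually help on a given datanode is an open question that this sketch does not answer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical illustration; none of these names come from HDFS.
public class LocalReadSketch {
    static final int CHUNK = 64 * 1024; // 64 KB, the request size seen in the trace

    // Stand-in for "send the data back": sums the unsigned bytes in the buffer.
    static long consume(ByteBuffer buf) {
        long sum = 0;
        while (buf.hasRemaining()) {
            sum += buf.get() & 0xFF;
        }
        return sum;
    }

    // Pattern observed in the trace: one blocking read per 64 KB chunk.
    static long readSync(Path file) throws IOException {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(CHUNK);
            while (ch.read(buf) != -1) {
                buf.flip();
                sum += consume(buf);
                buf.clear();
            }
        }
        return sum;
    }

    // Suggestion (1): map the file and let the OS page it in.
    static long readMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size == 0) {
                return 0;
            }
            // A single mapping is limited to Integer.MAX_VALUE bytes.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            map.load(); // best-effort hint to bring the pages into memory
            return consume(map);
        }
    }

    // Suggestion (2): issue the next read before consuming the current chunk.
    static long readAsync(Path file)
            throws IOException, InterruptedException, ExecutionException {
        long sum = 0;
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer current = ByteBuffer.allocate(CHUNK);
            ByteBuffer next = ByteBuffer.allocate(CHUNK);
            long pos = 0;
            Future<Integer> pending = ch.read(current, pos);
            while (true) {
                int n = pending.get(); // wait for the outstanding read
                if (n < 0) {
                    break; // end of file
                }
                pos += n;
                pending = ch.read(next, pos); // overlaps with consume() below
                current.flip();
                sum += consume(current);
                current.clear();
                ByteBuffer tmp = current; // swap the two buffers
                current = next;
                next = tmp;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("file1", ".txt");
        Files.write(file, new byte[1024 * 1024]); // 1 MB, as in the benchmark
        System.out.println("sync:   " + readSync(file));
        System.out.println("mmap:   " + readMapped(file));
        System.out.println("async:  " + readAsync(file));
        Files.deleteIfExists(file);
    }
}
```

All three methods return the same byte sum for the same file, so they can be swapped in a micro-benchmark to compare the read paths. Note that even the synchronous variant may benefit from the kernel's own readahead on sequential access, which could be one reason the simpler pattern is used.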

 

Code:

XTrace: [http://brownsys.github.io/tracing-framework/xtrace/server/]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
