Posted to hdfs-dev@hadoop.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2019/01/25 07:49:00 UTC

[jira] [Created] (HDFS-14229) Nonblocking HDFS create|write

Zheng Shao created HDFS-14229:
---------------------------------

             Summary: Nonblocking HDFS create|write
                 Key: HDFS-14229
                 URL: https://issues.apache.org/jira/browse/HDFS-14229
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: hdfs-client
            Reporter: Zheng Shao


Right now, the create call on HDFS is blocking.  The write call can also block if the write buffer has reached its limit.

However, most applications only require that once "close" on a file returns, the file is persisted and visible in HDFS.  The file does not need to be visible as soon as the "create" call returns.

A particular use case is storing shuffle data in HDFS (for Spark, MapReduce, or other loosely coupled applications).

 

This Jira proposes adding a new "async-hdfs://" protocol that maps to a new AsyncDistributedFileSystem class.  Its create call is nonblocking but still returns an FSOutputStream, and writes to that stream never block, even when the file has not yet been physically created on HDFS.  The close call on the FSOutputStream blocks until the creation and all previous writes have completed and the file is closed.
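
The semantics proposed above can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions, not the proposed implementation: DeferredOutputStream and the blockingCreate supplier are hypothetical names, and a ByteArrayOutputStream stands in for the stream that a blocking HDFS create would return. The sketch queues the create and every write on a background thread, so the caller only waits in close.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical sketch (not a Hadoop class): create and write are deferred, close blocks. */
class DeferredOutputStream extends OutputStream {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // Tail of the chain: the create, followed by every write queued so far.
    private CompletableFuture<OutputStream> pending;
    private boolean closed;

    /** blockingCreate stands in for a slow, blocking file creation call. */
    DeferredOutputStream(Supplier<OutputStream> blockingCreate) {
        pending = CompletableFuture.supplyAsync(blockingCreate, worker);
    }

    @Override
    public synchronized void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public synchronized void write(byte[] buf, int off, int len) throws IOException {
        if (closed) {
            throw new IOException("stream already closed");
        }
        // Copy the bytes so the caller may reuse its buffer immediately.
        byte[] copy = Arrays.copyOfRange(buf, off, off + len);
        // Queue the write behind the create and all earlier writes, then return.
        pending = pending.thenApplyAsync(out -> {
            try {
                out.write(copy);
                return out;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, worker);
    }

    @Override
    public synchronized void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        try {
            // Block until the create, all queued writes, and the underlying close finish.
            // Limitation of this sketch: if a queued write fails, the underlying
            // stream is left unclosed.
            pending.thenAcceptAsync(out -> {
                try {
                    out.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }, worker).join();
        } catch (CompletionException e) {
            throw new IOException("deferred create, write, or close failed", e.getCause());
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream out = new DeferredOutputStream(() -> {
            try {
                Thread.sleep(200); // simulated latency of a blocking create
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return sink;
        });
        out.write("shuffle-block".getBytes(StandardCharsets.UTF_8)); // returns before the create finishes
        out.close(); // blocks until everything above has been applied
        System.out.println(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Two consequences of this design are worth noting.  A write that never blocks implies unbounded buffering of pending data in client memory, and any failure in the create or in a write is only reported to the caller when close is called.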

 


