Posted to general@hadoop.apache.org by xiao yang <ya...@gmail.com> on 2009/12/02 09:45:49 UTC

What if I want to do random writes in Hadoop?

FSDataInputStream is seekable, but FSDataOutputStream is not?
Why? What are the difficulties in supporting random writes?
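To illustrate what I mean, here is a minimal sketch against the
FileSystem API (the path and file contents are just examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SeekDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/seek-demo.dat"); // illustrative path

    // Writing is strictly sequential.
    FSDataOutputStream out = fs.create(p);
    out.writeBytes("hello, hdfs");
    // out.seek(0);  // no such method: FSDataOutputStream is not Seekable
    out.close();

    // Reading supports random access.
    FSDataInputStream in = fs.open(p);
    in.seek(7);                            // FSDataInputStream implements Seekable
    System.out.println((char) in.read());  // prints 'h'
    in.close();
  }
}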

Thanks!
Xiao

Re: What if I want to do random writes in Hadoop?

Posted by Bernd Fondermann <be...@googlemail.com>.
On Wed, Dec 2, 2009 at 09:45, xiao yang <ya...@gmail.com> wrote:
> FSDataInputStream is seekable, but FSDataOutputStream is not?
> Why? What are the difficulties in supporting random writes?

From
http://hadoop.apache.org/common/docs/r0.20.0/hdfs_design.html#Assumptions+and+Goals
>>>>
 Simple Coherency Model

HDFS applications need a write-once-read-many access model for files.
A file once created, written, and closed need not be changed. This
assumption simplifies data coherency issues and enables high
throughput data access. A Map/Reduce application or a web crawler
application fits perfectly with this model. There is a plan to support
appending-writes to files in the future.
<<<<
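Concretely, the only lifecycle HDFS supports is create, write,
close; the append mentioned above adds bytes at the end of a file,
never in the middle. A minimal sketch (the path is illustrative,
and append was experimental in 0.20-era releases, typically
disabled unless dfs.support.append is set):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteOnceDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/write-once.log"); // illustrative path

    // Create once, write sequentially, close. After close() the
    // file's bytes are fixed.
    FSDataOutputStream out = fs.create(p);
    out.writeBytes("first record\n");
    out.close();

    // The planned extension: append() only ever adds bytes at the
    // end of the file.
    FSDataOutputStream app = fs.append(p);
    app.writeBytes("second record\n");
    app.close();
  }
}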

Data in HDFS is replicated (potentially across data center
boundaries). If a file could be changed in place, stale copies of
the old data would linger on some replicas. That would create
consistency problems for massively parallel reads, which are one
of the strengths of HDFS. To avoid these complications, changing
already-written data is not possible.
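Because written data cannot be changed in place, the usual
workaround is to rewrite and swap: write the modified content to a
new file, then rename it over the old one. A sketch of that common
pattern, not a special HDFS API (paths and contents are
illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RewriteDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path current = new Path("/data/table.dat");     // illustrative paths
    Path updated = new Path("/data/table.dat.tmp");

    // 1. Write the new version of the data to a fresh file.
    FSDataOutputStream out = fs.create(updated);
    out.writeBytes("rewritten contents\n");
    out.close();

    // 2. Swap it in. Readers see either the complete old file or
    //    the complete new one, never a half-updated set of replicas.
    fs.delete(current, false); // recursive=false: it is a single file
    fs.rename(updated, current);
  }
}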

Other distributed systems, such as Apache CouchDB, have different
consistency models.

  Bernd