Posted to common-user@hadoop.apache.org by Brian Long <br...@dotspots.com> on 2009/02/26 23:14:13 UTC

Atomicity of file operations?

What kind of atomicity/visibility claims are made regarding the various
operations on a FileSystem?
I have multiple processes that write into local sequence files, then upload
them into a remote directory in HDFS. A map/reduce job runs over whatever
is in the directory. The processes are not synchronized with the job, so it
is entirely possible that the job might start while a file is being
uploaded. Thus, my concern is that the job may include a partially uploaded
file if "FileSystem.copyFromLocalFile" is not atomic (in the sense that the
file will not appear until all bytes are written).

Are any of the FileSystem APIs atomic in this sense? What about, at the
very least, rename (e.g. first write to a temp HDFS location, then use
rename to atomically flip the file into the live directory)?

Thanks,
Brian
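The stage-then-rename idea in the question can be sketched in Java. HDFS needs a running cluster, so this sketch applies the same pattern to the local filesystem with java.nio.file instead: Files.copy into a staging directory, then a single Files.move with ATOMIC_MOVE into the live directory. The class name StageThenRename and the "_tmp"/"live" layout are hypothetical. In Hadoop the corresponding calls would be FileSystem.copyFromLocalFile into a temporary path followed by FileSystem.rename into the live directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical local-filesystem analogue of "upload to temp, then rename".
public class StageThenRename {

    /**
     * Publishes localFile into liveDir without ever exposing a partial file there.
     * The slow copy goes to stagingDir; one atomic move then makes the finished
     * file appear in liveDir. Both directories must be on the same filesystem,
     * otherwise ATOMIC_MOVE throws AtomicMoveNotSupportedException.
     */
    public static Path publish(Path localFile, Path stagingDir, Path liveDir) throws IOException {
        Files.createDirectories(stagingDir);
        Files.createDirectories(liveDir);
        Path name = localFile.getFileName();
        Path staged = stagingDir.resolve(name);
        // Non-atomic step: a reader could see a half-copied file, but only in stagingDir.
        Files.copy(localFile, staged, StandardCopyOption.REPLACE_EXISTING);
        // Atomic step: the file shows up in liveDir complete, or not at all.
        return Files.move(staged, liveDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("publish-demo");
        Path local = Files.writeString(root.resolve("part-00000.seq"), "record-1\nrecord-2\n");
        Path live = publish(local, root.resolve("_tmp"), root.resolve("live"));
        System.out.println(live.getFileName() + " " + Files.size(live)); // prints part-00000.seq 18
    }
}
```

Keep the staging directory outside the directory the job reads, so a half-copied file is never a candidate input. When doing the same thing against HDFS, note that FileSystem.rename returns a boolean, so the uploader should check that result rather than assume the flip succeeded.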

Re: Atomicity of file operations?

Posted by Brian Long <br...@dotspots.com>.
Thanks, Brian. I will go with the copy-to-tmp-then-flip-with-rename model.
-B

On Thu, Feb 26, 2009 at 3:49 PM, Brian Bockelman <bb...@cse.unl.edu> wrote:

>
> On Feb 26, 2009, at 4:14 PM, Brian Long wrote:
>
>> What kind of atomicity/visibility claims are made regarding the various
>> operations on a FileSystem?
>> I have multiple processes that write into local sequence files, then upload
>> them into a remote directory in HDFS. A map/reduce job runs over whatever
>> is in the directory. The processes are not synchronized with the job, so it
>> is entirely possible that the job might start while a file is being
>> uploaded. Thus, my concern is that the job may include a partially uploaded
>> file if "FileSystem.copyFromLocalFile" is not atomic (in the sense that the
>> file will not appear until all bytes are written).
>>
>
> Hey Brian,
>
> I can't speak for the whole file system, but I do know that, as you'd
> expect in Unix, open files that are being written to are visible.
>
>
>>
>> Are any of the FileSystem APIs atomic in this sense? What about, at the
>> very least, rename (e.g. first write to a temp HDFS location, then use
>> rename to atomically flip the file into the live directory)?
>>
>>
> I'm not sure on this one; I suspect you're safe here.
>
> Brian
>

Re: Atomicity of file operations?

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Feb 26, 2009, at 4:14 PM, Brian Long wrote:

> What kind of atomicity/visibility claims are made regarding the various
> operations on a FileSystem?
> I have multiple processes that write into local sequence files, then upload
> them into a remote directory in HDFS. A map/reduce job runs over whatever
> is in the directory. The processes are not synchronized with the job, so it
> is entirely possible that the job might start while a file is being
> uploaded. Thus, my concern is that the job may include a partially uploaded
> file if "FileSystem.copyFromLocalFile" is not atomic (in the sense that the
> file will not appear until all bytes are written).

Hey Brian,

I can't speak for the whole file system, but I do know that, as you'd
expect in Unix, open files that are being written to are visible.
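Since a file that is still being written shows up in directory listings, whatever reads the directory needs a rule for ignoring unfinished files. One common convention, assumed here rather than taken from this thread, is to give an in-progress file a name starting with "_" or "." and rename it once it is complete; Hadoop's FileInputFormat already skips input paths with those prefixes through its built-in hidden-file filter. This sketch applies the same rule to a local directory with java.nio.file; the class name VisibleInputs is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical reader-side filter: only "finished" files count as job input.
public class VisibleInputs {

    /** Same rule as a hidden-file filter: names starting with "_" or "." are skipped. */
    public static boolean isVisible(Path p) {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    /** Returns the names of regular, visible files in dir, sorted alphabetically. */
    public static List<String> list(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && isVisible(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("live");
        Files.writeString(dir.resolve("part-00000.seq"), "done");
        Files.writeString(dir.resolve("_part-00001.seq"), "half-writ"); // still uploading
        Files.writeString(dir.resolve(".part-00002.seq.tmp"), "x");     // hidden scratch file
        System.out.println(list(dir)); // prints [part-00000.seq]
    }
}
```

An uploader following this convention writes _part-00001.seq and, once the copy finishes, renames it to part-00001.seq in the same directory, so the rename is the only moment at which the file becomes eligible input.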

>
> Are any of the FileSystem APIs atomic in this sense? What about, at the
> very least, rename (e.g. first write to a temp HDFS location, then use
> rename to atomically flip the file into the live directory)?
>

I'm not sure on this one; I suspect you're safe here.

Brian