Posted to hdfs-user@hadoop.apache.org by Ishaaq Chandy <is...@gmail.com> on 2011/03/01 05:22:24 UTC

atomicity of copyFromLocal

Hi all,
How "atomic" is the copyFromLocal call? That is, if one process is in the
midst of uploading a file to HDFS, is it possible for another process to start
reading it before the upload is complete?

I am currently safeguarding my code against this possibility by uploading the
file to a temporary directory and then renaming it to its final destination
(the assumption being that a rename is "more atomic" than copyFromLocal), but
I'd like to avoid doing this in two steps if possible.

Regards,
Ishaaq
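
For illustration, the upload-then-rename workaround described in the question can be sketched as follows. This is a minimal sketch on a local filesystem, not HDFS code: the function name upload_then_rename and its parameters are hypothetical, and Python's os.replace stands in for the HDFS rename. It assumes the staging directory and the final destination live on the same filesystem, since a rename across filesystems would not be a single step.

```python
import os
import shutil


def upload_then_rename(src_path, staging_dir, final_path):
    """Copy src_path into staging_dir, then rename it to final_path.

    Hypothetical helper: a local-filesystem stand-in for the HDFS workflow.
    Assumes staging_dir and final_path are on the same filesystem.
    """
    os.makedirs(staging_dir, exist_ok=True)
    staged_path = os.path.join(staging_dir, os.path.basename(final_path))

    # Slow, non-atomic step: a reader looking in staging_dir could see a
    # partially written file, so readers must never look there.
    shutil.copyfile(src_path, staged_path)

    # Fast step: the complete file appears under its final name in one rename.
    os.replace(staged_path, final_path)
    return final_path
```

Readers only ever open paths under the final destination, so the partially copied file in the staging directory is never visible to them.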

Re: atomicity of copyFromLocal

Posted by st...@yahoo.com.
Cool - btw, it might be easier to mark in-progress uploads with a .tmp or .uploading extension instead of putting them in a temp folder.
It's the usual approach... You can check out how Firefox handles downloads, if you want to cover all the corner cases.

Take care,
 -stu
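
A rough sketch of the extension-based variant suggested above, again on a local filesystem rather than HDFS. The names IN_PROGRESS_SUFFIX, upload_with_suffix and list_completed are assumptions made for this example. The writer copies to a name carrying the marker extension and renames when done; readers skip anything that still carries the marker.

```python
import os
import shutil

# Hypothetical marker for files that are still being written.
IN_PROGRESS_SUFFIX = ".uploading"


def upload_with_suffix(src_path, final_path):
    """Copy to final_path + suffix, then rename to final_path when complete."""
    partial_path = final_path + IN_PROGRESS_SUFFIX
    shutil.copyfile(src_path, partial_path)  # partial data lives under the marked name
    os.replace(partial_path, final_path)     # drop the marker in a single rename
    return final_path


def list_completed(directory):
    """Return the sorted names in directory that do not carry the marker."""
    return sorted(
        name for name in os.listdir(directory)
        if not name.endswith(IN_PROGRESS_SUFFIX)
    )
```

Because the marked file sits in the same directory as its final name, the rename stays within one directory, at the cost of every reader having to apply the filter.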
-----Original Message-----
From: Ishaaq Chandy <is...@gmail.com>
Date: Wed, 2 Mar 2011 08:16:08 
To: <hd...@hadoop.apache.org>; <st...@yahoo.com>
Reply-To: hdfs-user@hadoop.apache.org
Subject: Re: atomicity of copyFromLocal

Thanks Stu,
That is what I suspected but was hoping was not the case. The rename fix is
simple enough, even if a little ugly.
Regards,
Ishaaq


Re: atomicity of copyFromLocal

Posted by Ishaaq Chandy <is...@gmail.com>.
Thanks Stu,
That is what I suspected but was hoping was not the case. The rename fix is
simple enough, even if a little ugly.
Regards,
Ishaaq

On 1 March 2011 15:51, <st...@yahoo.com> wrote:

> Pretty sure it's not atomic. I can read files I write via thrift well
> before they're done.
> Rename has always worked for me...

Re: atomicity of copyFromLocal

Posted by st...@yahoo.com.
Pretty sure it's not atomic. I can read files I write via thrift well before they're done.
Rename has always worked for me...

Take care,
 -stu