Posted to hdfs-dev@hadoop.apache.org by Kun Ren <re...@gmail.com> on 2016/05/25 15:21:38 UTC

Cp command is not atomic

Hi Genius,

If I understand correctly, the HDFS shell command "cp" is not atomic. Is
that correct?

For example:

./bin/hdfs dfs -cp input/a.xml input/b.xml

This command actually does three things: 1. reads input/a.xml; 2. creates a
new file input/b.xml; 3. writes the content of a.xml to b.xml.

When I looked at the code, the client side actually performs these three
steps, and there is no lock between them. Does that mean the cp command is
not guaranteed to be atomic?


Thanks a lot for your reply.

Re: Cp command is not atomic

Posted by Kun Ren <re...@gmail.com>.
Thanks a lot, Chris, this is helpful.

On Wed, May 25, 2016 at 12:33 PM, Chris Nauroth <cn...@hortonworks.com>
wrote:


Re: Cp command is not atomic

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hello Kun,

You are correct that "hdfs dfs -cp" is not atomic, but the details of that
are a bit different from what you described.  For the example you gave,
the sequence of events would be:

1. Open a.xml.
2. Create file b.xml._COPYING_.
3. Copy the bytes from a.xml to b.xml._COPYING_.
4. Rename b.xml._COPYING_ to b.xml.

b.xml._COPYING_ is a temporary file.  All the bytes are written to this
location first.  Only if the full copy succeeds does the command proceed to
step 4 and rename the file to its final destination, b.xml.  The rename is
atomic, so overall, this has the effect that b.xml will never have
partially-written data.  Either the whole copy succeeds or the copy fails
and b.xml doesn't exist.
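
The four steps above can be sketched on a local filesystem. This is only an
analogy in Python, not the HDFS client code: the helper name copy_via_temp
is hypothetical, os.replace stands in for the HDFS rename, and details such
as how "hdfs dfs -cp" treats an existing destination are not reproduced.
Only the "._COPYING_" suffix comes from the description above.

```python
import os
import shutil
import tempfile

COPYING_SUFFIX = "._COPYING_"  # suffix from the thread; all other names are illustrative


def copy_via_temp(src, dst):
    """Copy src to dst by writing a temporary file and renaming it at the end."""
    tmp = dst + COPYING_SUFFIX
    with open(src, "rb") as fin:           # step 1: open the source
        with open(tmp, "wb") as fout:      # step 2: create the temporary file
            shutil.copyfileobj(fin, fout)  # step 3: copy the bytes
    # step 4: rename; os.replace is atomic on a local POSIX filesystem
    # (assumption: it only stands in for the HDFS rename here)
    os.replace(tmp, dst)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    a = os.path.join(workdir, "a.xml")
    b = os.path.join(workdir, "b.xml")
    with open(a, "w") as f:
        f.write("<configuration/>")
    copy_via_temp(a, b)
    print(sorted(os.listdir(workdir)))  # ['a.xml', 'b.xml']
```

A reader that looks for b.xml sees either no file or the complete file,
because the name b.xml only appears at the rename.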

However, even though the rename is atomic, we can't claim the overall
operation is atomic.  For example, if the process dies between step 2 and
step 3, then the command leaves a lingering side effect in the form of the
b.xml._COPYING_ file.
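
The lingering side effect can be illustrated the same way. The sketch below
is a local-filesystem analogy in Python, not HDFS code: interrupted_copy and
find_leftovers are hypothetical helpers, and the "crash" is simulated with
an exception raised before the rename would happen.

```python
import os
import tempfile

COPYING_SUFFIX = "._COPYING_"  # suffix from the thread; all other names are illustrative


def interrupted_copy(src, dst, fail_after_bytes):
    """Start a temp-file copy and simulate the process dying before the rename."""
    tmp = dst + COPYING_SUFFIX
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        fout.write(fin.read(fail_after_bytes))  # only part of the data is written
        raise RuntimeError("simulated crash before the rename")


def find_leftovers(directory):
    """List temporary files that an interrupted copy left behind."""
    return sorted(n for n in os.listdir(directory) if n.endswith(COPYING_SUFFIX))


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    a = os.path.join(workdir, "a.xml")
    b = os.path.join(workdir, "b.xml")
    with open(a, "w") as f:
        f.write("<configuration/>")
    try:
        interrupted_copy(a, b, 4)
    except RuntimeError:
        pass
    print(os.path.exists(b))        # False: the destination was never created
    print(find_leftovers(workdir))  # ['b.xml._COPYING_']
```

The destination stays absent, so no reader sees partial data under the
final name, but the temporary file remains until something removes it.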

Perhaps it's sufficient for your use case that the final rename step is
atomic.

--Chris Nauroth





