Posted to hdfs-user@hadoop.apache.org by lei liu <li...@gmail.com> on 2012/10/28 07:40:47 UTC

ClientProtocol create、mkdirs 、rename and delete methods are not Idempotent

I think these methods should be idempotent: repeated calls by the same
client should be harmless.


Thanks,

LiuLei

Re: ClientProtocol create、mkdirs 、rename and delete methods are not Idempotent

Posted by Ted Dunning <td...@maprtech.com>.
Create cannot be idempotent with sequential files.  Doing the same create
twice creates two different files.

On Sun, Oct 28, 2012 at 10:25 PM, lei liu <li...@gmail.com> wrote:

> Thanks Ted for your reply.
>
> What is the problem of watches and sequential files?  If you can
> describe it in detail, I can better understand the problem.
>
> 2012/10/29 Ted Dunning <td...@maprtech.com>
>
>> Create cannot be idempotent because of the problem of watches and
>> sequential files.
>>
>> Similarly, mkdirs, rename and delete cannot generally be idempotent.  In
>> particular applications, you might find it is OK to treat them as such, but
>> there are definitely applications where they are not idempotent.
>>
>>
>> On Sun, Oct 28, 2012 at 2:40 AM, lei liu <li...@gmail.com> wrote:
>>
>>> I think these methods should be idempotent: repeated calls by the same
>>> client should be harmless.
>>>
>>>
>>> Thanks,
>>>
>>> LiuLei
>>>
>>
>>
>

Re: ClientProtocol create、mkdirs 、rename and delete methods are not Idempotent

Posted by lei liu <li...@gmail.com>.
Thanks Ted for your reply.

What is the problem of watches and sequential files?  If you can
describe it in detail, I can better understand the problem.

2012/10/29 Ted Dunning <td...@maprtech.com>

> Create cannot be idempotent because of the problem of watches and
> sequential files.
>
> Similarly, mkdirs, rename and delete cannot generally be idempotent.  In
> particular applications, you might find it is OK to treat them as such, but
> there are definitely applications where they are not idempotent.
>
>
> On Sun, Oct 28, 2012 at 2:40 AM, lei liu <li...@gmail.com> wrote:
>
>> I think these methods should be idempotent: repeated calls by the same
>> client should be harmless.
>>
>>
>> Thanks,
>>
>> LiuLei
>>
>
>

Re: ClientProtocol create、mkdirs 、rename and delete methods are not Idempotent

Posted by lei liu <li...@gmail.com>.
Hi Steve,

Thank you for your detailed and patient answer.  I understand that.


2012/11/5 Steve Loughran <st...@hortonworks.com>

>
>
> On 4 November 2012 17:25, lei liu <li...@gmail.com> wrote:
>
>> I want to know which applications are idempotent and which are not, and
>> why. Could you give me an example?
>>
>>
>
>
> When you say "idempotent", I presume you mean the operation happens
> "at-most-once"; ignoring the degenerate case where all requests are
> rejected.
>
> Take operations that fail if their conditions aren't met, with delete
> (path named="something") being the simplest. The operation can send an
> error back, "file not found", but the client library can then downgrade
> that to an idempotent assertion: "when the acknowledgment was sent from the
> namenode, there was nothing at the end of this path". That will hold on a
> replay, though if someone creates a file in between, that replay could be
> observable.
>
>
> Now what about move(src,dest)?
>
> If it succeeds, then there is no src path, as it is now at "dest".
>
> What happens if you call it a second time? There is no src, only dest. You
> can't report that back as a success as it is clearly a failure: no src, no
> dest. It's hard to convert that into an assertion on the observable state
> of the system as the state doesn't reflect the history, so you need some
> temporal logic in there too: at time t0 there existed a directory src, at
> time t1 the directory src no longer existed and its contents were now found
> under directory "dest".
>
> And again, what happens if, worse, someone else did something in between
> and created a src directory (which they could do, given that the first one
> has been renamed dest)? The operation replays and the move takes place
> twice. You've just crossed into at-least-once operations, which is not what
> you wanted.
>
>
> At this point I'm sure you are thinking of having some kind of transaction
> journal, recording that at time Tn, transaction Xn moved the dir. Which
> means you have to start to collect a transaction log of what happened. Now
> effectively HDFS is a journalled file system; it does record a lot of
> things. It just doesn't record user transactions with it, or rescan the log
> whenever any operation comes in, so as to decide what to ignore.
>
> Or you just skip the filesystem changes and have some data structure
> recording "recent" transaction IDs, and ignore repeated requests with the
> same IDs. Better, though you'd need to make that failure resistant: its
> state must propagate to the journal and any failover namenodes so that a
> transaction replay will be idempotent even if the filesystem fails over
> between the original and replayed transaction. And of course all of this
> needs to be atomic with the filesystem state changes...
>
> Summary: It gets complicated fast. Throwing errors back to the caller
> makes life a lot simpler and lets the caller choose its own outcome -even
> though that's not always satisfactory.
>
> Alternatively: it's not that people don't want globally distributed
> transactions -it's just hard.
>
>
>
>
>>
>>
>
>> 2012/10/29 Ted Dunning <td...@maprtech.com>
>>
>>> Create cannot be idempotent because of the problem of watches and
>>> sequential files.
>>>
>>> Similarly, mkdirs, rename and delete cannot generally be idempotent.  In
>>> particular applications, you might find it is OK to treat them as such, but
>>> there are definitely applications where they are not idempotent.
>>>
>>>
>>> On Sun, Oct 28, 2012 at 2:40 AM, lei liu <li...@gmail.com> wrote:
>>>
>>>> I think these methods should be idempotent: repeated calls by the same
>>>> client should be harmless.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> LiuLei
>>>>
>>>
>>>
>>
>

Re: ClientProtocol create、mkdirs 、rename and delete methods are not Idempotent

Posted by Steve Loughran <st...@hortonworks.com>.
On 4 November 2012 17:25, lei liu <li...@gmail.com> wrote:

> I want to know which applications are idempotent and which are not, and
> why. Could you give me an example?
>
>


When you say "idempotent", I presume you mean the operation happens
"at-most-once"; ignoring the degenerate case where all requests are
rejected.

Take operations that fail if their conditions aren't met, with delete
(path named="something") being the simplest. The operation can send an
error back, "file not found", but the client library can then downgrade
that to an idempotent assertion: "when the acknowledgment was sent from the
namenode, there was nothing at the end of this path". That will hold on a
replay, though if someone creates a file in between, that replay could be
observable.
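
As an illustration of that downgrade, here is a minimal Python sketch. It
assumes a toy in-memory namespace; ToyNamespace and delete_if_present are
hypothetical names invented for this example and are not part of HDFS or
its client library.

```python
class ToyNamespace:
    """In-memory stand-in for a namenode's path table (hypothetical)."""

    def __init__(self, paths=()):
        self.paths = set(paths)

    def delete(self, path):
        # Strict server-side behaviour: deleting a missing path is an error.
        if path not in self.paths:
            raise FileNotFoundError(path)
        self.paths.remove(path)


def delete_if_present(ns, path):
    """Client-side wrapper: succeed as long as nothing is left at `path`.

    Returns True if this call removed the path, False if it was already gone.
    """
    try:
        ns.delete(path)
        return True
    except FileNotFoundError:
        # Downgrade the error to the assertion "nothing exists at this path".
        return False


ns = ToyNamespace({"/data/something"})
first = delete_if_present(ns, "/data/something")    # removes the path
replay = delete_if_present(ns, "/data/something")   # replay: no error raised

# If another client recreates the path in between, a further replay is
# observable: it removes that client's file.
ns.paths.add("/data/something")
late_replay = delete_if_present(ns, "/data/something")
```

The end state is the same after every call (nothing at the path), which is
what makes the wrapper safe to retry; the last call shows the caveat that a
replay can still destroy something created in between.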


Now what about move(src,dest)?

If it succeeds, then there is no src path, as it is now at "dest".

What happens if you call it a second time? There is no src, only dest. You
can't report that back as a success as it is clearly a failure: no src, no
dest. It's hard to convert that into an assertion on the observable state
of the system as the state doesn't reflect the history, so you need some
temporal logic in there too: at time t0 there existed a directory src, at
time t1 the directory src no longer existed and its contents were now found
under directory "dest".

And again, what happens if, worse, someone else did something in between
and created a src directory (which they could do, given that the first one
has been renamed dest)? The operation replays and the move takes place
twice. You've just crossed into at-least-once operations, which is not what
you wanted.
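
Both outcomes of a replayed move can be shown with a small Python sketch.
It assumes a toy in-memory namespace whose move() behaves like `mv`, placing
src underneath dest when dest already exists; ToyNamespace is a hypothetical
name for illustration and does not reproduce the exact HDFS rename rules.

```python
import posixpath


class ToyNamespace:
    """In-memory set of directory paths (hypothetical, for illustration)."""

    def __init__(self, dirs=()):
        self.dirs = set(dirs)

    def move(self, src, dest):
        if src not in self.dirs:
            raise FileNotFoundError(src)
        # Assumption: like `mv`, an existing dest receives src as a child.
        if dest in self.dirs:
            target = posixpath.join(dest, posixpath.basename(src))
        else:
            target = dest
        if target in self.dirs:
            raise FileExistsError(target)
        self.dirs.remove(src)
        self.dirs.add(target)
        return target


ns = ToyNamespace({"/src"})
ns.move("/src", "/dest")             # original request succeeds

try:
    ns.move("/src", "/dest")         # replay: src is gone, so it fails
    replay_failed = False
except FileNotFoundError:
    replay_failed = True

ns.dirs.add("/src")                  # another client creates a new /src
second_target = ns.move("/src", "/dest")  # replay moves *their* directory
```

In the first replay the caller sees a failure for an operation that actually
succeeded; in the second, the replay silently moves a directory that belongs
to someone else.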


At this point I'm sure you are thinking of having some kind of transaction
journal, recording that at time Tn, transaction Xn moved the dir. Which
means you have to start to collect a transaction log of what happened. Now
effectively HDFS is a journalled file system; it does record a lot of
things. It just doesn't record user transactions with it, or rescan the log
whenever any operation comes in, so as to decide what to ignore.

Or you just skip the filesystem changes and have some data structure
recording "recent" transaction IDs, and ignore repeated requests with the
same IDs. Better, though you'd need to make that failure resistant: its
state must propagate to the journal and any failover namenodes so that a
transaction replay will be idempotent even if the filesystem fails over
between the original and replayed transaction. And of course all of this
needs to be atomic with the filesystem state changes...
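
A minimal Python sketch of the "recent transaction IDs" idea follows. It
assumes the client attaches a unique request ID to each logical operation;
ToyNamenode and its fields are hypothetical names for illustration. The
table lives only in memory here, so the hard parts described above
(persisting it to the journal, sharing it with failover namenodes, updating
it atomically with the namespace, expiring old entries) are deliberately
left out.

```python
import uuid


class ToyNamenode:
    """Toy namespace plus a table of completed request IDs (hypothetical)."""

    def __init__(self, dirs=()):
        self.dirs = set(dirs)
        self.completed = {}  # request ID -> outcome recorded the first time

    def move(self, request_id, src, dest):
        # A replayed request gets the recorded outcome; nothing is re-run.
        if request_id in self.completed:
            return self.completed[request_id]
        if src in self.dirs and dest not in self.dirs:
            self.dirs.remove(src)
            self.dirs.add(dest)
            outcome = "moved"
        else:
            outcome = "failed"
        self.completed[request_id] = outcome
        return outcome


nn = ToyNamenode({"/src"})
rid = str(uuid.uuid4())              # one ID per logical client request

first = nn.move(rid, "/src", "/dest")
nn.dirs.add("/src")                  # another client creates a new /src
replay = nn.move(rid, "/src", "/dest")  # same ID: answered from the table
```

The replay reports the original outcome and leaves the other client's new
/src untouched, which is the at-most-once behaviour the earlier move example
could not provide.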

Summary: It gets complicated fast. Throwing errors back to the caller makes
life a lot simpler and lets the caller choose its own outcome -even though
that's not always satisfactory.

Alternatively: it's not that people don't want globally distributed
transactions -it's just hard.




>
>

> 2012/10/29 Ted Dunning <td...@maprtech.com>
>
>> Create cannot be idempotent because of the problem of watches and
>> sequential files.
>>
>> Similarly, mkdirs, rename and delete cannot generally be idempotent.  In
>> particular applications, you might find it is OK to treat them as such, but
>> there are definitely applications where they are not idempotent.
>>
>>
>> On Sun, Oct 28, 2012 at 2:40 AM, lei liu <li...@gmail.com> wrote:
>>
>>> I think these methods should be idempotent: repeated calls by the same
>>> client should be harmless.
>>>
>>>
>>> Thanks,
>>>
>>> LiuLei
>>>
>>
>>
>

Re: ClientProtocol create、mkdirs 、rename and delete methods are not Idempotent

Posted by Steve Loughran <st...@hortonworks.com>.
On 4 November 2012 17:25, lei liu <li...@gmail.com> wrote:

> I want to know what applications are idempotent or not idempotent? and
> Why? Could you give me a example.
>
>


When you say "idempotent", I presume you mean the operation happens
"at-most-once"; ignoring the degenerate case where all requests are
rejected.

you can take operations that fail if their conditions aren't met (delete
path named="something") being the simplest. the operation can send an error
back "file not found', but the client library can then downgrade that to an
idempotent assertion: "when the acknowledgment was send from the namenode,
there was nothing at the end of this path". Which will hold on a replay,
though if someone creates a file in between, that replay could be
observable.


Now what about move(src,dest)?

if it succeeds, then there is no src path, as it is now at "dest".

What happens if you call it a second time? There is no src, only dest. You
can't report that back as a success as it is clearly a failure: no src, no
dest. It's hard to convert that into an assertion on the observable state
of the system as the state doesn't reflect the history, so you need some
temporal logic in there too:: at time t0 there existed a directory src, at
time t1 the directory src no longer existed and its contents were now found
under directory "dest".

And again, what happens if worse someone else did something in between,
created a src directory (which it could do, given that the first one has
been renamed dest), the operation replays and the move takes place twice
-you've just crossed into at-least-once operations, which is not what you
wanted.


At this point I'm sure you are thinking of having some kind of transaction
journal, recording that at time Tn, transaction Xn moved the dir. Which
means you have to start to collect a transaction log of what happened. Now
effectively HDFS is a journalled file system, it does record a lot of
things. It just doesn't record user transactions with it, or rescan the log
whenever any operation comes in, so as to decided what to ignore.

Or you just skip the filesystem changes and have some data structure
recording "recent" transaction IDs; ignore repeated requests with the same
IDs. Better, though you'd need to make that failure resistant -it's state
must propagate to the journal and any failover namenodes so that a
transaction replay will be idempotent even if the filesystem fails over
between the original and replayed transaction. And of course all of this
needs to be atomic with the filesystem state changes...

Summary: It gets complicated fast. Throwing errors back to the caller makes
life a lot simpler and lets the caller choose its own outcome -even though
that's not always satisfactory.

Alternatively: it's not that people don't want globally distributed
transactions -it's just hard.




>
>

> 2012/10/29 Ted Dunning <td...@maprtech.com>
>
>> Create cannot be idempotent because of the problem of watches and
>> sequential files.
>>
>> Similarly, mkdirs, rename and delete cannot generally be idempotent.  In
>> particular applications, you might find it is OK to treat them as such, but
>> there are definitely applications where they are not idempotent.
>>
>>
>> On Sun, Oct 28, 2012 at 2:40 AM, lei liu <li...@gmail.com> wrote:
>>
>>> I think these methods should are idempotent, these methods should be repeated
>>> calls to be harmless by same client.
>>>
>>>
>>> Thanks,
>>>
>>> LiuLei
>>>
>>
>>
>

Re: ClientProtocol create、mkdirs 、rename and delete methods are not Idempotent

Posted by Steve Loughran <st...@hortonworks.com>.
On 4 November 2012 17:25, lei liu <li...@gmail.com> wrote:

> I want to know what applications are idempotent or not idempotent? and
> Why? Could you give me a example.
>
>


When you say "idempotent", I presume you mean the operation happens
"at-most-once"; ignoring the degenerate case where all requests are
rejected.

you can take operations that fail if their conditions aren't met (delete
path named="something") being the simplest. the operation can send an error
back "file not found', but the client library can then downgrade that to an
idempotent assertion: "when the acknowledgment was send from the namenode,
there was nothing at the end of this path". Which will hold on a replay,
though if someone creates a file in between, that replay could be
observable.


Now what about move(src,dest)?

if it succeeds, then there is no src path, as it is now at "dest".

What happens if you call it a second time? There is no src, only dest. You
can't report that back as a success as it is clearly a failure: no src, no
dest. It's hard to convert that into an assertion on the observable state
of the system as the state doesn't reflect the history, so you need some
temporal logic in there too:: at time t0 there existed a directory src, at
time t1 the directory src no longer existed and its contents were now found
under directory "dest".

And again, what happens if worse someone else did something in between,
created a src directory (which it could do, given that the first one has
been renamed dest), the operation replays and the move takes place twice
-you've just crossed into at-least-once operations, which is not what you
wanted.


At this point I'm sure you are thinking of having some kind of transaction
journal, recording that at time Tn, transaction Xn moved the dir. Which
means you have to start to collect a transaction log of what happened.
Now, HDFS is effectively a journalled file system; it does record a lot of
things. It just doesn't record user transactions with it, or rescan the
log whenever an operation comes in to decide what to ignore.

Or you just skip the filesystem changes and have some data structure
recording "recent" transaction IDs, ignoring repeated requests with the
same IDs. Better, though you'd need to make that failure-resistant: its
state must propagate to the journal and any failover namenodes so that a
transaction replay will be idempotent even if the filesystem fails over
between the original and replayed transaction. And of course all of this
needs to be atomic with the filesystem state changes...
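
A minimal sketch of the "recent transaction IDs" idea, again as a toy
Python model and not anything HDFS provides. DedupingNamespace, request_id
and max_recent are all hypothetical names. The table lives only in memory
here, so it covers none of the failover or atomicity requirements.

```python
from collections import OrderedDict


class DedupingNamespace:
    """Toy namespace that remembers the outcome of recent request IDs."""

    def __init__(self, max_recent=1000):
        self.dirs = {}               # directory path -> child names
        self.recent = OrderedDict()  # request_id -> recorded outcome
        self.max_recent = max_recent

    def move(self, request_id, src, dest):
        # A repeated ID is answered from the table; nothing is re-executed.
        if request_id in self.recent:
            return self.recent[request_id]
        if src in self.dirs:
            self.dirs.setdefault(dest, []).extend(self.dirs.pop(src))
            outcome = "moved"
        else:
            outcome = "no such source"
        self.recent[request_id] = outcome
        # Keep only "recent" IDs: evict the oldest entry once over the limit.
        if len(self.recent) > self.max_recent:
            self.recent.popitem(last=False)
        return outcome


ns = DedupingNamespace()
ns.dirs["/src"] = ["a"]
r1 = ns.move("req-1", "/src", "/dest")
ns.dirs["/src"] = ["b"]                  # someone recreates /src in between
r2 = ns.move("req-1", "/src", "/dest")   # replay with the same ID
```

The replay gets the original answer and the new "/src" is left alone. An
ID that has already been evicted from the table would be executed again,
which is one reason the size of the "recent" window matters.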

Summary: It gets complicated fast. Throwing errors back to the caller makes
life a lot simpler and lets the caller choose its own outcome -even though
that's not always satisfactory.

Alternatively: it's not that people don't want globally distributed
transactions -it's just hard.




>
>

> 2012/10/29 Ted Dunning <td...@maprtech.com>
>
>> Create cannot be idempotent because of the problem of watches and
>> sequential files.
>>
>> Similarly, mkdirs, rename and delete cannot generally be idempotent.  In
>> particular applications, you might find it is OK to treat them as such, but
>> there are definitely applications where they are not idempotent.
>>
>>
>> On Sun, Oct 28, 2012 at 2:40 AM, lei liu <li...@gmail.com> wrote:
>>
>>> I think these methods should are idempotent, these methods should be repeated
>>> calls to be harmless by same client.
>>>
>>>
>>> Thanks,
>>>
>>> LiuLei
>>>
>>
>>
>

Re: ClientProtocol create、mkdirs 、rename and delete methods are not Idempotent

Posted by lei liu <li...@gmail.com>.
I want to know which applications are idempotent and which are not, and
why. Could you give me an example?

Thank you


2012/10/29 Ted Dunning <td...@maprtech.com>

> Create cannot be idempotent because of the problem of watches and
> sequential files.
>
> Similarly, mkdirs, rename and delete cannot generally be idempotent.  In
> particular applications, you might find it is OK to treat them as such, but
> there are definitely applications where they are not idempotent.
>
>
> On Sun, Oct 28, 2012 at 2:40 AM, lei liu <li...@gmail.com> wrote:
>
>> I think these methods should are idempotent, these methods should be repeated
>> calls to be harmless by same client.
>>
>>
>> Thanks,
>>
>> LiuLei
>>
>
>

Re: ClientProtocol create、mkdirs 、rename and delete methods are not Idempotent

Posted by lei liu <li...@gmail.com>.
Thanks Ted for your reply.

What is the problem of watches and sequential files?  If you can
describe it in detail, I can better understand the problem.

2012/10/29 Ted Dunning <td...@maprtech.com>

> Create cannot be idempotent because of the problem of watches and
> sequential files.
>
> Similarly, mkdirs, rename and delete cannot generally be idempotent.  In
> particular applications, you might find it is OK to treat them as such, but
> there are definitely applications where they are not idempotent.
>
>
> On Sun, Oct 28, 2012 at 2:40 AM, lei liu <li...@gmail.com> wrote:
>
>> I think these methods should are idempotent, these methods should be repeated
>> calls to be harmless by same client.
>>
>>
>> Thanks,
>>
>> LiuLei
>>
>
>

Re: ClientProtocol create、mkdirs 、rename and delete methods are not Idempotent

Posted by Ted Dunning <td...@maprtech.com>.
Create cannot be idempotent because of the problem of watches and
sequential files.

Similarly, mkdirs, rename and delete cannot generally be idempotent.  In
particular applications, you might find it is OK to treat them as such, but
there are definitely applications where they are not idempotent.

On Sun, Oct 28, 2012 at 2:40 AM, lei liu <li...@gmail.com> wrote:

> I think these methods should are idempotent, these methods should be repeated
> calls to be harmless by same client.
>
>
> Thanks,
>
> LiuLei
>
