Posted to user@curator.apache.org by Michael Peterson <qu...@gmail.com> on 2015/01/20 16:23:13 UTC

InterProcessMutex doesn't detect deletion of lock file

Hi,

I am fairly new to Curator and ZK, so apologies if this has been asked
before.  I haven't found anything yet that addresses it.

My ZK use case is very simple - HA failover.  Two processes get launched -
one does the work and the other waits to take over in case the other dies
or otherwise stops working.

The Curator InterProcessMutex fits the bill.  However, without too much
effort I've found a scenario where Process A and Process B both think they
are the owner at the same time and start doing the work, causing data
corruption.
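
For reference, each process runs roughly this acquire-then-work pattern (a
simplified sketch with hypothetical names, not the exact code from the
gists linked below):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.framework.recipes.locks.InterProcessMutex;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class HaWorker {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "localhost:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();

            InterProcessMutex mutex = new InterProcessMutex(client, "/XXX/masterlock");
            mutex.acquire();           // blocks until this process holds the lock
            try {
                while (true) {
                    doWork();          // only the lock holder should ever get here
                }
            } finally {
                mutex.release();
            }
        }

        private static void doWork() throws InterruptedException {
            Thread.sleep(1000);        // stand-in for the real work
        }
    }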

The scenario is simply to delete the lock file, which I did via the ZK CLI
(zkCli.sh).  The problem is that the InterProcessMutex currently holding
the lock doesn't seem to notice that the lock file got deleted, but the
InterProcessMutex in the waiting (failover) process *does* notice and
creates a new lock and starts doing work.

Does the InterProcessMutex set a watch on the lock file it creates?  If
not, why not?


Idea #1:

I tried setting all the listeners I could figure out how to set in order
to detect the NodeDeleted event:

- CuratorListener
- ConnectionStateListener
- UnhandledErrorListener

but none get signaled when I manually delete the lock file.
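
Here is roughly how I registered them (a minimal sketch, not my exact gist
code; `client` is the started CuratorFramework from the snippet above):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.api.CuratorEvent;
    import org.apache.curator.framework.api.CuratorListener;
    import org.apache.curator.framework.api.UnhandledErrorListener;
    import org.apache.curator.framework.state.ConnectionState;
    import org.apache.curator.framework.state.ConnectionStateListener;

    class ListenerSetup {
        static void registerAll(CuratorFramework client) {
            client.getCuratorListenable().addListener(new CuratorListener() {
                @Override
                public void eventReceived(CuratorFramework c, CuratorEvent event) {
                    System.out.println("CuratorListener: " + event.getType());
                }
            });
            client.getConnectionStateListenable().addListener(new ConnectionStateListener() {
                @Override
                public void stateChanged(CuratorFramework c, ConnectionState newState) {
                    System.out.println("ConnectionStateListener: " + newState);
                }
            });
            client.getUnhandledErrorListenable().addListener(new UnhandledErrorListener() {
                @Override
                public void unhandledError(String message, Throwable e) {
                    System.out.println("UnhandledErrorListener: " + message);
                }
            });
        }
    }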


Idea #2:

Is the solution to set my own watch on the lock file that the IPMutex
created?  If so, I see that one way to get the file name of the lock is to
call InterProcessMutex#getParticipantNodes().  But the problem is that
there can be more than one lock file, as these listings show:

    [zk: localhost:2181(CONNECTED) 7] ls /XXX/masterlock
    [_c_c1dc399d-b6e4-4051-bd5c-2e300e62bc58-lock-0000000003,
     _c_bf5de8b2-ed33-4f89-a737-4061f2072c3f-lock-0000000000]

    [zk: localhost:2181(CONNECTED) 37] ls /XXX/masterlock
    [_c_63490235-7ab6-461d-bab2-401d4439db4f-lock-0000000018, \
     _c_1e57c64e-b990-4f9a-96f9-fccf56c0421e-lock-0000000012, \
     _c_f09ee1e5-0e47-47a7-961e-d7745ffbfc28-lock-0000000017, \
     _c_2f9ebe06-b91c-4886-b916-34ff1fa83541-lock-0000000016]

And it seems that I can't use the one with the smallest sequential lock
number, because the smallest one might be left over from a crashed
lockholder whose session hasn't expired yet - that is the case in the above
example: lock-0000000012 is just waiting to be expired after a crash.

So I don't know how to tell which lock is "mine" to set a watch on using
that method.
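
(Setting the watch itself would be simple enough if I could identify my own
node; something like this sketch, where `myLockNodePath` is a hypothetical
variable holding the path of my own lock node:)

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;

    client.checkExists().usingWatcher(new Watcher() {
        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                // our lock node vanished underneath us -- stop working immediately
                System.err.println("Lock node deleted: " + event.getPath());
            }
        }
    }).forPath(myLockNodePath);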



Idea #3:

I see that the InterProcessMutex also takes an optional
`LockInternalsDriver` argument.  I looked into that code and saw that it
has access to the lock file name.  In addition, the `getsTheLock` method
creates a PredicateResults object with a `pathToWatch` arg, which sounds
promising, but in the default impl with my setup that pathToWatch is null.

So I then created my own CustomLockInternalsDriver and set `pathToWatch`
to the actual lock path (not sure that would work), but still nothing
happens when I delete the file.

So then I recorded the path to my lock in the CustomLockInternalsDriver so
I could get it in my mainline code and set a watch on it myself.  That
ends up working.  But it's a lot of work, and it's not at all clear whether
that is the right solution or whether it is dangerous to supply my own
LockInternalsDriver impl.
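
The essence of the approach is roughly this (a simplified sketch of the
idea, not the exact code from the gist below):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.recipes.locks.StandardLockInternalsDriver;

    import java.util.concurrent.atomic.AtomicReference;

    // Delegate to the standard driver but remember the path of the node we
    // create, so mainline code can set its own watch on that exact node.
    class CustomLockInternalsDriver extends StandardLockInternalsDriver {
        private final AtomicReference<String> ourLockPath = new AtomicReference<>();

        @Override
        public String createsTheLock(CuratorFramework client, String path,
                                     byte[] lockNodeBytes) throws Exception {
            String created = super.createsTheLock(client, path, lockNodeBytes);
            ourLockPath.set(created);   // the ephemeral-sequential node we own
            return created;
        }

        String getOurLockPath() {
            return ourLockPath.get();
        }
    }

I then pass an instance of it to the InterProcessMutex constructor that
takes a LockInternalsDriver, and set my watch on getOurLockPath() after
acquire() returns.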

What is the right way to solve this issue?


--- How to REPRODUCE ---

Here's a link to a gist with my test code:
https://gist.github.com/quux00/f6be8fe223a7832ef514
Also a gist to my CustomLockInternalsDriver:
https://gist.github.com/quux00/ab37cedc46cb5368c853

Start up two instances of that code. One will indicate it is "working" and
the other "waiting". I then use zkCli.sh to delete the file:

    $ ./zkCli.sh
    [zk: localhost:2181(CONNECTED) 111] ls /XXX/masterlock
    [_c_fd2dcb51-d5e1-4f27-afdf-7a8f75c1b85b-lock-0000000006]
    [zk: localhost:2181(CONNECTED) 112] delete /XXX/masterlock/_c_fd2dcb51-d5e1-4f27-afdf-7a8f75c1b85b-lock-0000000006
    [zk: localhost:2181(CONNECTED) 113] ls /XXX/masterlock
    []

The "waiting" process will now create a new lock file and now both
processes are "working".

Thank you,
Michael

Re: InterProcessMutex doesn't detect deletion of lock file

Posted by Mike Drob <ma...@cloudera.com>.
I'm with Jordan on this. I would not expect Curator to continue to work if
somebody did "rm -rf /" on their server. I also do not think this is a
situation that we should have to account for, despite personally having
seen it more than once. I'll admit that I'm exaggerating the argument a
little bit, but at some point we have to trust that underlying
infrastructure, like the file system, will be there and operators won't
break it.

Re: InterProcessMutex doesn't detect deletion of lock file

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
In the many years of Curator’s existence no one that I know has had an issue with this. ZooKeeper is very robust and nodes do not get deleted abnormally like this. You are posing a hypothetical situation. It’s not reasonable to handle every single edge case. This would be the equivalent of someone going into the production database and arbitrarily deleting records. The locking code is already incredibly complicated and I wouldn’t want to burden it with this new behavior and overhead. However, if you can make it work reasonably please provide a PR and the committers will look at it.

-Jordan




Re: InterProcessMutex doesn't detect deletion of lock file

Posted by Michael Peterson <qu...@gmail.com>.
> But manually deleting the lock node is not normal behavior.
> It should never happen in production.

I agree that it would be abnormal.  But abnormal doesn't mean impossible.

> Can you explain the scenario in more detail?

There may be a bug in ZK (now or in the future) that in some rare cases
deletes a file when it should not.

Or a team might be in the practice of managing their ZK ensemble via the ZK
CLI, and someone might accidentally type:
"delete /XXX/masterlock/_c_c6101d8e-5af2-4290-8bc6-4005048c9a77-lock-0000000000"


rather than

"get /XXX/masterlock/_c_c6101d8e-5af2-4290-8bc6-4005048c9a77-lock-0000000000".


Or even worse, type
"rmr /XXX/masterlock".

(I've seen a somewhat similar manual mistake on the HDFS of a production
Hadoop system, where months of data were deleted by hitting up-arrow too
fast and issuing a -rmr instead of an -ls command.)

For a system where I need to be absolutely sure that I and only I have the
lock, this abnormal "backdoor" deletion possibility worries me.  To build a
truly robust system, you have to handle all the possibilities you can.

The https://issues.apache.org/jira/browse/CURATOR-171 issue referenced
earlier seems to be arguing the same thing.


On Tue, Jan 20, 2015 at 11:42 AM, Jordan Zimmerman <jordan@jordanzimmerman.com> wrote:

> But manually deleting the lock node is not normal behavior. It should
> never happen in production. Can you explain the scenario in more detail?
>
> -JZ
>
>

Re: InterProcessMutex doesn't detect deletion of lock file

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
But manually deleting the lock node is not normal behavior. It should never happen in production. Can you explain the scenario in more detail? 

-JZ



On January 20, 2015 at 10:47:20 AM, John Vines (vines@apache.org) wrote:

Sounds similar to https://issues.apache.org/jira/browse/CURATOR-171




Re: InterProcessMutex doesn't detect deletion of lock file

Posted by Michael Peterson <qu...@gmail.com>.
> It does not. It doesn’t because it’s extra overhead that isn’t needed.
> Watchers are not free. That said, if you want to be able to revoke a lock
> from another process, InterProcessMutex has the makeRevocable() method and
> the RevocationListener.

My issue is not related to revocation.  My issue is that the
InterProcessMutex as implemented is not safe for the scenario I outlined.
Some people may prefer safety over slightly better performance, so I would
argue that there needs to be an option on InterProcessMutex that sets a
watch on the lock file.

-Michael



Re: InterProcessMutex doesn't detect deletion of lock file

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.
> Does the InterProcessMutex set a watch on the lock file it creates?  If not, why not?
It does not. It doesn’t because it’s extra overhead that isn’t needed. Watchers are not free. That said, if you want to be able to revoke a lock from another process, InterProcessMutex has the makeRevocable() method and the RevocationListener.
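
Roughly (a sketch; lockNodePath and the callback name are placeholders):

    // In the process that holds the lock: opt in to revocation requests.
    // (InterProcessMutex, RevocationListener and Revoker live in
    // org.apache.curator.framework.recipes.locks.)
    mutex.makeRevocable(new RevocationListener<InterProcessMutex>() {
        @Override
        public void revocationRequested(InterProcessMutex lock) {
            // another process has asked us to give up the lock
            stopWorkAndRelease(lock);   // placeholder application callback
        }
    });

    // In the other process: ask the current holder to release.
    Revoker.attemptRevoke(client, lockNodePath);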



-Jordan


Re: InterProcessMutex doesn't detect deletion of lock file

Posted by Michael Peterson <qu...@gmail.com>.
Thanks. That bug is for LeaderLatch. Should I open another bug on
InterProcessMutex?  Or just add commentary to the CURATOR-171 issue?

Can anyone address my workaround option (Idea #3 above) - namely
implementing my own custom LockInternalsDriver and setting my own WATCH on
the lock file?  Any ideas on whether that will hit problems?

On Tue, Jan 20, 2015 at 10:46 AM, John Vines <vi...@apache.org> wrote:

> Sounds similar to https://issues.apache.org/jira/browse/CURATOR-171

Re: InterProcessMutex doesn't detect deletion of lock file

Posted by John Vines <vi...@apache.org>.
Sounds similar to https://issues.apache.org/jira/browse/CURATOR-171
