You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@roller.apache.org by Elias Torres <el...@torrez.us> on 2006/11/23 01:10:38 UTC

TaskLock implementation

Folks,

I have been doing some work on clustered search (more on future email)
and I begun looking at the task lock recently implemented in 3.1. I
believe (and have verified with Allen) that we might need to make some
changes to both the API and implementation so it works properly on a
cluster of size 2 or more. Currently I think we can have a couple of
places where two nodes can obtain a lock at the same time.

I'm working on some pseudo-code you all can review to make sure we are
safe from these issues above. I also want to refactor the interfaces to
reflect more the use of leases as opposed to locks.

interface LeaseManager {

  registerLease(String name, String id, long leaseTime);
  renewLease(String name, String id);
  unregisterLease(String name, String id);

}

If you notice, I'm introducing a new property (id) that will be used to
identify which node in the cluster has the lease in order that we can
allow for renewals if they desire so. Currently, we only have support
for task name and that's enough for what I'm trying to do. I'll explain
more in my next email and if I'm brief on this one is because I must
worry about dinner and putting the kids to bed.

-Elias

Re: TaskLock implementation

Posted by Elias Torres <el...@torrez.us>.
On 11/22/06, Allen Gilliland <Al...@sun.com> wrote:
>
>
> Elias Torres wrote:
> > On 11/22/06, Allen Gilliland <Al...@sun.com> wrote:
> >>
> >>
> >> Elias Torres wrote:
> >> > Folks,
> >> >
> >> > I have been doing some work on clustered search (more on future email)
> >> > and I begun looking at the task lock recently implemented in 3.1. I
> >> > believe (and have verified with Allen) that we might need to make some
> >> > changes to both the API and implementation so it works properly on a
> >> > cluster of size 2 or more. Currently I think we can have a couple of
> >> > places where two nodes can obtain a lock at the same time.
> >> >
> >> > I'm working on some pseudo-code you all can review to make sure we are
> >> > safe from these issues above. I also want to refactor the interfaces to
> >> > reflect more the use of leases as opposed to locks.
> >> >
> >> > interface LeaseManager {
> >> >
> >> >   registerLease(String name, String id, long leaseTime);
> >> >   renewLease(String name, String id);
> >> >   unregisterLease(String name, String id);
> >> >
> >> > }
> >>
> >> I don't mind any of these changes if it's going to improve the
> >> implementation, but now that I look at it I don't see how this is all
> >> that different from what we have now ...
> >>
> >> acquireLock(RollerTask)
> >> releaseLock(RollerTask)
> >> isLocked(RollerTask)
> >
> > What's the use case for isLocked? I'm not sure why would I need to
> > test if something is locked. If I need it, I try to acquire it.
>
> Ummm ... well for just doing the leasing it's probably not required.  I
> think the reason I wanted it was for better condition handling so that
> it's possible that we could want to do something special in the event
> that a task is trying to run but is currently locked.  If all you do is
> try to acquire the lock and fail you may not know that it failed because
> it was locked, it could have failed because you couldn't contact the db,
> or because the interval time hadn't elapsed yet.
>
> We could get by without it though.
>
>
> >
> >>
> >> I can see how just calling things a lease is possibly a better way to
> >> describe what's happening, but calling registerLease() is the exact same
> >> thing as acquiring a lock in the current code.  When you acquire a lock
> >> now you get the lock for a given amount of time, or until you manually
> >> release it.
> >
> > I didn't say that changing the name would mean different function, but
> > we must be consistent in our naming. JINI doesn't call it a
> > LockManager, it calls it a LeaseManager. If anything, I know you are
> > always a proponent of naming things what they do/mean.
>
> Fair enough.  I guess I just don't see that just because JINI calls it a
> lease that we have to call it the same thing.  But it doesn't really
> matter to me, calling it a lease is fine.
>
>
> >
> >>
> >> The current code doesn't have any way to renew a lease (or lock), so
> >> that definitely sounds like a good addition if we plan to have tasks
> >> which may run over their initial lease time and want to extend the
> >> lease.  I didn't really envision that happening, but it sounds
> >> reasonable.
> >>
> >
> > k. cool beans.
> >
> >> And of course, unregistering a lease is the same thing as releasing a
> >> lock, right?  So all in all I don't think we are really changing much.
> >>
> >> But in any case, I have a few comments ...
> >>
> >> 1. I would prefer that these just remain in the ThreadManager where the
> >> existing methods are now.  I don't think we need a whole new Manager for
> >> this.
> >
> > I'm fine with them staying in the ThreadManager.
> >
> >>
> >> 2. I think we would still need an isLeased() method which is basically
> >> just a convenience for any task which is about to try and obtain a lease
> >> to see if it is already leased.
> >
> > See my comment above.
> >
> >>
> >> 3. I also think that we want to stick with having these methods take in
> >> RollerTask objects rather than individualized parameters.  The
> >> RollerTask class already forces subclasses to define a name and a lease
> >> time, so those are already included, and it should be easy enough to
> >> just add in an id attribute which would just need to be unique to each
> >> node of the cluster somehow.
> >
> > I'm sort of ok with using a POJO instead of parameters. Read query
> > below for counter argument.
> >
> > One thing I forgot was that we need to make RollerTask more abstract
> > since your implementation assumes a Task wants to lock/unlock on every
> > run. I believe we need a really abstract RollerTask and have two or
> > three kinds of tasks: register/unregister lease in single run,
> > register/renew lease on every run (master indexer scenario), no-op
> > (e.g. independent task).
>
> That's fine with me, but what's the no-op scenario?  I would think that
> all tasks should go through the leasing process in some form or another.

no-op is when I want a task to run on a schedule by your Timer class
and doesn't need to acquire a lease or anything. For example, the
SlaveIndexTask in my clustered search work. All it needs to do is
check a file on a network share, if found copy it over, else exit.

>
>
> >
> >>
> >> I think the big changes which you didn't really detail in this email is
> >> that we would be shifting more of the logic around dealing with time
> >> from app time to database time to make sure that all nodes are properly
> >> synced.  Then we would also want to ensure the leasing process was rigid
> >> enough to prevent any possible race conditions, which I believe at the
> >> moment is not the case.  Accomplishing those two things definitely sound
> >> like a good idea to me.
> >
> > Let me take a quick stab at it.
> >
> > 0. We need a constraint on task name so we can't have duplicate rows.
>
> agreed.
>
>
> >
> > registerLease()
> >
> > 1. try to update the table if the lease is expired or we already have
> > this lease
> >
> > update task_lease set name=name, id=id, duration=duration,
> > starttime=CURRENT_TIMESTAMP where name = name and starttime + duration
> > < CURRENT_TIMESTAMP or id = id
> >
> > if it returns 1 row updated, we are done.
> > else if return 0 rows updated, we need to insert a new row, if
> > success, we obtained the lock, else we return false, didn't register
> > lease.
>
> looks good to me.
>
>
> >
> > unregisterLease() is easy, just remove the row, if name and id match.
>
> i'm not sure about deleting the row.  i think we need to keep the rows
> so that we can maintain the last run time for a task, even if it was
> unleased.  the last run time is essential to ensuring that all nodes are
> on a synchronized clock for the execution times of the tasks.

I see that it might be good information to maintain the row to know
last run information.

>
> in fact, it may be ideal if there is some kind of initLease(RollerTask)
> method which can be used to initialize the row in the database in an
> unleased state.  this simplifies the logic a bit as well because then we
> only have to do the update statement to acquire the lock, not insert logic.
>

Most locking/leasing interfaces do without init. The less number of
methods the better. Less semantics to define and ways to get it wrong.

>
> >
> > renewLease() is basically just the update query above. if returns 1
> > row updated success, else failure.
> >
> > Maybe the registerLease can be changed to first test for existence of
> > row with name = name. Anyways, this is roughly what I'm thinking and
> > if you notice, we don't make too much use of the POJO, so passing a
> > POJO seems too complicated for just a few parameters in one of the
> > methods.
>
> overall it sounds good to me.
>
> -- Allen
>
>
> >
> > -Elias
> >
> >>
> >> -- Allen
> >>
> >>
> >> >
> >> > If you notice, I'm introducing a new property (id) that will be used to
> >> > identify which node in the cluster has the lease in order that we can
> >> > allow for renewals if they desire so. Currently, we only have support
> >> > for task name and that's enough for what I'm trying to do. I'll explain
> >> > more in my next email and if I'm brief on this one is because I must
> >> > worry about dinner and putting the kids to bed.
> >> >
> >> > -Elias
> >>
>

Re: TaskLock implementation

Posted by Allen Gilliland <Al...@Sun.COM>.

Elias Torres wrote:
> On 11/22/06, Allen Gilliland <Al...@sun.com> wrote:
>>
>>
>> Elias Torres wrote:
>> > Folks,
>> >
>> > I have been doing some work on clustered search (more on future email)
>> > and I begun looking at the task lock recently implemented in 3.1. I
>> > believe (and have verified with Allen) that we might need to make some
>> > changes to both the API and implementation so it works properly on a
>> > cluster of size 2 or more. Currently I think we can have a couple of
>> > places where two nodes can obtain a lock at the same time.
>> >
>> > I'm working on some pseudo-code you all can review to make sure we are
>> > safe from these issues above. I also want to refactor the interfaces to
>> > reflect more the use of leases as opposed to locks.
>> >
>> > interface LeaseManager {
>> >
>> >   registerLease(String name, String id, long leaseTime);
>> >   renewLease(String name, String id);
>> >   unregisterLease(String name, String id);
>> >
>> > }
>>
>> I don't mind any of these changes if it's going to improve the
>> implementation, but now that I look at it I don't see how this is all
>> that different from what we have now ...
>>
>> acquireLock(RollerTask)
>> releaseLock(RollerTask)
>> isLocked(RollerTask)
> 
> What's the use case for isLocked? I'm not sure why would I need to
> test if something is locked. If I need it, I try to acquire it.

Ummm ... well for just doing the leasing it's probably not required.  I 
think the reason I wanted it was for better condition handling so that 
it's possible that we could want to do something special in the event 
that a task is trying to run but is currently locked.  If all you do is 
try to acquire the lock and fail you may not know that it failed because 
it was locked, it could have failed because you couldn't contact the db, 
or because the interval time hadn't elapsed yet.

We could get by without it though.


> 
>>
>> I can see how just calling things a lease is possibly a better way to
>> describe what's happening, but calling registerLease() is the exact same
>> thing as acquiring a lock in the current code.  When you acquire a lock
>> now you get the lock for a given amount of time, or until you manually
>> release it.
> 
> I didn't say that changing the name would mean different function, but
> we must be consistent in our naming. JINI doesn't call it a
> LockManager, it calls it a LeaseManager. If anything, I know you are
> always a proponent of naming things what they do/mean.

Fair enough.  I guess I just don't see that just because JINI calls it a 
lease that we have to call it the same thing.  But it doesn't really 
matter to me, calling it a lease is fine.


> 
>>
>> The current code doesn't have any way to renew a lease (or lock), so
>> that definitely sounds like a good addition if we plan to have tasks
>> which may run over their initial lease time and want to extend the
>> lease.  I didn't really envision that happening, but it sounds 
>> reasonable.
>>
> 
> k. cool beans.
> 
>> And of course, unregistering a lease is the same thing as releasing a
>> lock, right?  So all in all I don't think we are really changing much.
>>
>> But in any case, I have a few comments ...
>>
>> 1. I would prefer that these just remain in the ThreadManager where the
>> existing methods are now.  I don't think we need a whole new Manager for
>> this.
> 
> I'm fine with them staying in the ThreadManager.
> 
>>
>> 2. I think we would still need an isLeased() method which is basically
>> just a convenience for any task which is about to try and obtain a lease
>> to see if it is already leased.
> 
> See my comment above.
> 
>>
>> 3. I also think that we want to stick with having these methods take in
>> RollerTask objects rather than individualized parameters.  The
>> RollerTask class already forces subclasses to define a name and a lease
>> time, so those are already included, and it should be easy enough to
>> just add in an id attribute which would just need to be unique to each
>> node of the cluster somehow.
> 
> I'm sort of ok with using a POJO instead of parameters. Read query
> below for counter argument.
> 
> One thing I forgot was that we need to make RollerTask more abstract
> since your implementation assumes a Task wants to lock/unlock on every
> run. I believe we need a really abstract RollerTask and have two or
> three kinds of tasks: register/unregister lease in single run,
> register/renew lease on every run (master indexer scenario), no-op
> (e.g. independent task).

That's fine with me, but what's the no-op scenario?  I would think that 
all tasks should go through the leasing process in some form or another.


> 
>>
>> I think the big changes which you didn't really detail in this email is
>> that we would be shifting more of the logic around dealing with time
>> from app time to database time to make sure that all nodes are properly
>> synced.  Then we would also want to ensure the leasing process was rigid
>> enough to prevent any possible race conditions, which I believe at the
>> moment is not the case.  Accomplishing those two things definitely sound
>> like a good idea to me.
> 
> Let me take a quick stab at it.
> 
> 0. We need a constraint on task name so we can't have duplicate rows.

agreed.


> 
> registerLease()
> 
> 1. try to update the table if the lease is expired or we already have 
> this lease
> 
> update task_lease set name=name, id=id, duration=duration,
> starttime=CURRENT_TIMESTAMP where name = name and starttime + duration
> < CURRENT_TIMESTAMP or id = id
> 
> if it returns 1 row updated, we are done.
> else if return 0 rows updated, we need to insert a new row, if
> success, we obtained the lock, else we return false, didn't register
> lease.

looks good to me.


> 
> unregisterLease() is easy, just remove the row, if name and id match.

i'm not sure about deleting the row.  i think we need to keep the rows 
so that we can maintain the last run time for a task, even if it was 
unleased.  the last run time is essential to ensuring that all nodes are 
on a synchronized clock for the execution times of the tasks.

in fact, it may be ideal if there is some kind of initLease(RollerTask) 
method which can be used to initialize the row in the database in an 
unleased state.  this simplifies the logic a bit as well because then we 
only have to do the update statement to acquire the lock, not insert logic.


> 
> renewLease() is basically just the update query above. if returns 1
> row updated success, else failure.
> 
> Maybe the registerLease can be changed to first test for existence of
> row with name = name. Anyways, this is roughly what I'm thinking and
> if you notice, we don't make too much use of the POJO, so passing a
> POJO seems too complicated for just a few parameters in one of the
> methods.

overall it sounds good to me.

-- Allen


> 
> -Elias
> 
>>
>> -- Allen
>>
>>
>> >
>> > If you notice, I'm introducing a new property (id) that will be used to
>> > identify which node in the cluster has the lease in order that we can
>> > allow for renewals if they desire so. Currently, we only have support
>> > for task name and that's enough for what I'm trying to do. I'll explain
>> > more in my next email and if I'm brief on this one is because I must
>> > worry about dinner and putting the kids to bed.
>> >
>> > -Elias
>>

Re: TaskLock implementation

Posted by Elias Torres <el...@torrez.us>.
On 11/22/06, Allen Gilliland <Al...@sun.com> wrote:
>
>
> Elias Torres wrote:
> > Folks,
> >
> > I have been doing some work on clustered search (more on future email)
> > and I begun looking at the task lock recently implemented in 3.1. I
> > believe (and have verified with Allen) that we might need to make some
> > changes to both the API and implementation so it works properly on a
> > cluster of size 2 or more. Currently I think we can have a couple of
> > places where two nodes can obtain a lock at the same time.
> >
> > I'm working on some pseudo-code you all can review to make sure we are
> > safe from these issues above. I also want to refactor the interfaces to
> > reflect more the use of leases as opposed to locks.
> >
> > interface LeaseManager {
> >
> >   registerLease(String name, String id, long leaseTime);
> >   renewLease(String name, String id);
> >   unregisterLease(String name, String id);
> >
> > }
>
> I don't mind any of these changes if it's going to improve the
> implementation, but now that I look at it I don't see how this is all
> that different from what we have now ...
>
> acquireLock(RollerTask)
> releaseLock(RollerTask)
> isLocked(RollerTask)

What's the use case for isLocked? I'm not sure why would I need to
test if something is locked. If I need it, I try to acquire it.

>
> I can see how just calling things a lease is possibly a better way to
> describe what's happening, but calling registerLease() is the exact same
> thing as acquiring a lock in the current code.  When you acquire a lock
> now you get the lock for a given amount of time, or until you manually
> release it.

I didn't say that changing the name would mean different function, but
we must be consistent in our naming. JINI doesn't call it a
LockManager, it calls it a LeaseManager. If anything, I know you are
always a proponent of naming things what they do/mean.

>
> The current code doesn't have any way to renew a lease (or lock), so
> that definitely sounds like a good addition if we plan to have tasks
> which may run over their initial lease time and want to extend the
> lease.  I didn't really envision that happening, but it sounds reasonable.
>

k. cool beans.

> And of course, unregistering a lease is the same thing as releasing a
> lock, right?  So all in all I don't think we are really changing much.
>
> But in any case, I have a few comments ...
>
> 1. I would prefer that these just remain in the ThreadManager where the
> existing methods are now.  I don't think we need a whole new Manager for
> this.

I'm fine with them staying in the ThreadManager.

>
> 2. I think we would still need an isLeased() method which is basically
> just a convenience for any task which is about to try and obtain a lease
> to see if it is already leased.

See my comment above.

>
> 3. I also think that we want to stick with having these methods take in
> RollerTask objects rather than individualized parameters.  The
> RollerTask class already forces subclasses to define a name and a lease
> time, so those are already included, and it should be easy enough to
> just add in an id attribute which would just need to be unique to each
> node of the cluster somehow.

I'm sort of ok with using a POJO instead of parameters. Read query
below for counter argument.

One thing I forgot was that we need to make RollerTask more abstract
since your implementation assumes a Task wants to lock/unlock on every
run. I believe we need a really abstract RollerTask and have two or
three kinds of tasks: register/unregister lease in single run,
register/renew lease on every run (master indexer scenario), no-op
(e.g. independent task).

>
> I think the big changes which you didn't really detail in this email is
> that we would be shifting more of the logic around dealing with time
> from app time to database time to make sure that all nodes are properly
> synced.  Then we would also want to ensure the leasing process was rigid
> enough to prevent any possible race conditions, which I believe at the
> moment is not the case.  Accomplishing those two things definitely sound
> like a good idea to me.

Let me take a quick stab at it.

0. We need a constraint on task name so we can't have duplicate rows.

registerLease()

1. try to update the table if the lease is expired or we already have this lease

update task_lease set name=name, id=id, duration=duration,
starttime=CURRENT_TIMESTAMP where name = name and starttime + duration
< CURRENT_TIMESTAMP or id = id

if it returns 1 row updated, we are done.
else if return 0 rows updated, we need to insert a new row, if
success, we obtained the lock, else we return false, didn't register
lease.

unregisterLease() is easy, just remove the row, if name and id match.

renewLease() is basically just the update query above. if returns 1
row updated success, else failure.

Maybe the registerLease can be changed to first test for existence of
row with name = name. Anyways, this is roughly what I'm thinking and
if you notice, we don't make too much use of the POJO, so passing a
POJO seems too complicated for just a few parameters in one of the
methods.

-Elias

>
> -- Allen
>
>
> >
> > If you notice, I'm introducing a new property (id) that will be used to
> > identify which node in the cluster has the lease in order that we can
> > allow for renewals if they desire so. Currently, we only have support
> > for task name and that's enough for what I'm trying to do. I'll explain
> > more in my next email and if I'm brief on this one is because I must
> > worry about dinner and putting the kids to bed.
> >
> > -Elias
>

Re: TaskLock implementation

Posted by Allen Gilliland <Al...@Sun.COM>.

Elias Torres wrote:
> Folks,
> 
> I have been doing some work on clustered search (more on future email)
> and I begun looking at the task lock recently implemented in 3.1. I
> believe (and have verified with Allen) that we might need to make some
> changes to both the API and implementation so it works properly on a
> cluster of size 2 or more. Currently I think we can have a couple of
> places where two nodes can obtain a lock at the same time.
> 
> I'm working on some pseudo-code you all can review to make sure we are
> safe from these issues above. I also want to refactor the interfaces to
> reflect more the use of leases as opposed to locks.
> 
> interface LeaseManager {
> 
>   registerLease(String name, String id, long leaseTime);
>   renewLease(String name, String id);
>   unregisterLease(String name, String id);
> 
> }

I don't mind any of these changes if it's going to improve the 
implementation, but now that I look at it I don't see how this is all 
that different from what we have now ...

acquireLock(RollerTask)
releaseLock(RollerTask)
isLocked(RollerTask)

I can see how just calling things a lease is possibly a better way to 
describe what's happening, but calling registerLease() is the exact same 
thing as acquiring a lock in the current code.  When you acquire a lock 
now you get the lock for a given amount of time, or until you manually 
release it.

The current code doesn't have any way to renew a lease (or lock), so 
that definitely sounds like a good addition if we plan to have tasks 
which may run over their initial lease time and want to extend the 
lease.  I didn't really envision that happening, but it sounds reasonable.

And of course, unregistering a lease is the same thing as releasing a 
lock, right?  So all in all I don't think we are really changing much.

But in any case, I have a few comments ...

1. I would prefer that these just remain in the ThreadManager where the 
existing methods are now.  I don't think we need a whole new Manager for 
this.

2. I think we would still need an isLeased() method which is basically 
just a convenience for any task which is about to try and obtain a lease 
to see if it is already leased.

3. I also think that we want to stick with having these methods take in 
RollerTask objects rather than individualized parameters.  The 
RollerTask class already forces subclasses to define a name and a lease 
time, so those are already included, and it should be easy enough to 
just add in an id attribute which would just need to be unique to each 
node of the cluster somehow.

I think the big changes which you didn't really detail in this email is 
that we would be shifting more of the logic around dealing with time 
from app time to database time to make sure that all nodes are properly 
synced.  Then we would also want to ensure the leasing process was rigid 
enough to prevent any possible race conditions, which I believe at the 
moment is not the case.  Accomplishing those two things definitely sound 
like a good idea to me.

-- Allen


> 
> If you notice, I'm introducing a new property (id) that will be used to
> identify which node in the cluster has the lease in order that we can
> allow for renewals if they desire so. Currently, we only have support
> for task name and that's enough for what I'm trying to do. I'll explain
> more in my next email and if I'm brief on this one is because I must
> worry about dinner and putting the kids to bed.
> 
> -Elias