Posted to commits@river.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2011/06/19 03:20:48 UTC

[jira] [Commented] (RIVER-142) concurrency problem in DGC lease expiration handling

    [ https://issues.apache.org/jira/browse/RIVER-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051625#comment-13051625 ] 

Hudson commented on RIVER-142:
------------------------------

Integrated in River-trunk #493 (See [https://builds.apache.org/job/River-trunk/493/])
    River-142  Slightly different to the original patch, this commit fixes the delayed garbage collection synchronization issues by processing expired leases immediately, without locking the entire object table.  Lease is now responsible for its own expiry, notification and processing (on the garbage collection thread), and is synchronized internally.  Once a Lease in the object table expires it cannot be renewed and must be replaced; it is removed from the table after it is marked expired, to prevent garbage collection of potentially active leases.  Internal classes have been separated from ObjectTable and BasicExportTable to encapsulate or simplify synchronization and locking.  Target is now more faithful to Exporter.unexport's documented behaviour: when force is true, it interrupts dispatched method calls where possible.

I wasn't able to create a test that simulates the original failure condition; doing so requires a large number of leases to be processed (to open a time window in which leases are garbage collected after the table lock has been released) and precise timing of dirty calls, garbage collection and clean calls.  The new code processes each lease immediately and isn't subject to that time window.
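
For readers who want a concrete picture of the approach described above, here is a minimal Java sketch. It is an illustration only, not the committed River code: the class LeaseSketch, its Lease class, and the methods dirty, checkLeases, renew and expireIfDue are hypothetical names, and the referenced set stands in for a Target's bookkeeping. The point it tries to show is that each lease guards its own expiry state, so expiry and notification happen in one step under the lease's own lock, and an expired lease is replaced rather than renewed.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; names do not come from the River source.
class LeaseSketch {
    /** A lease that guards its own expiry state with its own lock. */
    static final class Lease {
        private long expiration;
        private boolean expired;

        Lease(long expiration) { this.expiration = expiration; }

        /** Extends the lease and records the reference; fails once expired. */
        synchronized boolean renew(long newExpiration, Runnable markReferenced) {
            if (expired) return false;          // caller must install a new Lease
            expiration = Math.max(expiration, newExpiration);
            markReferenced.run();               // runs under the lease's lock
            return true;
        }

        /** Marks the lease expired and notifies in one step, if it is due. */
        synchronized boolean expireIfDue(long now, Runnable notifyTarget) {
            if (expired || now < expiration) return false;
            expired = true;
            notifyTarget.run();                 // no window between check and notify
            return true;
        }
    }

    final ConcurrentMap<String, Lease> leases = new ConcurrentHashMap<>();
    // Stand-in for the referenced set kept by a Target (an assumption).
    final Set<String> referenced = ConcurrentHashMap.newKeySet();

    /** Dirty call: renew the client's lease, replacing it if it has expired. */
    void dirty(String clientId, long newExpiration) {
        while (true) {
            Lease lease = leases.computeIfAbsent(clientId, k -> new Lease(newExpiration));
            if (lease.renew(newExpiration, () -> referenced.add(clientId))) {
                return;
            }
            leases.remove(clientId, lease);     // expired: drop it and retry with a fresh one
        }
    }

    /** Lease checker: no table-wide lock, each lease is processed immediately. */
    void checkLeases(long now) {
        for (Map.Entry<String, Lease> e : leases.entrySet()) {
            String clientId = e.getKey();
            Lease lease = e.getValue();
            if (lease.expireIfDue(now, () -> referenced.remove(clientId))) {
                leases.remove(clientId, lease); // removed only after it is marked expired
            }
        }
    }
}
```

In this sketch a dirty call that races with expiry either renews the lease before it is marked expired, or sees the expired flag and installs a new lease after the notification has completed, so the notification cannot undo a later renewal.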


> concurrency problem in DGC lease expiration handling
> ----------------------------------------------------
>
>                 Key: RIVER-142
>                 URL: https://issues.apache.org/jira/browse/RIVER-142
>             Project: River
>          Issue Type: Bug
>          Components: net_jini_jeri
>    Affects Versions: jtsk_2.0
>            Reporter: Peter Jones
>            Assignee: Peter Firmstone
>            Priority: Minor
>         Attachments: River-142.patch
>
>
> Bugtraq ID [4848840|http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4848840]
> The thread in the server-side DGC implementation that checks for lease expirations ({{com.sun.jini.jeri.internal.runtime.ObjectTable.LeaseChecker.run}}) checks for them while synchronized on the overall lease table, but it delays notifying the expired leases' individual registered {{Targets}} about the expirations until after it has released the lease table lock.  This approach was taken from the JRMP implementation, which is that way because of the fix for 4118056 (a previous deadlock bug-- but now, I'm thinking that the JRMP implementation has this bug too).
> The problem seems to be that after releasing the lease table lock, it is possible for another lease renewal/request to come in (from the same DGC client and for the same remote object) that would then be invalidated by the subsequent {{Target}} notification made by the lease expiration check thread-- and thus the client's lease renewal (for that remote object) will be forgotten.  It would appear that the synchronization approach here needs to be reconsidered.
> h4. ( Comments note: )
> In addition to the basic problem of the expired-then-renewed client being removed from the referenced set, there is also the problem of the sequence table entry being forgotten-- which prevents detection of a "late clean call".
> Normally, late clean calls are not a problem because sequence numbers are retained while the client is in the referenced set (and there is no such thing as a "strong dirty").  But consider the following order of events on the server side:
> # dirty, seqNo=2
> # (lease expiration)
> # clean, seqNo=1
> The primary bug here is that the first two events will leave the client missing from the referenced set.  But the secondary bug is that even if that's fixed, with the sequence number forgotten, the third event (the "late clean call") will still cause the client to be removed from the referenced set.
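
The secondary bug described above can be illustrated with a small Java sketch. This is an assumption-laden illustration, not the River implementation: the class SequenceSketch and the methods dirty, clean, forgetSequence and isReferenced are hypothetical, and the comparison rule (a call whose sequence number is lower than the last one recorded is treated as late) is a simplification chosen for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and rules are simplified for illustration.
class SequenceSketch {
    private final Set<String> referenced = new HashSet<>();
    private final Map<String, Long> lastSeqNo = new HashMap<>();

    /** Dirty call: records the client and the highest sequence number seen. */
    synchronized void dirty(String clientId, long seqNo) {
        Long last = lastSeqNo.get(clientId);
        if (last != null && seqNo < last) return;   // late dirty call, ignored
        lastSeqNo.put(clientId, seqNo);
        referenced.add(clientId);
    }

    /** Clean call: ignored when it is older than the last recorded call. */
    synchronized void clean(String clientId, long seqNo) {
        Long last = lastSeqNo.get(clientId);
        if (last != null && seqNo < last) return;   // late clean call detected
        referenced.remove(clientId);
        lastSeqNo.remove(clientId);
    }

    /** Simulates the buggy side effect of expiry: the sequence entry is lost. */
    synchronized void forgetSequence(String clientId) {
        lastSeqNo.remove(clientId);
    }

    synchronized boolean isReferenced(String clientId) {
        return referenced.contains(clientId);
    }
}
```

With the sequence number retained, dirty(seqNo=2) followed by clean(seqNo=1) leaves the client in the referenced set. If forgetSequence is called between the two, the clean call can no longer be recognised as late and the client is removed, which is the third event in the list above.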

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] [Commented] (RIVER-142) concurrency problem in DGC lease expiration handling

Posted by Peter Firmstone <ji...@zeus.net.au>.
Sorry, dumb question, they're already running...

Peter Firmstone wrote:
> How can I get Hudson to run the qa tests?
>
> They all pass on Solaris SPARC; I'd like to see Linux x86 and Windows 
> test results if possible.
>
> I'd like to mark this issue resolved if possible.
>
> Some jtreg tests are failing on my machine due to expired keys, but 
> this is unrelated; the same tests failed prior to the changes.
>
> Cheers,
>
> Peter.


Re: [jira] [Commented] (RIVER-142) concurrency problem in DGC lease expiration handling

Posted by Peter Firmstone <ji...@zeus.net.au>.
How can I get Hudson to run the qa tests?

They all pass on Solaris SPARC; I'd like to see Linux x86 and Windows 
test results if possible.

I'd like to mark this issue resolved if possible.

Some jtreg tests are failing on my machine due to expired keys, but this 
is unrelated; the same tests failed prior to the changes.

Cheers,

Peter.
