Posted to dev@river.apache.org by Peter Firmstone <ji...@zeus.net.au> on 2014/01/25 05:25:42 UTC

River-344 - replacing TaskManager in SDM

  Notes:



  ServiceDiscoveryManager


    NotifyEventTask


If the task list contains any RegisterListenerTasks

or LookupTasks associated with this task's lookup service

(ProxyReg), and if those tasks were queued prior to this

task (have lower sequence numbers), then run those tasks

before this task (return true).


Additionally, if the task list contains any other

ServiceIdTasks associated with this task's service ID

which were queued prior to this task, then run those

tasks before this task.


If the criteria outlined above are not satisfied, then this

task can be run immediately (return false).



    LookupTask


If the task list contains any RegisterListenerTasks,

other LookupTasks, or NotifyEventTasks associated with

this task's lookup service (ProxyReg), and if those tasks

were queued prior to this task (have lower sequence

numbers), then run those tasks before this task (return

true). Otherwise this task can be run immediately

(return false).



    ServiceIdTask


If there is at least one task in the given task list that is

associated with the same serviceID as this task, and that task

has a sequence number less than the sequence number of this task,

then run this task *after* the task in the list (return true);

otherwise run this task now (return false).
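
The ServiceIdTask rule above can be sketched in Java roughly as follows. This is only an illustration: the Task interface here is a hypothetical stand-in for the TaskManager task contract, and the String service ID and the seqN field name are assumptions made for the example, not the SDM's actual types.

```java
import java.util.ArrayList;
import java.util.List;

class RunAfterSketch {

    /** Hypothetical stand-in for the task contract described in the notes. */
    interface Task {
        /** Returns true if this task must wait for an earlier task among the first size entries. */
        boolean runAfter(List<Task> tasks, int size);
    }

    /** Illustrative task keyed by a service ID and ordered by a sequence number. */
    static class ServiceIdTask implements Task {
        final String serviceId; // assumption: a String replaces the real ServiceID type
        final long seqN;        // lower value means queued earlier

        ServiceIdTask(String serviceId, long seqN) {
            this.serviceId = serviceId;
            this.seqN = seqN;
        }

        @Override
        public boolean runAfter(List<Task> tasks, int size) {
            for (int i = 0; i < size; i++) {
                Task t = tasks.get(i);
                if (!(t instanceof ServiceIdTask)) {
                    continue;
                }
                ServiceIdTask other = (ServiceIdTask) t;
                // Same service ID and queued earlier: run after it.
                if (other.serviceId.equals(serviceId) && other.seqN < seqN) {
                    return true;
                }
            }
            return false; // nothing earlier for this service ID, run now
        }
    }

    public static void main(String[] args) {
        List<Task> queue = new ArrayList<>();
        ServiceIdTask first = new ServiceIdTask("s0", 1);
        ServiceIdTask second = new ServiceIdTask("s0", 2);
        ServiceIdTask unrelated = new ServiceIdTask("s1", 3);
        queue.add(first);
        queue.add(second);
        queue.add(unrelated);
        System.out.println("first waits:     " + first.runAfter(queue, queue.size()));
        System.out.println("second waits:    " + second.runAfter(queue, queue.size()));
        System.out.println("unrelated waits: " + unrelated.runAfter(queue, queue.size()));
    }
}
```

The NotifyEventTask and LookupTask rules have the same shape; they additionally compare the lookup service (ProxyReg) of the queued tasks.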




    ServiceDiscoveryManager, CacheTask classes

 957:  private final class RegisterListenerTask extends CacheTask {

1117:  private final class ProxyRegDropTask extends CacheTask {

1005:  private final class LookupTask extends CacheTask implements Task {

 647:  private static abstract class ServiceIdTask extends CacheTask implements Task {

1149:  private final class DiscardServiceTask extends CacheTask {

1169:  private final class NotifyEventTask extends ServiceIdTask {

1416:  private final class NewOldServiceTask extends ServiceIdTask {

1498:  private final class UnmapProxyTask extends ServiceIdTask {





  RegisterListenerTask extends CacheTask

/** This task class, when executed, first registers to receive

* ServiceEvents from the given ServiceRegistrar. If the registration

* process succeeds (no RemoteExceptions), it then executes the

* LookupTask to query the given ServiceRegistrar for a "snapshot"

* of its current state with respect to services that match the

* given template.

*

* Note that the order of execution of the two tasks is important.

* That is, the LookupTask must be executed only after registration

* for events has completed. This is because when an entity registers

* with the event mechanism of a ServiceRegistrar, the entity will

* only receive notification of events that occur "in the future",

* after the registration is made. The entity will not receive events

* about changes to the state of the ServiceRegistrar that may have

* occurred before or during the registration process.

*

* Thus, if the order of these tasks were reversed and the LookupTask

* were to be executed prior to the RegisterListenerTask, then the

* possibility exists for the occurrence of a change in the

* ServiceRegistrar's state between the time the LookupTask retrieves

* a snapshot of that state, and the time the event registration

* process has completed, resulting in an incorrect view of the

* current state of the ServiceRegistrar.

*/
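
To make the ordering argument concrete, here is a small single-threaded sketch. FakeRegistrar is a hypothetical in-memory stand-in (not the real ServiceRegistrar API), and the "change in between" is simulated with a Runnable. It only shows why a change that lands between the two steps is lost when the snapshot is taken first.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

class RegisterThenLookupSketch {

    /** Hypothetical in-memory registrar; listeners only see registrations made after they are added. */
    static class FakeRegistrar {
        private final Set<String> services = new LinkedHashSet<>();
        private final List<Consumer<String>> listeners = new ArrayList<>();

        void register(String serviceId) {
            services.add(serviceId);
            for (Consumer<String> listener : listeners) {
                listener.accept(serviceId);
            }
        }

        void addListener(Consumer<String> listener) {
            listeners.add(listener);
        }

        Set<String> lookup() {
            return new LinkedHashSet<>(services); // "snapshot" of current state
        }
    }

    /** Order used by RegisterListenerTask: register for events, then take the snapshot. */
    static Set<String> registerThenLookup(FakeRegistrar r, Runnable changeInBetween) {
        Set<String> view = new LinkedHashSet<>();
        r.addListener(view::add);
        changeInBetween.run();   // the change is delivered as an event
        view.addAll(r.lookup());
        return view;
    }

    /** Reversed order: the change happens after the snapshot but before registration. */
    static Set<String> lookupThenRegister(FakeRegistrar r, Runnable changeInBetween) {
        Set<String> view = new LinkedHashSet<>(r.lookup());
        changeInBetween.run();   // nobody is listening yet, so this change is missed
        r.addListener(view::add);
        return view;
    }

    public static void main(String[] args) {
        FakeRegistrar r1 = new FakeRegistrar();
        r1.register("s0");
        System.out.println("register first: " + registerThenLookup(r1, () -> r1.register("s1")));

        FakeRegistrar r2 = new FakeRegistrar();
        r2.register("s0");
        System.out.println("lookup first:   " + lookupThenRegister(r2, () -> r2.register("s1")));
    }
}
```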



  ProxyRegDropTask extends CacheTask

/** When the given registrar is discarded, this Task class is used to

* remove the registrar from the various maps maintained by this

* cache.

*/

/* For each itemReg in the serviceIdMap, disassociate the

* lookup service referenced here from the itemReg; and

* if the itemReg then has no more lookup services associated

* with it, remove the itemReg from the map and send a

* service removed event.

*/


  LookupTask extends CacheTask

/** This class requests a "snapshot" of the given registrar's state.*/



  ServiceIdTask extends CacheTask

/** Abstract base class for controlling the order-of-execution of tasks

* corresponding to a particular serviceID associated with a particular

* lookup service.

*/


  DiscardServiceTask extends CacheTask

/** Task class used to asynchronously notify service discard. */


  NotifyEventTask extends ServiceIdTask

/** Task class used to asynchronously notify all registered service

* discovery listeners of serviceAdded/serviceRemoved/serviceChanged

* events.

*/


  NewOldServiceTask extends ServiceIdTask

/** Task class used to asynchronously process the service state

* ("snapshot"), matching this cache's template, that was retrieved

* from the given lookup service.

*

* After retrieving the snapshot S, the LookupTask queues an instance

* of this task for each service referenced in S. This task determines

* if the given service is an already-discovered service (is currently

* in this cache's serviceIdMap), or is a new service. This task

* handles the service differently, depending on whether the service

* is new or old.

*

* a. if the item is old, then this task will:

*    - compare the given item from the snapshot to the UN-filtered

*      item in given itemReg

*        if(same version but attributes have changed)

*          send changed event

*        else if( version has changed )

*          send removed event followed by added event

*        else

*          do nothing

*    - apply the filter to the given item

*        if(filter fails)

*          send removed event

*        else if(filter passes)

*          set the filtered item in the itemReg in the map

*        else if (filter is indefinite)

*          discard item

*          send removed event

*          queue another filter attempt for later

* b. if the given item is newly discovered, then this task will:

*    - create a new ServiceItemReg containing the given item

*    - place the new itemReg in the serviceIdMap

*    - apply the filter to the given item

*        if(filter fails)

*          remove the item from the map but

*          send NO removed event

*        else if(filter passes)

*          send added event for the FILTERED item

*        else if (filter is indefinite)

*          discard item

*          queue another filter attempt for later but

*          send NO removed event

*/
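
A rough Java rendering of the two decision tables in the comment above, for reference while refactoring. Item, FilterResult and the string labels are all hypothetical simplifications; the real task works on ServiceItem/ServiceItemReg and sends real events.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NewOldDecisionSketch {

    /** Hypothetical simplified item: a version tag plus a flattened attribute string. */
    static final class Item {
        final String version;
        final String attributes;

        Item(String version, String attributes) {
            this.version = version;
            this.attributes = attributes;
        }
    }

    enum FilterResult { PASS, FAIL, INDEFINITE }

    /** Case a, first step: events produced by comparing the snapshot item with the cached UN-filtered item. */
    static List<String> compareOld(Item cached, Item fromSnapshot) {
        List<String> events = new ArrayList<>();
        boolean sameVersion = cached.version.equals(fromSnapshot.version);
        if (sameVersion && !cached.attributes.equals(fromSnapshot.attributes)) {
            events.add("changed");
        } else if (!sameVersion) {
            events.add("removed");
            events.add("added");
        }
        // same version, same attributes: do nothing
        return events;
    }

    /** Case b: actions taken for a newly discovered item once the filter has been applied. */
    static List<String> onNewItem(FilterResult result) {
        switch (result) {
            case PASS:
                return Arrays.asList("send added for filtered item");
            case FAIL:
                return Arrays.asList("remove from map"); // no removed event
            default:
                return Arrays.asList("discard item", "queue filter retry"); // no removed event
        }
    }

    public static void main(String[] args) {
        Item cached = new Item("v1", "name=printer");
        System.out.println(compareOld(cached, new Item("v1", "name=printer-2")));
        System.out.println(compareOld(cached, new Item("v2", "name=printer")));
        System.out.println(onNewItem(FilterResult.INDEFINITE));
    }
}
```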


  UnmapProxyTask extends ServiceIdTask

/** Task class used to asynchronously disassociate the given lookup

* service proxy from the given ServiceItemReg. This task is created

* and queued in both the LookupTask, and the ProxyRegDropTask.

*

* When the LookupTask determines that the service referenced by the

* given ServiceItemReg is an "orphan", the LookupTask queues an

* instance of this task. A service is an orphan if it is referenced

* in the serviceIdMap, but is no longer registered in any of the

* lookup service(s) to which it is mapped in the serviceIdMap.

* Note that the existence of orphans is possible when events from

* a particular lookup service are missed; that is, there is a "gap"

* in the event sequence numbers.

*

* When a previously discovered lookup service is discarded, the

* ProxyRegDropTask is initiated, and that task creates and queues

* an instance of this task for each mapping in this cache's

* serviceIdMap.

*

* This task removes the given lookup service proxy from the set

* associated with the service item referenced in the given

* ServiceItemReg, and determines whether that service is still

* associated with at least one lookup service. If the service is

* no longer associated with any other lookup service in the managed

* set of lookup services, the mapping that references the given

* ServiceItemReg is removed from the serviceIdMap, and a

* serviceRemoved event is sent.

*

* In this way, other tasks from this cache operating on the same

* service will not concurrently modify any state related to that

* service.

*/
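
The core of the unmap step, as I read the comment above, is small enough to sketch. The map layout (service ID to set of lookup service names) and the method names are assumptions for illustration; the boolean return value stands in for "send a serviceRemoved event".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class UnmapProxySketch {

    // Assumed layout: service ID -> names of the lookup services it is mapped to.
    private final Map<String, Set<String>> serviceIdMap = new HashMap<>();

    synchronized void map(String serviceId, String lus) {
        serviceIdMap.computeIfAbsent(serviceId, k -> new HashSet<>()).add(lus);
    }

    /**
     * Disassociates lus from serviceId. Returns true when the service is no longer
     * associated with any lookup service, i.e. the mapping was dropped and the
     * caller would send serviceRemoved.
     */
    synchronized boolean unmap(String serviceId, String lus) {
        Set<String> proxies = serviceIdMap.get(serviceId);
        if (proxies == null) {
            return false; // nothing known about this service
        }
        proxies.remove(lus);
        if (proxies.isEmpty()) {
            serviceIdMap.remove(serviceId);
            return true;
        }
        return false;
    }

    synchronized boolean contains(String serviceId) {
        return serviceIdMap.containsKey(serviceId);
    }

    public static void main(String[] args) {
        UnmapProxySketch cache = new UnmapProxySketch();
        cache.map("s0", "L0");
        cache.map("s0", "L1");
        System.out.println("removed after L0: " + cache.unmap("s0", "L0"));
        System.out.println("removed after L1: " + cache.unmap("s0", "L1"));
    }
}
```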


  Comments from qa test suite


/**

* This test attempts to simulate the following race condition that

* can occur between an instance of LookupTask and an instance of

* ProxyRegDropTask:

*

* - 1 LUS {L0}

* - 1 service s0 registered in L0

* - 1 cache C0 with template matching s0

*

* Upon creation of the cache, LookupTask is initiated for L0. The test

* waits a few seconds after the cache is created because if L0 is discarded

* too quickly, L0 will be removed from the proxyRegSet before LookupTask

* has a chance to begin any processing. After the wait period is up,

* L0 is discarded; which initiates the ProxyRegDropTask. The race occurs

* as follows:

* LookupTask                     ProxyRegDropTask

* ----------------------------   ----------------------------------------

* o determine s0 is "new"

* o sleep for n seconds          o remove L0 from proxyRegSet

*                                o serviceIdMap is empty, do nothing else

* o add new s0 to serviceIdMap

*

* The result is that serviceIdMap should NOT be empty when ProxyRegDropTask

* encounters it. But if LookupTask is too slow in adding s0 to the map,

* ProxyRegDropTask will have nothing to remove, and so will return without

* modifying serviceIdMap. But when LookupTask returns, s0 will be contained

* in serviceIdMap; even though it shouldn't.

*

* In order to insert the time delay, the SDM must be modified. Also, in

* order to observe the race, println's must be inserted in the SDM to

* display whether the serviceIdMap is empty/non-empty when it is supposed

* to be empty/non-empty. That is, the pass/fail status of this test cannot

* be determined by the test itself; it must be observed by the test

* engineer. Thus, this test will always return a pass status.

*

* This test is not part of the regular suite. It must be run manually, with

* a temporarily-modified SDM.

*

* Related bug ids: 4675746

* 4707125

*/



/**

* This test attempts to simulate the following race condition that

* can occur between different instances of LookupTask:

*

* - 2 LUS's {L0,L1}

* - 1 service s0 registered in L0 and L1

* - 1 cache C0 with template matching s0

*

* Upon creation of the cache, 2 LookupTasks are queued; one for L0 and

* one for L1. Both tasks modify the serviceIdMap. Suppose the first task

* determines that the service is "new", but before it can update the

* serviceIdMap, the context switches to the other task. Because the first

* has not yet updated the serviceIdMap, the second task should then also

* view the service as new. Thus, because both tasks view the service

* as new, each task sends a serviceAdded event; when only one is supposed

* to be sent.

*

* The race occurs as follows:

*

* o L0 created

* o L1 created

* o s0 registered in L0 and L1

* o C0 created with listener and template for s0

* o new C0 initiates a RegisterListener

* o RegisterListenerTask registers for events from {L0,L1}

* o RegisterListenerTask initiates a LookupTask for {L0,L1}

*

* LookupTask-L0                   LookupTask-L1

* -----------------------------   -----------------------------

* o task0 determine s0 is "new"

* o sleep for n seconds

*                                 o task1 determine s0 is "new"

*                                 o add new s0 to serviceIdMap

*                                 o send serviceAdded event

* o sleep expires

* o add new s0 to serviceIdMap

* o send serviceAdded event

*

* Although this test can be run against an unmodified SDM, the situation

* described above does not occur consistently unless the SDM is modified

* to insert a time delay in the appropriate place.

*

* This class verifies that bug 4675746 has been fixed. As stated in the

* bug description: "The operations performed in addService() appear NOT

* to be performed atomically; which may result in a race condition."

*

* This bug was reported by a user who claimed that his application was

* receiving unexpected duplicate serviceAdded() events for a single service

* when the service registers with multiple lookup services; indicating a

* possible race condition in the addService() mechanism.

*

* This test attempts to simulate the user's described environment to

* duplicate the bug prior to a fix being implemented, and to verify that

* the bug has indeed been fixed after the intended fix has been implemented

* in the ServiceDiscoveryManager.

*

* Related bug ids: 4675746

* 4707125

*/
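
For the duplicate serviceAdded case, one way to make the "is it new?" check and the map update a single atomic step is ConcurrentHashMap.putIfAbsent. The sketch below is an illustration of that idea only, not the fix that went into the SDM; the class and field names are made up, and it ignores concurrent removal.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class AtomicAddSketch {

    // Assumed layout: service ID -> names of the lookup services it was discovered in.
    private final ConcurrentHashMap<String, Set<String>> serviceIdMap = new ConcurrentHashMap<>();
    final AtomicInteger serviceAddedCount = new AtomicInteger();

    /** Called once per (service, lookup service) discovery, possibly from several threads. */
    void addService(String serviceId, String lus) {
        Set<String> fresh = ConcurrentHashMap.newKeySet();
        // putIfAbsent decides "new or old" and inserts in one atomic step.
        Set<String> existing = serviceIdMap.putIfAbsent(serviceId, fresh);
        Set<String> proxies = (existing == null) ? fresh : existing;
        proxies.add(lus);
        if (existing == null) {
            serviceAddedCount.incrementAndGet(); // only the thread that won the insert notifies
        }
    }

    /** Two "LookupTasks" (L0 and L1) discover s0 concurrently; returns the number of added events. */
    static int runRace() {
        AtomicAddSketch cache = new AtomicAddSketch();
        Thread t0 = new Thread(() -> cache.addService("s0", "L0"));
        Thread t1 = new Thread(() -> cache.addService("s0", "L1"));
        t0.start();
        t1.start();
        try {
            t0.join();
            t1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cache.serviceAddedCount.get();
    }

    public static void main(String[] args) {
        System.out.println("serviceAdded events: " + runRace());
    }
}
```

Whichever thread runs first, exactly one of them sees a null return from putIfAbsent, so only one added event is counted.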


/**

* This test attempts to simulate the following race condition that

* can occur between an instance of UnmapProxyTask (created and queued

* in LookupTask) and instances of NewOldServiceTask that are created

* and queued by NotifyEventTask:

*

* - 1 LUS {L0}

* - N (~250) services {s0, s1, ..., sN-1}, to be registered in L0

* - M (~24) SDMs, each with 1 cache with template matching all si's

* {SDM_0/C0, SDM_1/C1, ... SDM_M-1/CM-1}

*

* Through the sheer number of service registrations, caches, and events,

* this test attempts to produce the conditions that cause the regular

* occurrence of the race between an instance of UnmapProxyTask and

* instances of NewOldServiceTask produced by NotifyEventTask when a

* service event is received from L0.

*

* This test starts lookup L0 during construct. Then, when the test begins

* running, half the services are registered with L0, followed by the

* creation of half the SDMs and corresponding caches; which causes the

* tasks being tested to be queued, and event generation to ultimately

* begin. After registering the first half of the services and creating

* the first half of the SDMs, the remaining services are registered and

* the remaining SDMs and caches are created. As events are generated,

* the number of serviceAdded and serviceRemoved events are tallied.

*

* When an SDM_i/cache_i pair is created, an instance of RegisterListenerTask

* is queued and executed. RegisterListenerTask registers a remote event

* listener with L0's event mechanism. When the services are registered with

* L0, that listener receives service events; which causes NotifyEventTask

* to be queued and executed. After RegisterListenerTask registers for events

* with L0, but before RegisterListenerTask exits, an instance of LookupTask

* is queued and executed. LookupTask retrieves from L0 a "snapshot" of its

* state. Thus, while events begin to arrive informing each cache of the

* services that are registering with L0, LookupTask is querying L0 for

* its current state.

*

* Upon receipt of a service event, NotifyEventTask queues a NewOldServiceTask

* to determine if the service corresponding to the event represents a new

* service that has been added, a change to a previously-registered service,

* or the removal of a service from L0. If the event corresponds to a newly

* registered service, the service is added to the cache's serviceIdMap and

* a serviceAdded event is sent to any listeners registered with the cache.

* That is,

*

* Service event received

*

* NotifyEventTask {

*   if (service removed) {

*     remove service from serviceIdMap

*     send serviceRemoved

*   } else {

*     NewOldServiceTask

*       if (service changed) {

*         send serviceChanged

*       } else if (service is new) {

*         add service to serviceIdMap

*         send serviceAdded

*       }

*   }

* }

*

* While events are being received and processed by NotifyEventTask and

* NewOldServiceTask, LookupTask is asynchronously requesting a snapshot

* of L0's state and attempting to process that snapshot to populate

* the same serviceIdMap that is being populated by instances of

* NewOldServiceTask that are initiated by NotifyEventTask. LookupTask

* first examines serviceIdMap, looking for services that are NOT in the

* snapshot; that is, services that are not currently registered with L0.

* Such a service is referred to as an "orphan". For each orphan service

* that LookupTask finds, an instance of UnmapProxyTask is queued. That task

* removes the service from the serviceIdMap and sends a serviceRemoved

* event to any listeners registered with the cache. After processing

* any orphans that it finds, LookupTask then queues an instance of

* NewOldServiceTask for each service in the snapshot previously retrieved.

* That is,

*

* LookupTask - retrieve snapshot {

*

*   for each service in serviceIdMap {

*     if (service is not in snapshot) { //orphan

*       UnmapProxyTask {

*         remove service from serviceIdMap

*         send serviceRemoved

*       }

*     }

*   }

*   for each service in snapshot {

*     NewOldServiceTask

*       if (service changed) {

*         send serviceChanged

*       } else if (service is new) {

*         add service to serviceIdMap

*         send serviceAdded

*       }

*   }

* }

*

* The race can occur because the NewOldServiceTasks that are queued by the

* NotifyEventTasks can add services to the serviceIdMap between the time

* LookupTask retrieves the snapshot and the time it analyzes the serviceIdMap

* for orphans. That is,

*

* o SDM_i/cache_i created

*     RegisterListenerTask

*     --------------------

*     register for events

*     LookupTask

*     ----------

*     retrieve snapshot {s0,s1,s2}

*                                      o s3 registered with L0

*                                      o L0 sends NOMATCH_MATCH

*                                          NotifyEventTask

*                                          ---------------

*                                          NewOldServiceTask

*                                          -----------------

*                                          add s3 to serviceIdMap

*                                          send serviceAdded event

*     ORPHAN: s3 in serviceIdMap, not snapshot

*     UnmapProxyTask

*     --------------

*     remove s3 from serviceIdMap

*     send serviceRemoved event

*

* This test returns a pass when no race is detected between UnmapProxyTask

* and any NewOldServiceTask initiated by a NotifyEventTask. This is

* determined by examining the serviceAdded and serviceRemoved event

* tallies collected during test execution. If, for each SDM/cache

* combination, the number of serviceAdded events received equals the

* number of services registered with L0, and no serviceRemoved events

* are received, then there is no race, and the test passes; otherwise,

* the test fails (in particular, if at least one serviceRemoved event

* is sent by at least one SDM/cache).

*

* No special modifications to the SDM are required to cause the race

* condition to occur consistently. When running this test individually

* on Solaris, out of "the vob", under a JERI or JRMP configuration, and

* with 24 SDMs/caches and 250 services, the race condition was consistently

* observed (until a fix was integrated). Thus, it appears that the greater

* the number of SDMs/caches/services, the greater the probability the

* conditions for the race will be encountered.

*

* Related bug ids: 6291851

*/
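
The interleaving described in the comment above can be replayed deterministically in a few lines, which I find easier to reason about than the timeline. Everything here is a hypothetical model (plain sets and string labels, no threads, no real SDM classes); it just executes the steps in the racy order and records what a cache listener would see.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class OrphanRaceSketch {

    /** Replays the racy interleaving and returns the events delivered to the cache's listeners. */
    static List<String> replay() {
        Set<String> registrar = new LinkedHashSet<>(Arrays.asList("s0", "s1", "s2"));
        Set<String> serviceIdMap = new LinkedHashSet<>();
        List<String> events = new ArrayList<>();

        // LookupTask: retrieve snapshot {s0,s1,s2}
        Set<String> snapshot = new LinkedHashSet<>(registrar);

        // s3 registers with L0; NotifyEventTask -> NewOldServiceTask adds it to the map
        registrar.add("s3");
        serviceIdMap.add("s3");
        events.add("serviceAdded:s3");

        // LookupTask resumes: orphan scan against the now stale snapshot
        for (String id : new ArrayList<>(serviceIdMap)) {
            if (!snapshot.contains(id)) {
                serviceIdMap.remove(id);           // UnmapProxyTask
                events.add("serviceRemoved:" + id);
            }
        }

        // LookupTask: NewOldServiceTask for each service in the snapshot
        for (String id : snapshot) {
            if (serviceIdMap.add(id)) {
                events.add("serviceAdded:" + id);
            }
        }
        return events;
    }

    /** Pass criterion from the test description: all services added, no removals. */
    static boolean passes(List<String> events, int registeredCount) {
        long added = events.stream().filter(e -> e.startsWith("serviceAdded:")).count();
        long removed = events.stream().filter(e -> e.startsWith("serviceRemoved:")).count();
        return added == registeredCount && removed == 0;
    }

    public static void main(String[] args) {
        List<String> events = replay();
        System.out.println(events);
        System.out.println("test would pass: " + passes(events, 4));
    }
}
```

In this replay s3 is still registered with L0 but ends up removed from the map, which is exactly the spurious serviceRemoved the test tallies.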



/**

* This test attempts to simulate the following race condition that

* can occur between an instance of NotifyEventTask and an instance of

* ProxyRegDropTask:

*

* - 1 LUS {L0}

* - 1 service {s0}, to be registered in L0

* - 1 cache C0 with template matching s0

*

* This test attempts to simulate the race that appears to be possible

* between the NotifyEventTask and the ProxyRegDropTask. This test

* starts lookup L0 and creates cache C0. It then registers s0 with L0

* to generate a NOMATCH_MATCH event and ultimately initiate an instance

* of NotifyEventTask. Suppose that before NotifyEventTask can modify the

* serviceIdMap, L0 is discarded so that the ProxyRegDropTask will be

* initiated. Without the proposed fix implemented in the SDM, it's then

* possible that L0 will be discarded before NotifyEventTask inserts the

* mapping { [s0,L0] } in serviceIdMap, which may result (if the timing is

* right) in NotifyEventTask placing { [s0,L0] } in serviceIdMap after L0

* has been discarded; which means the contents of serviceIdMap will be

* inconsistent, and the serviceRemoved event that should occur because

* of the discard, will never actually occur.

*

* The race occurs as follows:

*

* o L0 created

* o C0 created

* o s0 registered with L0

* o L0 sends NOMATCH_MATCH

*

* NotifyEventTask                 ProxyRegDropTask

* -----------------------------   ----------------------------------------

* o task0 determine s0 is "new"

* o sleep for n seconds

*                                 o L0 is discarded

*                                 o remove L0 from proxyRegSet

*                                 o serviceIdMap is empty, do nothing else

*                                 o thinking map should be empty, return

* o add new s0 to serviceIdMap

* o map NOT empty now but should be

*

* The result is that serviceIdMap should be empty and L0 should not be

* in proxyRegSet. But if NotifyEventTask is too slow in processing the

* new s0, ProxyRegDropTask will have nothing to process and so the

* serviceIdMap will not be empty, and the serviceRemoved event that should

* have been sent because the [s0,L0] pair was removed from the serviceIdMap

* is never sent.

*

* Although this test can be run against an unmodified SDM, the situation

* described above does not occur consistently unless the SDM is modified

* to insert a time delay in the appropriate place.

*

* Related bug ids: 4675746

* 4707125

*/
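
One possible guard against the stale [s0,L0] mapping is to have the inserting task re-check, under the same lock the drop uses, that the lookup service is still in the managed set. The sketch below shows that idea under my own assumptions (class, method and field names are invented, and a single monitor protects both structures); it is not necessarily the fix referred to in the comment.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

class DropGuardSketch {

    private final Set<String> proxyRegSet = new HashSet<>();
    private final Map<String, Set<String>> serviceIdMap = new HashMap<>();

    synchronized void discovered(String lus) {
        proxyRegSet.add(lus);
    }

    /** ProxyRegDropTask analogue: returns how many services lost their last lookup service. */
    synchronized int discard(String lus) {
        proxyRegSet.remove(lus);
        int removed = 0;
        Iterator<Map.Entry<String, Set<String>>> it = serviceIdMap.entrySet().iterator();
        while (it.hasNext()) {
            Set<String> proxies = it.next().getValue();
            if (proxies.remove(lus) && proxies.isEmpty()) {
                it.remove();
                removed++; // a serviceRemoved event would be sent here
            }
        }
        return removed;
    }

    /** NotifyEventTask analogue: refuses to insert a mapping for a discarded lookup service. */
    synchronized boolean addService(String serviceId, String lus) {
        if (!proxyRegSet.contains(lus)) {
            return false; // lus was discarded while this task was delayed
        }
        serviceIdMap.computeIfAbsent(serviceId, k -> new HashSet<>()).add(lus);
        return true;
    }

    synchronized boolean isEmpty() {
        return serviceIdMap.isEmpty();
    }

    public static void main(String[] args) {
        DropGuardSketch cache = new DropGuardSketch();
        cache.discovered("L0");
        // Racy order from the table: the discard runs before the delayed insert.
        System.out.println("removed by discard: " + cache.discard("L0"));
        System.out.println("late insert accepted: " + cache.addService("s0", "L0"));
        System.out.println("map empty: " + cache.isEmpty());
    }
}
```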