Posted to dev@tuscany.apache.org by Douglas Leite <do...@gmail.com> on 2009/08/15 15:57:15 UTC

[GSoC] More Details about the Guardian Model Implementation

I will try to give an explanation of how the model is working. I will
explain based on the examples I have developed to test the model.

*# Overview*

First of all, what kind of application is the guardian model applicable to?
The model addresses the problem of exceptions raised concurrently in
concurrent applications, where two or more participants execute at the same
time and exchange messages in a cooperative action.

The test scenario is the primary-backup with N backups. In this scenario we
have a server-client application, with N participants on the server side.
The first participant to join on the server side becomes the primary server,
and the subsequent ones are the backups. The primary gets a request from a
client, and sends a reply to the client and a copy of its state to the
backups. When the primary fails, the first backup in the queue becomes the
new primary. On the other hand, when a backup fails, the primary simply stops
sending updates to it.
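
To make the role assignment concrete, here is a minimal Java sketch of the
scenario described above. It is not the NodeImpl code: ServerGroupSketch and
its methods are names made up for this illustration, and the class only
tracks who is primary and who is a backup.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of the guardian model code: it only tracks roles.
class ServerGroupSketch {
    private String primary;                                   // current primary, or null
    private final Deque<String> backups = new ArrayDeque<>(); // backups in join order

    // The first participant to join becomes the primary; later ones queue up as backups.
    void join(String name) {
        if (primary == null) {
            primary = name;
        } else {
            backups.addLast(name);
        }
    }

    // A failed primary is replaced by the first backup in the queue (null if none is left).
    // A failed backup is simply dropped, so it receives no further updates.
    void fail(String name) {
        if (name.equals(primary)) {
            primary = backups.pollFirst();
        } else {
            backups.remove(name);
        }
    }

    String primary() { return primary; }

    int backupCount() { return backups.size(); }
}
```

Joining three participants and then failing the first one leaves the second
one as primary, with a single backup remaining.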

*# SCDL file for primary-backup with N backups scenario*

Since all participants need to know each other, we define:

<composite>

    <component name="Participant1">
        <implementation.java
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
        <reference name="nodes" target="Participant2 Participant3
Participant4"/>
    </component>

    <component name="Participant2">
        <implementation.java
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
        <reference name="nodes" target="Participant1 Participant3
Participant4"/>
    </component>

    <component name="Participant3">
        <implementation.java
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
        <reference name="nodes" target="Participant1 Participant2
Participant4"/>
    </component>

    <component name="Participant4">
        <implementation.java
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
        <reference name="nodes" target="Participant1 Participant2
Participant3"/>
    </component>

...

</composite>

Each participant is an instance of the NodeImpl class ([1]), which contains
three main methods: *execute*, *sendUpdate*, and *applyUpdate*. The first
one is used to start the participant's execution thread. This method is
annotated with @OneWay, which marks the invocation as asynchronous. The
second method is used by the primary server to send updates to the backups.
Finally, *applyUpdate* is used by the backups to apply the updates received
from the primary server.

All the communication related to the exceptional behavior between the
participants is done by the guardian, which was implemented as a component.
So, we need to define the guardian in the SCDL file:

<composite>

    <component name="Participant1">...</component>

    <component name="Participant2">...</component>

    <component name="Participant3">...</component>

    <component name="Participant4">...</component>

    <component name="GuardianGroup">
        <implementation.java
class="org.apache.tuscany.sca.guardian.GuardianGroupImpl"/>
        <property
name="recovery_rules">src/main/resources/recoveryrules_nbackpus_concurrent.xml</property>
        <property
name="resolution_tree">src/main/resources/resolutionTree.xml</property>
    </component>

</composite>

The guardian is an instance of the
org.apache.tuscany.sca.guardian.GuardianGroupImpl class, and provides the
org.apache.tuscany.sca.guardian.GuardianPrimitives as the main interface for
communication.

The GuardianPrimitives contains the following methods:

   1.     public void enableContext(Context context);
   2.     public void removeContext();
   3.     public void gthrow(GlobalExceptionInterface ex, List<String>
   participantList);
   4.     public boolean propagate(GlobalExceptionInterface ex);
   5.     public void checkExceptionStatus() throws GlobalException;

Methods 1 and 2 are designed to add and remove a context, respectively.
Method 3 is used every time a participant wants to signal an external
exception, in other words, an exception that needs to be treated
cooperatively by a set of participants.
Method 4 is used to check whether a specific exception needs to be propagated
to another context.
Method 5 is used to check whether there are exceptions to be treated.

These methods are the channel the participants use to communicate with each
other, when they need to treat an exception cooperatively.

However, the participants do not communicate with the guardian directly.
Instead, they communicate with a guardian member, which is a mediator
between the participants and the guardian. Each participant is associated
with a guardian member. So the communication is established like this:
participant -> guardian member -> guardian, and guardian -> guardian member
-> participant.

The guardian member was implemented as a component too:

<composite>

...

    <component name="GuardianMember1">
        <implementation.java
class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
        <reference name="guardian_group" target="GuardianGroup"/>
    </component>

    <component name="GuardianMember2">
        <implementation.java
class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
        <reference name="guardian_group" target="GuardianGroup"/>
    </component>

    <component name="GuardianMember3">
        <implementation.java
class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
        <reference name="guardian_group" target="GuardianGroup"/>
    </component>

    <component name="GuardianMember4">
        <implementation.java
class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
        <reference name="guardian_group" target="GuardianGroup"/>
    </component>

    <component name="GuardianGroup">...</component>

</composite>

The org.apache.tuscany.sca.guardian.GuardianMemberImpl class defines the
guardian member. Each guardian member has a reference to the guardian group,
and each participant has a reference to its respective guardian member.

The full SCDL file can be found at [2].

The GuardianMemberImpl implements the GuardianPrimitives, so the
participants communicate with each other using the methods present in that
interface through their respective guardian members.
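
The participant -> guardian member -> guardian -> guardian member ->
participant chain described above can be sketched in plain Java as follows.
This is only an illustration under simplifying assumptions: GroupSketch,
MemberSketch, and GlobalExceptionSketch are stand-in names (not the real
GuardianGroupImpl, GuardianMemberImpl, or GlobalException), no recovery rules
are applied, and everything runs in a single thread.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Stand-in for the model's global exception type (hypothetical name).
class GlobalExceptionSketch extends Exception {
    GlobalExceptionSketch(String message) { super(message); }
}

// Stand-in for the guardian group: it knows every member and routes exceptions to them.
class GroupSketch {
    private final Map<String, MemberSketch> members = new HashMap<>();

    void register(MemberSketch member) { members.put(member.participant, member); }

    // guardian -> guardian member: queue the exception at each listed participant's member.
    void gthrow(GlobalExceptionSketch ex, List<String> participantList) {
        for (String name : participantList) {
            MemberSketch target = members.get(name);
            if (target != null) {
                target.deliver(ex);
            }
        }
    }
}

// Stand-in for a guardian member: the only object a participant talks to.
class MemberSketch {
    final String participant;
    private final GroupSketch group;
    private final Queue<GlobalExceptionSketch> pending = new ArrayDeque<>();

    MemberSketch(String participant, GroupSketch group) {
        this.participant = participant;
        this.group = group;
        group.register(this);
    }

    // participant -> guardian member -> guardian
    void gthrow(GlobalExceptionSketch ex, List<String> participantList) {
        group.gthrow(ex, participantList);
    }

    void deliver(GlobalExceptionSketch ex) { pending.add(ex); }

    // guardian member -> participant: raise the oldest pending exception, if any.
    void checkExceptionStatus() throws GlobalExceptionSketch {
        GlobalExceptionSketch ex = pending.poll();
        if (ex != null) {
            throw ex;
        }
    }
}
```

In this sketch, a gthrow issued through one member only becomes visible to
another participant when that participant calls checkExceptionStatus() on its
own member.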

*# Using the model*

So far, we have talked about three concepts of the guardian model: the
guardian group, the guardian members, and the guardian primitives. Another
important concept is the context. A context defines a place in the
participant where external exceptions are signaled and treated. A context has
two important attributes: a name, and the list of exceptions that can be
treated in that context. The class org.apache.tuscany.sca.guardian.Context
represents a context.
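
As a rough picture of those two attributes, a context could be modeled as
below. ContextSketch is a hypothetical stand-in, not the real
org.apache.tuscany.sca.guardian.Context class, and treating subtypes of a
listed exception as handled is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a context: a name plus the exceptions it can treat.
class ContextSketch {
    private final String name;
    private final List<Class<? extends Exception>> handled = new ArrayList<>();

    ContextSketch(String name) { this.name = name; }

    // Registers an exception type that can be treated in this context.
    ContextSketch handles(Class<? extends Exception> type) {
        handled.add(type);
        return this;
    }

    String getName() { return name; }

    // True when the exception is an instance of one of the registered types
    // (assumption: subtypes count as handled too).
    boolean canHandle(Exception ex) {
        for (Class<? extends Exception> type : handled) {
            if (type.isInstance(ex)) {
                return true;
            }
        }
        return false;
    }
}
```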

The primary-backup scenario has three contexts: MAIN, PRIMARY, and BACKUP,
where PRIMARY and BACKUP are nested inside the MAIN context. A context is
activated using the *enableContext* method of the guardian member, and the
*removeContext* method has the opposite effect. Once a context is activated,
it stays active until *removeContext* is invoked or a nested context is
activated.

The general structure of the NodeImpl class is shown below:

   1.     @OneWay
   2.     public void execute() {
   3.         gm.enableContext(mainContext);
   4.         while (true) {
   5.             try {
   6.                 gm.checkExceptionStatus();
   7.                 if (role == PRIMARY) {
   8.                     //Config as primary then...
   9.                     primaryService();
   10.                 } else {
   11.                     //Config as backup then...
   12.                     backupService();
   13.                 }
   14.             } catch (PrimaryExistsException ex) {...}
   15.                catch (PrimaryFailedException ex) {...}
   16.                catch (BackupFailedException ex) {...}
   17.         }
   18.     }
   19.     private void primaryService() {
   20.         while (true) {
   21.             gm.enableContext(primaryContext);
   22.             try {
   23.                 gm.checkExceptionStatus();
   24.                 //Process the request then...
   25.                 ...
   26.                 if (backupAvailable) {
   27.                         //send updates to the backups
   28.                         ...
   29.                 }
   30.                 //send the reply to the client
   31.                 ...
   32.             } catch (PrimaryServiceFailureException ex) {...}
   33.                catch (BackupFailedException ex) {...}
   34.                catch (BackupJoinedException ex) {...}
   35.                finally {
   36.                 gm.removeContext();
   37.             }
   38.         }
   39.     }
   40.     private void backupService() {
   41.         while (true) {
   42.             gm.enableContext(backupContext);
   43.             try {
   44.                 gm.checkExceptionStatus();
   45.                 applyUpdate();
   46.             } catch (ApplyUpdateFailureException ex) {...}
   47.                finally {
   48.                 gm.removeContext();
   49.             }
   50.         }
   51.     }

As can be seen, the MAIN context is activated in rows 1-18, the PRIMARY
context in rows 19-39, and the BACKUP context in rows 40-51. Each context is
associated with a method, and since *primaryService()* and
*backupService()* are invoked inside *execute()*, PRIMARY and BACKUP are
nested contexts of the MAIN context. When the first participant joins the
guardian group, its context list is defined as MAIN.PRIMARY. For the
subsequent participants, the context list is defined as MAIN.BACKUP.
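
The dotted context list (MAIN.PRIMARY, MAIN.BACKUP) can be pictured as a
stack of context names. The sketch below is an assumption about how such
bookkeeping might look, not the GuardianMemberImpl code; ContextStackSketch
is a made-up name, and contexts are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the context bookkeeping a guardian member might do.
class ContextStackSketch {
    private final List<String> active = new ArrayList<>(); // outermost context first

    // Corresponds to enableContext: entering a nested context pushes its name.
    void enableContext(String name) { active.add(name); }

    // Corresponds to removeContext: leaving the innermost context pops its name.
    void removeContext() {
        if (!active.isEmpty()) {
            active.remove(active.size() - 1);
        }
    }

    // Dotted context list, for example MAIN.PRIMARY.
    String contextList() { return String.join(".", active); }
}
```

Enabling MAIN and then PRIMARY yields the list MAIN.PRIMARY; a removeContext
call brings the participant back to MAIN.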

The core of this general structure is:

// scope
{
    // Activate a context
    gm.enableContext(SomeContext);

    try {
        // Check for unhandled exceptions
        gm.checkExceptionStatus();

        // Application-specific code
        ...

    } catch (SomeExternalException ex) {
        // Handle the exceptions that can be treated in this context
    } finally {
        gm.removeContext();
    }
}

After the activation of a context, it is necessary to check for unhandled
exceptions with the *checkExceptionStatus()* guardian member method. This
method checks for external exceptions that were raised by other participants
but that influence the behavior of this participant. If there is an
exception to be handled, then *checkExceptionStatus()* raises it; otherwise
the method simply returns.

Every time a participant wants to signal an external exception, it uses the
*gthrow()* method of its respective guardian member. The messages
exchanged between the participants, guardian members, and guardian group
when gthrow is invoked are depicted in the sequence diagram [3]. (See the
"Progress on the GSoC project: Supporting Concurrent Exception Handling at
Tuscany SCA" conversation thread for more details).

*# Recovery Rules XML File*

When a participant invokes *gthrow()* to signal an external exception to a
set of participants, the guardian group consults the recovery rules, defined
by the user, to find out which exception should be raised in each participant
present in the list, as well as the proper target context (in other words,
the place where the exception will be raised and treated).

A piece of the recovery rules XML file for the discussed scenario is (see
the full file at [4]):

<recovery_rules>

    <!-- A new participant joins in the group -->
    <rule name="Rule1"
signaled_exception="org.apache.tuscany.sca.guardian.JoinException">

        <participant match="*.PRIMARY">
            <throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupJoinedException"
target_context="PRIMARY"/>
        </participant>

        <participant match="SIGNALER">
            <throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryExistsException"
target_context="MAIN" min_participant_joined="2"/>
        </participant>
    </rule>
    ...
</recovery_rules>

When a participant joins the guardian group, the guardian raises a
JoinException indicating that a new participant has joined. The recovery
rule "Rule1" is applied when such an exception is found. The guardian then
adds a BackupJoinedException, with target context PRIMARY, to all active
participants that are in the "*.PRIMARY" context (MAIN.PRIMARY matches this
pattern), and a PrimaryExistsException, with target context MAIN, to the
participant that has raised the external exception (in other words, the
SIGNALER), provided that at least two participants have already joined the
guardian group.
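
The effect of "Rule1" can be sketched in Java as below. This is not the
guardian's rule engine: Rule1Sketch is a made-up class, the rule is hard-coded
instead of being read from the XML file, the result strings are just labels
of the form exception@target_context, and the matching semantics ("*.X"
matches a context list whose innermost context is X) are my reading of the
description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, hard-coded rendering of "Rule1" for illustration only.
class Rule1Sketch {
    // Assumed semantics: "*.X" matches any context list whose innermost context is X.
    static boolean matches(String pattern, String contextList) {
        if (pattern.startsWith("*.")) {
            return contextList.endsWith("." + pattern.substring(2));
        }
        return contextList.equals(pattern);
    }

    // contexts maps each joined participant (signaler included) to its context list.
    // Returns participant -> "exception@target_context" for every delivery.
    static Map<String, String> apply(Map<String, String> contexts, String signaler) {
        Map<String, String> deliveries = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : contexts.entrySet()) {
            if (matches("*.PRIMARY", entry.getValue())) {
                deliveries.put(entry.getKey(), "BackupJoinedException@PRIMARY");
            }
        }
        // min_participant_joined="2": the signaler is only notified from the second join on.
        if (contexts.size() >= 2) {
            deliveries.put(signaler, "PrimaryExistsException@MAIN");
        }
        return deliveries;
    }
}
```

With a single participant in MAIN nothing is delivered; once a second
participant joins while the first is in MAIN.PRIMARY, the first one receives
the BackupJoinedException and the newcomer the PrimaryExistsException.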

"Rule2" is applied when a participant raises a PrimaryFailedException. Such
an exception means that an internal error has occurred in the participant
that has the PRIMARY context active.

    <rule name="Rule2"
signaled_exception="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException">

        <participant match="*.PRIMARY">
            <throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
target_context="INIT_CONTEXT"/>
        </participant>

        <participant match="*.BACKUP">
            <throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
target_context="MAIN">
                <affected_participants>FIRST</affected_participants>
            </throw_exception>
        </participant>
    </rule>

The guardian adds a PrimaryFailedException, with target context
INIT_CONTEXT, to the participant that is in the PRIMARY context.
INIT_CONTEXT is the outermost context, and it comes before the other
contexts defined by the user. In this application, INIT_CONTEXT is the
place where NodeImpl.execute() is invoked, so raising an exception in this
context means that the participant has failed.

For the first backup in the list of backups, a PrimaryFailedException is
added with target context MAIN.
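
The <affected_participants>FIRST</affected_participants> restriction can be
pictured as a filter over the participants that matched the pattern. The
helper below is a hypothetical sketch (AffectedParticipantsSketch is not a
class of the model), and it assumes the matches are already ordered as in the
backup list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter illustrating the FIRST restriction of a recovery rule.
class AffectedParticipantsSketch {
    // Without a restriction every matching participant is affected;
    // with "FIRST" only the earliest one in list order is kept.
    static List<String> select(List<String> matchesInListOrder, String affected) {
        if ("FIRST".equals(affected) && !matchesInListOrder.isEmpty()) {
            List<String> first = new ArrayList<>();
            first.add(matchesInListOrder.get(0));
            return first;
        }
        return new ArrayList<>(matchesInListOrder);
    }
}
```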

"Rule3" works like "Rule2", but it is applied to a
BackupFailedException:

    <!-- The Backup fails -->
    <rule name="Rule3"
signaled_exception="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException">

        <participant match="*.PRIMARY">
            <throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"
target_context="PRIMARY"/>
        </participant>

        <participant match="SIGNALER">
            <throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"
target_context="INIT_CONTEXT"/>
        </participant>
    </rule>

*# Putting the pieces together...*

In summary, the application works like this:

   1. A participant 'A' joins the guardian group with the MAIN context
   active; a JoinException is signaled by the guardian; no exceptions are
   delivered to the participant; and the participant reaches the PRIMARY
   context.
   2. A new participant 'B' joins the guardian group with the MAIN
   context active; a JoinException is signaled by the guardian; the guardian
   executes the recovery rule "Rule1"; a BackupJoinedException, with target
   context PRIMARY, is delivered to the participant A; a
   PrimaryExistsException, with target context MAIN, is delivered to the
   participant B.
   3. When the participant A invokes *checkExceptionStatus()*, the
   BackupJoinedException is raised in it, and it starts to send updates to the
   backup.
   4. When the participant B invokes *checkExceptionStatus()*, the
   PrimaryExistsException is raised in it, and it becomes a backup.

After that, the primary sends updates to the backups, and the backups apply
the updates received from the primary.

If an internal error occurs in the primary, we have:

   1. The participant 'A' fails, so a PrimaryFailedException is signaled to
   the guardian.
   2. The guardian executes the recovery rule "Rule2".
   3. The guardian adds a PrimaryFailedException, with target context
   INIT_CONTEXT, to the participant 'A'.
   4. The guardian adds a PrimaryFailedException, with target context
   MAIN, to the first backup in the backup list (in this case, the participant
   'B').
   5. When the participant 'A' invokes *checkExceptionStatus()*, the
   PrimaryFailedException is raised in it and propagated up to the init
   context, which causes this participant to stop.
   6. When the participant 'B' invokes *checkExceptionStatus()*, the
   PrimaryFailedException is raised in it, and the participant becomes the
   primary.

If an internal error occurs in the backup, we have:

   1. The participant 'B' fails, so a BackupFailedException is signaled to
   the guardian.
   2. The guardian executes the recovery rule "Rule3".
   3. The guardian adds a BackupFailedException, with target context
   PRIMARY, to the participant 'A'.
   4. The guardian adds a BackupFailedException, with target context
   INIT_CONTEXT, to participant 'B'.
   5. When the participant 'A' invokes *checkExceptionStatus()*, the
   BackupFailedException is raised in it, and it removes the participant 'B'
   from its backup list.
   6. When the participant 'B' invokes *checkExceptionStatus()*, the
   BackupFailedException is raised in it and propagated up to the init
   context, which causes this participant to stop.

*# Concurrent Exceptions and the Resolution Tree*

Because gthrow executes asynchronously, concurrent exceptions can occur.

When concurrent exceptions occur, the guardian searches a resolution
tree for the lowest common ancestor of the concurrent exceptions, and
then applies the recovery rules for this resolved exception. If there isn't a
lowest common ancestor, then the guardian applies the recovery rules for each
exception sequentially.

The resolution tree for the discussed scenario is:

<resolution_trees>
    <resolution_tree exception_level="1">
        <exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryBackupFailedTogetherException">
            <exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"/>
            <exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"/>
        </exception>
    </resolution_tree>
</resolution_trees>

In this way, when a primary and a backup fail together, the
PrimaryFailedException and BackupFailedException will be concurrent, and the
resolved exception will be the PrimaryBackupFailedTogetherException.
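
The lowest-common-ancestor lookup can be sketched as follows. This is a
minimal illustration, not the guardian's resolver: ResolutionTreeSketch is a
made-up class, exceptions are identified by simple names, the tree is built
by hand instead of being parsed from the XML file, and the list passed to
resolve() is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resolution tree keyed by exception name.
class ResolutionTreeSketch {
    private final Map<String, String> parentOf = new HashMap<>(); // child -> parent

    void addChild(String parent, String child) { parentOf.put(child, parent); }

    // Path from an exception up to its root, starting with the exception itself.
    private List<String> pathToRoot(String exception) {
        List<String> path = new ArrayList<>();
        for (String node = exception; node != null; node = parentOf.get(node)) {
            path.add(node);
        }
        return path;
    }

    // Lowest common ancestor of the given exceptions, or null when they share none
    // (in which case the rules would be applied to each exception sequentially).
    String resolve(List<String> concurrent) {
        List<String> candidates = pathToRoot(concurrent.get(0));
        for (String other : concurrent.subList(1, concurrent.size())) {
            candidates.retainAll(pathToRoot(other));
        }
        return candidates.isEmpty() ? null : candidates.get(0);
    }
}
```

Because each path is ordered from the exception towards the root, the first
name that survives the intersection is the lowest common ancestor.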

The recovery rule "Rule4" applies when this kind of exception is signaled:

    <!-- The Primary and Backup fail together -->
    <rule  name="Rule4"
signaled_exception="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryBackupFailedTogetherException">

        <participant match="*.PRIMARY">
            <throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
target_context="INIT_CONTEXT"/>
        </participant>

         <!-- Backup signaler -->
        <participant match="*.BACKUP,SIGNALER">
            <throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"
target_context="INIT_CONTEXT"/>
        </participant>

        <!-- Excluding the backup signaler -->
        <participant match="*.BACKUP,!SIGNALER">
            <throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
target_context="MAIN">
                <affected_participants>FIRST</affected_participants>
            </throw_exception>
        </participant>
    </rule>

The guardian adds a PrimaryFailedException, with target context
INIT_CONTEXT, to the participant that is in the PRIMARY context. Similarly,
the guardian adds a BackupFailedException, with target context INIT_CONTEXT,
to the participant that is in the BACKUP context and has signaled the
external exception BackupFailedException. A PrimaryFailedException, with
target context MAIN, is added to the first backup in the backup list that
has not signaled any exception.

This action ends the execution of the participants that have failed, and
chooses a backup to become the new primary server.
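
The compound match expressions of "Rule4" ("*.BACKUP,SIGNALER" and
"*.BACKUP,!SIGNALER") can be read as a conjunction of terms. The sketch below
encodes that reading; it is an assumption about the semantics, not the
guardian's parser, and CompoundMatchSketch is a made-up name.

```java
// Hypothetical evaluator for the match attribute of a recovery rule.
class CompoundMatchSketch {
    // Every comma-separated term must hold for the participant:
    //   SIGNALER   the participant signaled one of the exceptions
    //   !SIGNALER  the participant did not signal
    //   *.X        the innermost context of the participant is X
    static boolean matches(String match, String contextList, boolean isSignaler) {
        for (String raw : match.split(",")) {
            String term = raw.trim();
            boolean ok;
            if (term.equals("SIGNALER")) {
                ok = isSignaler;
            } else if (term.equals("!SIGNALER")) {
                ok = !isSignaler;
            } else if (term.startsWith("*.")) {
                ok = contextList.endsWith("." + term.substring(2));
            } else {
                ok = contextList.equals(term);
            }
            if (!ok) {
                return false;
            }
        }
        return true;
    }
}
```

Under this reading, a backup that signaled the failure matches
"*.BACKUP,SIGNALER" only, while the remaining backups match
"*.BACKUP,!SIGNALER" and are therefore candidates to become the new primary.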

*# Ideas to improve the model implementation*

Although the implementation is working, I think some modifications could
bring the model closer to Tuscany SCA.

   1. As was suggested previously, I think it could be a good idea to use
   the recovery rules and the resolution tree as policies, instead of
   properties in the guardian component.
   2. Instead of using org.apache.tuscany.sca.guardian.GuardianImpl as
   the class of an implementation.java component, it might be better to define
   a new implementation type, like implementation.guardian, that has
   org.apache.tuscany.sca.guardian.Guardian as its service interface and
   allows recovery-rules and resolution-tree as policies.
   3. Since we know every participant needs to have a guardian member
   associated with it, the guardian members could be created automatically by
   the runtime. In this way, the user only needs to define a component of type
   implementation.guardian and a reference to it, and in the background
   the runtime creates one guardian member for each participant and does the
   proper bindings between the components.

That's all for now. Let me know what you think. If you need some more
explanation, ask me. :)

*# Links*

[1]
http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/src/test/java/org/apache/tuscany/sca/guardian/itests/primaryBackup/common/NodeImpl.java
[2]
http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/src/main/resources/primaryNbackups-concurrent.composite
[3]
http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/sequenceDiagram-externalException.jpg
[4]
http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/src/main/resources/recoveryrules_nbackpus_concurrent.xml

-- 
Douglas Siqueira Leite
Graduate student at University of Campinas (Unicamp), Brazil

Re: [GSoC] More Details about the Guardian Model Implementation

Posted by Douglas Leite <do...@gmail.com>.
Hi,

As suggested, I have implemented the guardian members as policies. With this
change, the SCDL file looks like this:

<composite>

    <component name="Participant1">
        <implementation.java
class="org.apache.tuscany.sca.implementation.guardian.itests.primaryBackup.common.NodeImpl"/>
        <reference name="guardian" target="GuardianComponent"
requires="tuscany:guardianExceptionHandling"/>
        <reference name="nodes" target="Participant2"/>
    </component>

    <component name="Participant2">
        <implementation.java
class="org.apache.tuscany.sca.implementation.guardian.itests.primaryBackup.common.NodeImpl"/>
        <reference name="guardian" target="GuardianComponent"
requires="tuscany:guardianExceptionHandling"/>
        <reference name="nodes" target="Participant1"/>
    </component>

    <component name="GuardianComponent">
        <tuscany:implementation.guardian>

            <tuscany:guardianProperties

recovery_rules="src/main/resources/org/apache/tuscany/sca/implementation/guardian/itests/primaryBackup/simple/recoveryRules.xml"

resolution_trees="src/main/resources/org/apache/tuscany/sca/implementation/guardian/itests/primaryBackup/resolutionTrees.xml"/>

        </tuscany:implementation.guardian>

    </component>

</composite>

As can be seen, the GuardianMember components were replaced by the
"tuscany:guardianExceptionHandling" policy. Each participant has a
reference to an impl.guardian component, and that reference must be
associated with the "tuscany:guardianExceptionHandling" policy. The policy
framework then creates a guardian member for each participant, and
intercepts all the messages between the participants and the guardian group
component.

What I would like to know is whether there is a way to ensure that every
reference to an impl.guardian component is associated with the
"tuscany:guardianExceptionHandling" policy.

Thanks,

On Mon, Sep 28, 2009 at 3:36 PM, Douglas Leite <do...@gmail.com> wrote:

> Hi,
>
> I have implemented a new implementation type called
> "implementation.guardian".
>
> Using the impl.guardian module, the SCDL file would be like that:
>
> <composite>
>
>     <component name="Participant1">
>         <implementation.java
> class="org.apache.tuscany.sca.implementation.guardian.itests.primaryBackup.common.NodeImpl"/>
>
>         <reference name="guardian_member" target="GuardianMember1"/>
>         <reference name="nodes" target="Participant2"/>
>     </component>
>
>     <component name="Participant2">
>         <implementation.java
> class="org.apache.tuscany.sca.implementation.guardian.itests.primaryBackup.common.NodeImpl"/>
>
>         <reference name="guardian_member" target="GuardianMember2"/>
>         <reference name="nodes" target="Participant1"/>
>     </component>
>
>     <component name="GuardianMember1">
>         <implementation.java
> class="org.apache.tuscany.sca.implementation.guardian.impl.GuardianMemberImpl"/>
>
>         <reference name="guardian_group" target="GuardianComponent"/>
>     </component>
>
>     <component name="GuardianMember2">
>         <implementation.java
> class="org.apache.tuscany.sca.implementation.guardian.impl.GuardianMemberImpl"/>
>
>         <reference name="guardian_group" target="GuardianComponent"/>
>     </component>
>
>     <component name="GuardianComponent">
>
>         <tuscany:implementation.guardian>
>
>             <tuscany:guardianProperties
>
> recovery_rules="src/main/resources/org/apache/tuscany/sca/implementation/guardian/itests/primaryBackup/simple/recoveryRules.xml"
>
> resolution_trees="src/main/resources/org/apache/tuscany/sca/implementation/guardian/itests/primaryBackup/resolutionTrees.xml"/>
>
>
>         </tuscany:implementation.guardian>
>
>     </component>
>
> </composite>
>
> So, we need to define a component with the implementation type
> "implementation.guardian" and the guardianProperties (that includes the
> recoveryRules and the resolutionTrees)
>
> We still have to define one guardian member for each participant involved
> in the composite. I am working in a way to enhance this part.
>
> Thoughts?
>
> PS:
>
> Initially, some people suggested using the policy framework to structure
> the guardian model. In this way, the recovery rules and
> the resolution trees would be defined in the definition.xml file, and the
> proper action would be invoked by the interceptors. However, I think that
> having an implementation type for the guardian group is a better way to
> implement the model. This is because the guardian group represents a global
> distributed entity that is able to communicate with a set of guardian
> members. So, the guardian needs to know the state of all guardian members.
> Considering that we could have a set of participants distributed over
> different machines that communicate with each other using a protocol like
> SOAP, how could I instantiate the same interceptor to coordinate
> all the participants? I don't know if it is possible (is it?), but using a
> new implementation type to create a global component solved the
> problem.
>
> Maybe I could use the policy framework to implement the guardian members.
> But I am not sure about that yet, so I need to think about it more.
>
>
>
> On Sat, Aug 15, 2009 at 10:57 AM, Douglas Leite <do...@gmail.com>wrote:
>
>> I will try to give an explanation of how the model is working. I will
>> explain based on the examples I have developed to test the model.
>>
>> *# Overview*
>>
>> First of all, what kind of application is the guardian model applicable
>> to? The model is applicable to solve the problem of concurrent exceptions
>> occurrence in concurrent applications. So, we have two or more participants
>> executing at the same time, and exchanging messages in a cooperative action.
>>
>> The test scenario is the primary-backup with N backups. In this scenario
>> we have a server-client application, with N participants on the server side.
>> The first participant to join in the server side becomes the primary server,
>> and the subsequent ones are the backups. The primary gets a request from a
>> client, and sends a reply to the client and a copy of its state to the
>> backups. When the primary fails, the first backup on the queue becomes the
>> new server. On the other hand, when a backup fails, the primary simply stops
>> to send updates to it.
>>
>> *# SCDL file for primary-backup with N backups scenario*
>>
>> Since all participants need to know each other, we define:
>>
>> <composite>
>>
>>     <component name="Participant1">
>>         <implementation.java
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
>>         <reference name="nodes" target="Participant2 Participant3
>> Participant4"/>
>>     </component>
>>
>>     <component name="Participant2">
>>         <implementation.java
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
>>         <reference name="nodes" target="Participant1 Participant3
>> Participant4"/>
>>     </component>
>>
>>     <component name="Participant3">
>>         <implementation.java
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
>>         <reference name="nodes" target="Participant1 Participant2
>> Participant4"/>
>>     </component>
>>
>>     <component name="Participant4">
>>         <implementation.java
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
>>         <reference name="nodes" target="Participant1 Participant2
>> Participant3"/>
>>     </component>
>>
>> ...
>>
>> </composite>
>>
>> Each participant is an instance of the NodeImpl class ([1]) that contains
>> three main methods: *execute*, *sendUpdate*, and *applyUpdate*. The first
>> one is used to start the participant's execution thread. This method is
>> annotated with @OneWay annotation, which marks the execution to be
>> asynchronous. The second  method, is used by the server to send updates to
>> the backups. Finally, the *applyUpdate *is used by the backups to apply
>> the updates received from the server.
>>
>> All the communication referent to the exceptional behavior between the
>> participants is done by the guardian, which was implemented as a component.
>> So, we need to define the guardian in the SCDL file:
>>
>> <composite>
>>
>>     <component name="Participant1">...</component>
>>
>>     <component name="Participant2">...</component>
>>
>>     <component name="Participant3">...</component>
>>
>>     <component name="Participant4">...</component>
>>
>>     <component name="GuardianGroup">
>>         <implementation.java
>> class="org.apache.tuscany.sca.guardian.GuardianGroupImpl"/>
>>         <property
>> name="recovery_rules">src/main/resources/recoveryrules_nbackpus_concurrent.xml</property>
>>         <property
>> name="resolution_tree">src/main/resources/resolutionTree.xml</property>
>>     </component>
>>
>> <composite>
>>
>> The guardian is an instance of the
>> org.apache.tuscany.sca.guardian.GuardianGroupImpl class, and provides the
>> org.apache.tuscany.sca.guardian.GuardianPrimitives as the main interface for
>> communication.
>>
>> The GuardianPrimitives contains the following methods:
>>
>>    1.     public void enableContext(Context context);
>>    2.     public void removeContext();
>>    3.     public void gthrow(GlobalExceptionInterface ex, List<String>
>>    participantList);
>>    4.     public boolean propagate(GlobalExceptionInterface ex);
>>    5.     public void checkExceptionStatus() throws GlobalException;
>>
>> Methods 1 and 2 add and remove a context, respectively.
>> Method 3 is used every time a participant wants to signal an external
>> exception, in other words, an exception that needs to be handled
>> cooperatively by a set of participants.
>> Method 4 checks whether a specific exception needs to be propagated to
>> another context.
>> Method 5 checks whether there are pending exceptions to be handled.
>>
>> These methods are the channel the participants use to communicate with
>> each other when they need to handle an exception cooperatively.
>>
>> However, the participants do not communicate with the guardian directly.
>> Instead, they communicate with a guardian member, which is a mediator
>> between the participants and the guardian. Each participant is associated
>> with a guardian member. So the communication is established like this:
>> participant -> guardian member -> guardian, and guardian -> guardian member
>> -> participant.
>>
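
A minimal Java sketch of the mediation chain described above (participant ->
guardian member -> guardian, and back). Group, Member, and the String-based
exception names are hypothetical stand-ins, not the Tuscany classes, and the
recovery-rule step is left out: the group here simply forwards the signaled
exception to the listed participants.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MediationSketch {
    // Stand-in for the guardian group: routes a signaled exception to the listed members.
    static class Group {
        private final Map<String, Member> members = new HashMap<>();

        void register(String participant, Member member) {
            members.put(participant, member);
        }

        void gthrow(String exceptionName, List<String> participantList) {
            for (String participant : participantList) {
                Member member = members.get(participant);
                if (member != null) {
                    member.inbox.add(exceptionName);
                }
            }
        }
    }

    // Stand-in for a guardian member: the only object a participant talks to.
    static class Member {
        final List<String> inbox = new ArrayList<>();
        private final Group group;

        Member(String participant, Group group) {
            this.group = group;
            group.register(participant, this);
        }

        // The participant signals through its member; the member delegates to the group.
        void gthrow(String exceptionName, List<String> participantList) {
            group.gthrow(exceptionName, participantList);
        }
    }
}
```

A participant never holds a reference to the group in this sketch; it only
sees its own Member, which mirrors the wiring in the composite.
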
>> The guardian member was implemented as a component too:
>>
>> <composite>
>>
>> ...
>>
>>     <component name="GuardianMember1">
>>         <implementation.java
>> class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
>>         <reference name="guardian_group" target="GuardianGroup"/>
>>     </component>
>>
>>     <component name="GuardianMember2">
>>         <implementation.java
>> class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
>>         <reference name="guardian_group" target="GuardianGroup"/>
>>     </component>
>>
>>     <component name="GuardianMember3">
>>         <implementation.java
>> class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
>>         <reference name="guardian_group" target="GuardianGroup"/>
>>     </component>
>>
>>     <component name="GuardianMember4">
>>         <implementation.java
>> class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
>>         <reference name="guardian_group" target="GuardianGroup"/>
>>     </component>
>>
>>     <component name="GuardianGroup">...</component>
>>
>> </composite>
>>
>> The org.apache.tuscany.sca.guardian.GuardianMemberImpl class defines the
>> guardian member. Each guardian member has a reference to the guardian group,
>> and each participant has a reference to its own guardian member.
>>
>> The full SCDL file can be found at [2].
>>
>> The GuardianMemberImpl implements the GuardianPrimitives, so the
>> participants communicate with each other using the methods present in that
>> interface through their respective guardian members.
>>
>> *# Using the model*
>>
>> So far, we have talked about three concepts of the guardian model: the
>> guardian group, the guardian members, and the guardian primitives. Another
>> important concept is the context. A context defines a region of the
>> participant in which external exceptions can be signaled and handled. A
>> context has two important attributes: a name, and the list of exceptions
>> that can be handled in that context. The class
>> org.apache.tuscany.sca.guardian.Context represents a context.
>>
>> The primary-backup scenario has three contexts: MAIN, PRIMARY, and BACKUP,
>> where PRIMARY and BACKUP are nested in the MAIN context. A context is
>> activated with the *enableContext* method of the guardian member;
>> *removeContext* has the opposite effect. Once a context is activated, it
>> stays active until *removeContext* is invoked or a nested context is
>> activated.
>>
>> The general structure of the NodeImpl class is shown below:
>>
>>    1.     @OneWay
>>    2.     public void execute() {
>>    3.         gm.enableContext(mainContext);
>>    4.         while (true) {
>>    5.             try {
>>    6.                 gm.checkExceptionStatus();
>>    7.                 if (role == PRIMARY) {
>>    8.                     //Config as primary then...
>>    9.                     primaryService();
>>    10.                 } else {
>>    11.                     //Config as backup then...
>>    12.                     backupService();
>>    13.                 }
>>    14.             } catch (PrimaryExistsException ex) {...}
>>    15.                catch (PrimaryFailedException ex) {...}
>>    16.                catch (BackupFailedException ex) {...}
>>    17.         }
>>    18.     }
>>    19.     private void primaryService() {
>>    20.         while (true) {
>>    21.             gm.enableContext(primaryContext);
>>    22.             try {
>>    23.                 gm.checkExceptionStatus();
>>    24.                 //Process the request then...
>>    25.                 ...
>>    26.                 if (backupAvailable) {
>>    27.                         //send updates to the backups
>>    28.                         ...
>>    29.                 }
>>    30.                 //send the reply to the client
>>    31.                 ...
>>    32.             } catch (PrimaryServiceFailureException ex) {...}
>>    33.                catch (BackupFailedException ex) {...}
>>    34.                catch (BackupJoinedException ex) {...}
>>    35.                finally {
>>    36.                 gm.removeContext();
>>    37.             }
>>    38.         }
>>    39.     }
>>    40.     private void backupService() {
>>    41.         while (true) {
>>    42.             gm.enableContext(backupContext);
>>    43.             try {
>>    44.                 gm.checkExceptionStatus();
>>    45.                 applyUpdate();
>>    46.             } catch (ApplyUpdateFailureException ex) {...}
>>    47.                finally {
>>    48.                 gm.removeContext();
>>    49.             }
>>    50.         }
>>    51.     }
>>
>> As can be seen, the MAIN context is activated in rows 1-18, PRIMARY in
>> rows 19-39, and BACKUP in rows 40-51. Each context is associated with a
>> method, and since *primaryService()* and *backupService()* are invoked
>> inside *execute()*, PRIMARY and BACKUP are nested in the MAIN context.
>> When the first participant joins the guardian group, its context list is
>> MAIN.PRIMARY. For the subsequent participants, the context list is
>> MAIN.BACKUP.
>>
>> The core of this general structure is:
>>
>> // scope
>> {
>>     // Activate a context
>>     gm.enableContext(SomeContext);
>>
>>     try {
>>         // Check for unhandled exceptions
>>         gm.checkExceptionStatus();
>>
>>         // Application-specific code
>>         . . .
>>
>>     } catch (GlobalException ex) {
>>         // Handle the exceptions this context is able to treat
>>     } finally {
>>         gm.removeContext();
>>     }
>> }
>>
>> After a context is activated, it is necessary to check for unhandled
>> exceptions with the guardian member's *checkExceptionStatus()* method.
>> This method checks for external exceptions that were raised by other
>> participants but that affect the behavior of this participant. If there
>> is an exception to be handled, *checkExceptionStatus()* raises it;
>> otherwise the method simply returns.
>>
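
The enable / check / remove cycle described above can be tried out with a
small self-contained Java sketch. StubGuardianMember and this GlobalException
are hypothetical stand-ins written for illustration (contexts are plain
strings and the exception is unchecked); they are not the Tuscany classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-ins for illustration only; not the Tuscany guardian classes.
class GlobalException extends RuntimeException {
    GlobalException(String message) { super(message); }
}

class StubGuardianMember {
    private final Deque<String> contexts = new ArrayDeque<>();
    private final Deque<GlobalException> pending = new ArrayDeque<>();

    void enableContext(String name) { contexts.push(name); }
    void removeContext() { contexts.pop(); }

    // Called by an (imaginary) guardian group to deliver an exception.
    void deliver(GlobalException ex) { pending.add(ex); }

    // Raises the oldest pending exception, or returns normally if there is none.
    void checkExceptionStatus() {
        GlobalException ex = pending.poll();
        if (ex != null) throw ex;
    }

    int depth() { return contexts.size(); }
}

class ContextPatternSketch {
    // Runs one pass of the enable / check / remove pattern and reports what happened.
    static String runOnce(StubGuardianMember gm, String context) {
        gm.enableContext(context);
        try {
            gm.checkExceptionStatus();
            return "no pending exception";
        } catch (GlobalException ex) {
            return "handled: " + ex.getMessage();
        } finally {
            gm.removeContext();   // always leave the context, even after a failure
        }
    }

    public static void main(String[] args) {
        StubGuardianMember gm = new StubGuardianMember();
        System.out.println(runOnce(gm, "BACKUP"));
        gm.deliver(new GlobalException("PrimaryFailedException"));
        System.out.println(runOnce(gm, "BACKUP"));
    }
}
```

The finally block is what keeps the context stack balanced: whether or not
an exception was pending, the context enabled at the top is removed.
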
>> Every time a participant wants to signal an external exception, it uses
>> the *gthrow()* method of its guardian member. The messages exchanged
>> between the participants, guardian members, and guardian group when
>> gthrow is invoked are depicted in the sequence diagram [3]. (See the
>> "Progress on the GSoC project: Supporting Concurrent Exception Handling
>> at Tuscany SCA" conversation thread for more details.)
>>
>> *# Recovery Rules XML File*
>>
>> When a participant invokes *gthrow()* to signal an external exception to
>> a set of participants, the guardian group consults the user-defined
>> recovery rules to find out which exception should be raised in each
>> participant in the list, as well as the proper target context (in other
>> words, the place where the exception will be raised and handled).
>>
>> An excerpt of the recovery rules XML file for the discussed scenario
>> follows (see the full file at [4]):
>>
>> <recovery_rules>
>>
>>     <!-- A new participant joins in the group -->
>>     <rule name="Rule1"
>> signaled_exception="org.apache.tuscany.sca.guardian.JoinException">
>>
>>         <participant match="*.PRIMARY">
>>             <throw_exception
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupJoinedException"
>> target_context="PRIMARY"/>
>>         </participant>
>>
>>         <participant match="SIGNALER">
>>             <throw_exception
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryExistsException"
>> target_context="MAIN" min_participant_joined="2"/>
>>         </participant>
>>     </rule>
>>     ...
>> </recovery_rules>
>>
>> When a participant joins the guardian group, the guardian raises a
>> JoinException indicating that a new participant has joined. The recovery
>> rule "Rule1" is applied when this exception is found. The guardian then
>> adds a BackupJoinedException, with target context PRIMARY, to all active
>> participants that are in a "*.PRIMARY" context (MAIN.PRIMARY matches this
>> pattern), and a PrimaryExistsException, with target context MAIN, to the
>> participant that raised the external exception (in other words, the
>> SIGNALER), provided that at least two participants have already joined
>> the guardian group.
>>
>> "Rule2" is applied when a participant signals a PrimaryFailedException.
>> This exception means that an internal error has occurred in the
>> participant that has the PRIMARY context active.
>>
>>     <rule name="Rule2"
>> signaled_exception="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException">
>>
>>         <participant match="*.PRIMARY">
>>             <throw_exception
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
>> target_context="INIT_CONTEXT"/>
>>         </participant>
>>
>>         <participant match="*.BACKUP">
>>             <throw_exception
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
>> target_context="MAIN">
>>                 <affected_participants>FIRST</affected_participants>
>>             </throw_exception>
>>         </participant>
>>     </rule>
>>
>> The guardian adds a PrimaryFailedException, with target context
>> INIT_CONTEXT, to the participant that is in the PRIMARY context.
>> INIT_CONTEXT is the outermost context; it encloses all the contexts
>> defined by the user. In this application, INIT_CONTEXT is the place where
>> NodeImpl.execute() is invoked, so raising an exception in this context
>> means that the participant has failed.
>>
>> For the first backup in the list of backups, a PrimaryFailedException is
>> added with target context MAIN.
>>
>> "Rule3" works like "Rule2", but it is applied for a
>> BackupFailedException:
>>
>>     <!-- The Backup fails -->
>>     <rule name="Rule3"
>> signaled_exception="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException">
>>
>>         <participant match="*.PRIMARY">
>>             <throw_exception
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"
>> target_context="PRIMARY"/>
>>         </participant>
>>
>>         <participant match="SIGNALER">
>>             <throw_exception
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"
>> target_context="INIT_CONTEXT"/>
>>         </participant>
>>     </rule>
>>
>> *# Putting the pieces together...*
>>
>> In summary, the application works like this:
>>
>>    1. A participant 'A' joins the guardian group with the MAIN context
>>    active; a JoinException is signaled by the guardian; no exceptions are
>>    delivered to the participant; and the participant reaches the PRIMARY
>>    context.
>>    2. A new participant 'B' joins the guardian group with the MAIN
>>    context active; a JoinException is signaled by the guardian; the
>>    guardian executes the recovery rule "Rule1"; a BackupJoinedException,
>>    with target context PRIMARY, is delivered to participant A; a
>>    PrimaryExistsException, with target context MAIN, is delivered to
>>    participant B.
>>    3. When participant A invokes *checkExceptionStatus()*, the
>>    BackupJoinedException is raised in it, and it starts to send updates to
>>    the backup.
>>    4. When participant B invokes *checkExceptionStatus()*, the
>>    PrimaryExistsException is raised in it, and it becomes a backup.
>>
>> After that, the primary sends updates to the backups, and the backups
>> apply the updates received from the primary.
>>
>> If an internal error occurs in the primary, we have:
>>
>>    1. Participant 'A' fails, so a PrimaryFailedException is signaled to
>>    the guardian.
>>    2. The guardian executes the recovery rule "Rule2".
>>    3. The guardian adds a PrimaryFailedException, with target context
>>    INIT_CONTEXT, to participant 'A'.
>>    4. The guardian adds a PrimaryFailedException, with target context
>>    MAIN, to the first backup in the backup list (in this case,
>>    participant 'B').
>>    5. When participant 'A' invokes *checkExceptionStatus()*, the
>>    PrimaryFailedException is raised in it and propagated up to the init
>>    context, which stops this participant.
>>    6. When participant 'B' invokes *checkExceptionStatus()*, the
>>    PrimaryFailedException is raised in it, and the participant becomes the
>>    primary.
>>
>> If an internal error occurs in the backup, we have:
>>
>>    1. Participant 'B' fails, so a BackupFailedException is signaled to
>>    the guardian.
>>    2. The guardian executes the recovery rule "Rule3".
>>    3. The guardian adds a BackupFailedException, with target context
>>    PRIMARY, to participant 'A'.
>>    4. The guardian adds a BackupFailedException, with target context
>>    INIT_CONTEXT, to participant 'B'.
>>    5. When participant 'A' invokes *checkExceptionStatus()*, the
>>    BackupFailedException is raised in it, and it removes participant 'B'
>>    from its backup list.
>>    6. When participant 'B' invokes *checkExceptionStatus()*, the
>>    BackupFailedException is raised in it and propagated up to the init
>>    context, which stops this participant.
>>
>> *# Concurrent Exceptions and the Resolution Tree*
>>
>> Because gthrow executes asynchronously, concurrent exceptions can occur.
>>
>> When concurrent exceptions occur, the guardian searches a resolution
>> tree for the lowest common ancestor of the concurrent exceptions, and
>> then applies the recovery rules for this resolved exception. If there is
>> no lowest common ancestor, the guardian applies the recovery rules for
>> each exception sequentially.
>>
>> The resolution tree for the discussed scenario is:
>>
>> <resolution_trees>
>>     <resolution_tree exception_level="1">
>>         <exception
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryBackupFailedTogetherException">
>>             <exception
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"/>
>>             <exception
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"/>
>>         </exception>
>>     </resolution_tree>
>> </resolution_trees>
>>
>> In this way, when a primary and a backup fail together, the
>> PrimaryFailedException and BackupFailedException will be concurrent, and the
>> resolved exception will be the PrimaryBackupFailedTogetherException.
>>
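
The lowest-common-ancestor lookup described above can be sketched in Java as
follows. This is only an illustration under simplifying assumptions: the
class ResolutionTreeSketch and its methods are hypothetical, exceptions are
identified by simple name, and the tree is kept as a child-to-parent map
rather than parsed from the resolution tree XML.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ResolutionTreeSketch {
    // Maps each child exception name to its parent exception name.
    private final Map<String, String> parent = new HashMap<>();

    void addChild(String parentName, String childName) {
        parent.put(childName, parentName);
    }

    // Path from the given node up to its root, starting with the node itself.
    private List<String> pathToRoot(String name) {
        List<String> path = new ArrayList<>();
        for (String n = name; n != null; n = parent.get(n)) {
            path.add(n);
        }
        return path;
    }

    // Returns the lowest common ancestor of a and b, or null if they share none.
    String resolve(String a, String b) {
        List<String> pathA = pathToRoot(a);
        for (String candidate : pathToRoot(b)) {
            if (pathA.contains(candidate)) {
                return candidate;   // first hit walking upward from b is the lowest one
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ResolutionTreeSketch tree = new ResolutionTreeSketch();
        tree.addChild("PrimaryBackupFailedTogetherException", "PrimaryFailedException");
        tree.addChild("PrimaryBackupFailedTogetherException", "BackupFailedException");
        System.out.println(tree.resolve("PrimaryFailedException", "BackupFailedException"));
    }
}
```

When resolve returns null, the fallback described above applies: the
recovery rules are run for each exception in turn.
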
>> The recovery rule "Rule4" is applied when this exception is signaled:
>>
>>     <!-- The Primary and Backup fail together -->
>>     <rule  name="Rule4"
>> signaled_exception="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryBackupFailedTogetherException">
>>
>>         <participant match="*.PRIMARY">
>>             <throw_exception
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
>> target_context="INIT_CONTEXT"/>
>>         </participant>
>>
>>          <!-- Backup signaler -->
>>         <participant match="*.BACKUP,SIGNALER">
>>             <throw_exception
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"
>> target_context="INIT_CONTEXT"/>
>>         </participant>
>>
>>         <!-- Excluding the backup signaler -->
>>         <participant match="*.BACKUP,!SIGNALER">
>>             <throw_exception
>> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
>> target_context="MAIN">
>>                 <affected_participants>FIRST</affected_participants>
>>             </throw_exception>
>>         </participant>
>>     </rule>
>>
>> The guardian adds a PrimaryFailedException, with target context
>> INIT_CONTEXT, to the participant that is in the PRIMARY context. Similarly,
>> the guardian adds a BackupFailedException, with target context INIT_CONTEXT,
>> to the participant that is in the BACKUP context, and has signaled the
>> external exception BackupFailedException. A PrimaryFailedException, with
>> target context MAIN, is added to the first backup in the backup list that
>> has not signaled any exception.
>>
>> This ends the execution of the participants that have failed, and selects
>> a backup to become the new primary server.
>>
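
One possible reading of the match attribute used in Rule1 to Rule4, sketched
in Java. The semantics here are assumptions inferred from the examples, not
taken from the guardian code: a leading "*" matches any prefix of the
context list, "SIGNALER" matches the participant that called gthrow, "!"
negates a term, and comma-separated terms must all hold.
ParticipantMatchSketch is a hypothetical name.

```java
class ParticipantMatchSketch {
    // Returns true when every comma-separated term of matchAttr holds for the participant.
    static boolean matches(String matchAttr, String contextPath, boolean isSignaler) {
        for (String rawTerm : matchAttr.split(",")) {
            String term = rawTerm.trim();
            boolean negated = term.startsWith("!");
            if (negated) {
                term = term.substring(1);
            }
            boolean hit;
            if (term.equals("SIGNALER")) {
                hit = isSignaler;
            } else if (term.startsWith("*")) {
                // "*.PRIMARY" matches any context list that ends with ".PRIMARY"
                hit = contextPath.endsWith(term.substring(1));
            } else {
                hit = contextPath.equals(term);
            }
            if (hit == negated) {
                return false;   // a plain term missed, or a negated term matched
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("*.PRIMARY", "MAIN.PRIMARY", false));
        System.out.println(matches("*.BACKUP,!SIGNALER", "MAIN.BACKUP", true));
    }
}
```

Under this reading, Rule4's "*.BACKUP,SIGNALER" and "*.BACKUP,!SIGNALER"
split the backups into the one that signaled the failure and all the others.
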
>> *# Ideas to improve the model implementation*
>>
>> Although the implementation is working, I think some modifications could
>> bring the model closer to Tuscany SCA.
>>
>>    1. As was suggested previously, I think it could be a good idea to use
>>    the recovery rules and the resolution tree as policies, instead of
>>    properties of the guardian component.
>>    2. Instead of using org.apache.tuscany.sca.guardian.GuardianImpl
>>    as the class of an implementation.java component, it may be better to
>>    define a new implementation type, like implementation.guardian, that has
>>    org.apache.tuscany.sca.guardian.Guardian as its service interface and
>>    allows recovery-rules and resolution-tree as policies.
>>    3. Since every participant needs to have a guardian member associated
>>    with it, the guardian members could be created automatically by the
>>    runtime. In this way, the user only needs to define a component of type
>>    implementation.guardian and a reference to it, and in the background
>>    the runtime creates one guardian member for each participant and does
>>    the proper bindings between the components.
>>
>> That's all for now. Let me know what you think. If you need some more
>> explanation, ask me. :)
>>
>> *# Links*
>>
>> [1]
>> http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/src/test/java/org/apache/tuscany/sca/guardian/itests/primaryBackup/common/NodeImpl.java
>> [2]
>> http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/src/main/resources/primaryNbackups-concurrent.composite
>> [3]
>> http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/sequenceDiagram-externalException.jpg
>> [4]
>> http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/src/main/resources/recoveryrules_nbackpus_concurrent.xml
>>
>> --
>> Douglas Siqueira Leite
>> Graduate student at University of Campinas (Unicamp), Brazil


-- 
Douglas Siqueira Leite
Graduate student at University of Campinas (Unicamp), Brazil

Re: [GSoC] More Details about the Guardian Model Implementation

Posted by Douglas Leite <do...@gmail.com>.
Hi,

I have implemented a new implementation type called
"implementation.guardian".

With the impl.guardian module, the SCDL file looks like this:

<composite>

    <component name="Participant1">
        <implementation.java
class="org.apache.tuscany.sca.implementation.guardian.itests.primaryBackup.common.NodeImpl"/>

        <reference name="guardian_member" target="GuardianMember1"/>
        <reference name="nodes" target="Participant2"/>
    </component>

    <component name="Participant2">
        <implementation.java
class="org.apache.tuscany.sca.implementation.guardian.itests.primaryBackup.common.NodeImpl"/>

        <reference name="guardian_member" target="GuardianMember2"/>
        <reference name="nodes" target="Participant1"/>
    </component>

    <component name="GuardianMember1">
        <implementation.java
class="org.apache.tuscany.sca.implementation.guardian.impl.GuardianMemberImpl"/>

        <reference name="guardian_group" target="GuardianComponent"/>
    </component>

    <component name="GuardianMember2">
        <implementation.java
class="org.apache.tuscany.sca.implementation.guardian.impl.GuardianMemberImpl"/>

        <reference name="guardian_group" target="GuardianComponent"/>
    </component>

    <component name="GuardianComponent">
        <tuscany:implementation.guardian>
            <tuscany:guardianProperties
recovery_rules="src/main/resources/org/apache/tuscany/sca/implementation/guardian/itests/primaryBackup/simple/recoveryRules.xml"
resolution_trees="src/main/resources/org/apache/tuscany/sca/implementation/guardian/itests/primaryBackup/resolutionTrees.xml"/>
        </tuscany:implementation.guardian>
    </component>

</composite>

So, we need to define a component with the implementation type
"implementation.guardian" and its guardianProperties (which carry the
recovery_rules and resolution_trees attributes).

We still have to define one guardian member for each participant involved in
the composite. I am working on a way to improve this part.

Thoughts?

PS:

Initially, some people suggested using the policy framework to structure
the guardian model. In that approach, the recovery rules and the resolution
trees would be defined in the definitions.xml file, and the proper action
would be invoked by the interceptors. However, I think that having an
implementation type for the guardian group is a better way to implement the
model. This is because the guardian group represents a global distributed
entity that is able to communicate with a set of guardian members, so the
guardian needs to know the state of all guardian members. Considering that
we could have a set of participants distributed over different machines,
communicating with each other through a protocol like SOAP, how could I
instantiate the same interceptor to coordinate all the participants? I
don't know if that is possible (is it?), but using a new implementation
type to create a global component solved the problem.

Maybe I could use the policy framework to implement the guardian members,
but I am not sure about that yet. I need to think about it more.


On Sat, Aug 15, 2009 at 10:57 AM, Douglas Leite <do...@gmail.com>wrote:

> I will try to give an explanation of how the model is working. I will
> explain based on the examples I have developed to test the model.
>
> *# Overview*
>
> First of all, what kind of application is the guardian model applicable to?
> The model is applicable to solve the problem of concurrent exceptions
> occurrence in concurrent applications. So, we have two or more participants
> executing at the same time, and exchanging messages in a cooperative action.
>
> The test scenario is the primary-backup with N backups. In this scenario we
> have a server-client application, with N participants on the server side.
> The first participant to join in the server side becomes the primary server,
> and the subsequent ones are the backups. The primary gets a request from a
> client, and sends a reply to the client and a copy of its state to the
> backups. When the primary fails, the first backup on the queue becomes the
> new server. On the other hand, when a backup fails, the primary simply stops
> to send updates to it.
>
> *# SCDL file for primary-backup with N backups scenario*
>
> Since all participants need to know each other, we define:
>
> <composite>
>
>     <component name="Participant1">
>         <implementation.java
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
>         <reference name="nodes" target="Participant2 Participant3
> Participant4"/>
>     </component>
>
>     <component name="Participant2">
>         <implementation.java
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
>         <reference name="nodes" target="Participant1 Participant3
> Participant4"/>
>     </component>
>
>     <component name="Participant3">
>         <implementation.java
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
>         <reference name="nodes" target="Participant1 Participant2
> Participant4"/>
>     </component>
>
>     <component name="Participant4">
>         <implementation.java
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
>         <reference name="nodes" target="Participant1 Participant2
> Participant3"/>
>     </component>
>
> ...
>
> </composite>
>
> Each participant is an instance of the NodeImpl class ([1]), which contains
> three main methods: *execute*, *sendUpdate*, and *applyUpdate*. The first
> starts the participant's execution thread; it is annotated with @OneWay,
> which makes the invocation asynchronous. The second is used by the primary
> server to send updates to the backups. Finally, *applyUpdate* is used by
> the backups to apply the updates received from the primary.
>
> All communication between the participants that concerns exceptional
> behavior goes through the guardian, which is implemented as a component.
> So, we need to define the guardian in the SCDL file:
>
> <composite>
>
>     <component name="Participant1">...</component>
>
>     <component name="Participant2">...</component>
>
>     <component name="Participant3">...</component>
>
>     <component name="Participant4">...</component>
>
>     <component name="GuardianGroup">
>         <implementation.java
> class="org.apache.tuscany.sca.guardian.GuardianGroupImpl"/>
>         <property
> name="recovery_rules">src/main/resources/recoveryrules_nbackpus_concurrent.xml</property>
>         <property
> name="resolution_tree">src/main/resources/resolutionTree.xml</property>
>     </component>
>
> </composite>
>
> The guardian is an instance of the
> org.apache.tuscany.sca.guardian.GuardianGroupImpl class, and provides the
> org.apache.tuscany.sca.guardian.GuardianPrimitives as the main interface for
> communication.
>
> The GuardianPrimitives contains the following methods:
>
>    1.     public void enableContext(Context context);
>    2.     public void removeContext();
>    3.     public void gthrow(GlobalExceptionInterface ex, List<String>
>    participantList);
>    4.     public boolean propagate(GlobalExceptionInterface ex);
>    5.     public void checkExceptionStatus() throws GlobalException;
>
> Methods 1 and 2 add and remove a context, respectively.
> Method 3 is used every time a participant wants to signal an external
> exception, in other words, an exception that needs to be handled
> cooperatively by a set of participants.
> Method 4 checks whether a specific exception needs to be propagated to
> another context.
> Method 5 checks whether there are pending exceptions to be handled.
>
> These methods are the channel the participants use to communicate with each
> other when they need to handle an exception cooperatively.
>
> However, the participants do not communicate with the guardian directly.
> Instead, they communicate with a guardian member, which is a mediator
> between the participants and the guardian. Each participant is associated
> with a guardian member. So the communication is established like this:
> participant -> guardian member -> guardian, and guardian -> guardian member
> -> participant.
>
> The guardian member was implemented as a component too:
>
> <composite>
>
> ...
>
>     <component name="GuardianMember1">
>         <implementation.java
> class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
>         <reference name="guardian_group" target="GuardianGroup"/>
>     </component>
>
>     <component name="GuardianMember2">
>         <implementation.java
> class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
>         <reference name="guardian_group" target="GuardianGroup"/>
>     </component>
>
>     <component name="GuardianMember3">
>         <implementation.java
> class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
>         <reference name="guardian_group" target="GuardianGroup"/>
>     </component>
>
>     <component name="GuardianMember4">
>         <implementation.java
> class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
>         <reference name="guardian_group" target="GuardianGroup"/>
>     </component>
>
>     <component name="GuardianGroup">...</component>
>
> </composite>
>
> The org.apache.tuscany.sca.guardian.GuardianMemberImpl class defines the
> guardian member. Each guardian member has a reference to the guardian group,
> and each participant has a reference to its own guardian member.
>
> The full SCDL file can be found at [2].
>
> The GuardianMemberImpl implements the GuardianPrimitives, so the
> participants communicate with each other using the methods present in that
> interface through their respective guardian members.
>
> *# Using the model*
>
> So far, we have talked about three concepts of the guardian model: the
> guardian group, the guardian members, and the guardian primitives. Another
> important concept is the context. A context defines a region of the
> participant in which external exceptions can be signaled and handled. A
> context has two important attributes: a name, and the list of exceptions
> that can be handled in that context. The class
> org.apache.tuscany.sca.guardian.Context represents a context.
>
> The primary-backup scenario has three contexts: MAIN, PRIMARY, and BACKUP,
> where PRIMARY and BACKUP are nested in the MAIN context. A context is
> activated with the *enableContext* method of the guardian member;
> *removeContext* has the opposite effect. Once a context is activated, it
> stays active until *removeContext* is invoked or a nested context is
> activated.
>
> The general structure of the NodeImpl class is shown below:
>
>    1.     @OneWay
>    2.     public void execute() {
>    3.         gm.enableContext(mainContext);
>    4.         while (true) {
>    5.             try {
>    6.                 gm.checkExceptionStatus();
>    7.                 if (role == PRIMARY) {
>    8.                     //Config as primary then...
>    9.                     primaryService();
>    10.                 } else {
>    11.                     //Config as backup then...
>    12.                     backupService();
>    13.                 }
>    14.             } catch (PrimaryExistsException ex) {...}
>    15.                catch (PrimaryFailedException ex) {...}
>    16.                catch (BackupFailedException ex) {...}
>    17.         }
>    18.     }
>    19.     private void primaryService() {
>    20.         while (true) {
>    21.             gm.enableContext(primaryContext);
>    22.             try {
>    23.                 gm.checkExceptionStatus();
>    24.                 //Process the request then...
>    25.                 ...
>    26.                 if (backupAvailable) {
>    27.                         //send updates to the backups
>    28.                         ...
>    29.                 }
>    30.                 //send the reply to the client
>    31.                 ...
>    32.             } catch (PrimaryServiceFailureException ex) {...}
>    33.                catch (BackupFailedException ex) {...}
>    34.                catch (BackupJoinedException ex) {...}
>    35.                finally {
>    36.                 gm.removeContext();
>    37.             }
>    38.         }
>    39.     }
>    40.     private void backupService() {
>    41.         while (true) {
>    42.             gm.enableContext(backupContext);
>    43.             try {
>    44.                 gm.checkExceptionStatus();
>    45.                 applyUpdate();
>    46.             } catch (ApplyUpdateFailureException ex) {...}
>    47.                finally {
>    48.                 gm.removeContext();
>    49.             }
>    50.         }
>    51.     }
>
> As shown above, the MAIN context is activated in rows 1-18, the PRIMARY
> context in rows 19-39, and the BACKUP context in rows 40-51. Each context
> is associated with a method, and since *primaryService()* and
> *backupService()* are invoked inside *execute()*, PRIMARY and BACKUP are
> nested contexts of the MAIN context. When the first participant joins the
> guardian group, its context list is defined as MAIN.PRIMARY. For the
> subsequent participants, the context list is defined as MAIN.BACKUP.
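
One way to picture how a context list such as MAIN.PRIMARY comes about is a stack of active context names. The sketch below is an assumption about the bookkeeping, not the actual guardian member code; ContextStackSketch and contextList are hypothetical names, and contexts are reduced to plain strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of nested-context bookkeeping in a guardian member.
// Assumption: INIT_CONTEXT is implicit and is not stored on the stack.
class ContextStackSketch {
    private final Deque<String> active = new ArrayDeque<>();

    // Activating a context nests it inside the currently active one.
    void enableContext(String name) {
        active.addLast(name);
    }

    // Leaving a context makes the enclosing one current again.
    void removeContext() {
        if (!active.isEmpty()) {
            active.removeLast();
        }
    }

    // Dotted path from the outermost to the innermost active context.
    String contextList() {
        return String.join(".", active);
    }
}
```

Calling enableContext("MAIN") and then enableContext("PRIMARY") gives the context list MAIN.PRIMARY; one removeContext() call brings it back to MAIN.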
>
> The core of this general structure is:
>
> //scope
> {
>     //Activate a context
>     gm.enableContext(SomeContext);
>
>     try {
>         //Check for unhandled exceptions
>         gm.checkExceptionStatus();
>
>         //Application-specific code
>         . . .
>
>     } catch (SomeException ex) {...}
>       finally {
>         gm.removeContext();
>     }
> }
>
> After activating a context, it is necessary to check for unhandled
> exceptions with the *checkExceptionalStatus()* guardian member method.
> This method checks for external exceptions that were raised by other
> participants but that influence the behavior of this participant. If
> there is an exception to be handled, *checkExceptionalStatus()* raises
> it; otherwise the method simply returns.
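
The raise-or-return behavior can be sketched with a queue of pending exceptions. This is a simplification under stated assumptions: PendingExceptionsSketch and deliver are hypothetical names, exceptions are assumed to be unchecked, and target contexts are ignored.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of the check performed by a guardian member.
// Assumption: the guardian group hands exceptions over through deliver().
class PendingExceptionsSketch {
    // Thread-safe because delivery and checking happen on different threads.
    private final Queue<RuntimeException> pending = new ConcurrentLinkedQueue<>();

    // Called on behalf of the guardian group to queue an external exception.
    void deliver(RuntimeException ex) {
        pending.add(ex);
    }

    // Raises the oldest pending exception, or returns if there is none.
    void checkExceptionalStatus() {
        RuntimeException next = pending.poll();
        if (next != null) {
            throw next;
        }
    }
}
```

Because the check only happens at explicit calls, a participant sees external exceptions at well-defined points rather than at arbitrary moments of its execution.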
>
> Every time a participant wants to signal an external exception, it uses the
> *gthrow()* method of its guardian member. The messages exchanged between
> the participants, guardian members, and guardian group when gthrow is
> invoked are depicted in the sequence diagram [3]. (See the "Progress on
> the GSoC project: Supporting Concurrent Exception Handling at Tuscany SCA"
> conversation thread for more details.)
>
> *# Recovery Rules XML File*
>
> When a participant invokes *gthrow()* to signal an external exception to a
> set of participants, the guardian group consults the user-defined recovery
> rules to find out which exception should be raised in each participant in
> the list, as well as the proper target context (in other words, the place
> where the exception will be raised and handled).
>
> An excerpt of the recovery rules XML file for the discussed scenario
> follows (see the full file at [4]):
>
> <recovery_rules>
>
>     <!-- A new participant joins in the group -->
>     <rule name="Rule1"
> signaled_exception="org.apache.tuscany.sca.guardian.JoinException">
>
>         <participant match="*.PRIMARY">
>             <throw_exception
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupJoinedException"
> target_context="PRIMARY"/>
>         </participant>
>
>         <participant match="SIGNALER">
>             <throw_exception
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryExistsException"
> target_context="MAIN" min_participant_joined="2"/>
>         </participant>
>     </rule>
>     ...
> </recovery_rules>
>
> When a participant joins the guardian group, the guardian raises a
> JoinException indicating that a new participant has joined. The recovery
> rule "Rule1" is applied when this exception is found. The guardian then
> adds a BackupJoinedException, with target context PRIMARY, to all active
> participants that are in the "*.PRIMARY" context (MAIN.PRIMARY matches
> this pattern), and a PrimaryExistsException, with target context MAIN, to
> the participant that raised the external exception (in other words, the
> SIGNALER), provided that at least two participants have already joined
> the guardian group.
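
As an illustration of how a pattern like "*.PRIMARY" could be compared with a participant's context list, here is a small sketch. The matching semantics are an assumption (a leading "*." stands for any enclosing contexts), ParticipantMatchSketch is a hypothetical name, and keywords such as SIGNALER are not covered.

```java
// Hypothetical sketch of matching a rule's "match" attribute against a
// participant's dotted context list. Not the guardian's actual matcher.
class ParticipantMatchSketch {
    static boolean matches(String pattern, String contextList) {
        if (pattern.startsWith("*.")) {
            // "*.PRIMARY" matches any list whose innermost context is PRIMARY
            // and that has at least one enclosing context.
            String innermost = pattern.substring(2);
            return contextList.endsWith("." + innermost);
        }
        // Without a wildcard, require an exact match.
        return pattern.equals(contextList);
    }
}
```

Under these assumptions, "*.PRIMARY" matches MAIN.PRIMARY but not MAIN.BACKUP.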
>
> "Rule2" is applied when a participant raises a PrimaryFailedException.
> This exception means that an internal error has occurred in the
> participant that has the PRIMARY context active.
>
>     <rule name="Rule2"
> signaled_exception="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException">
>
>         <participant match="*.PRIMARY">
>             <throw_exception
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
> target_context="INIT_CONTEXT"/>
>         </participant>
>
>         <participant match="*.BACKUP">
>             <throw_exception
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
> target_context="MAIN">
>                 <affected_participants>FIRST</affected_participants>
>             </throw_exception>
>         </participant>
>     </rule>
>
> The guardian adds a PrimaryFailedException, with target context
> INIT_CONTEXT, to the participant that is in the PRIMARY context.
> INIT_CONTEXT is the outermost context, and it precedes the contexts
> defined by the user. In this application, INIT_CONTEXT is the place where
> NodeImpl.execute() is invoked, so raising an exception in this context
> means that the participant has failed.
>
> For the first backup in the list of backups, a PrimaryFailedException is
> added with target context MAIN.
>
> "Rule3" works like "Rule2", but it is applied for a
> BackupFailedException:
>
>     <!-- The Backup fails -->
>     <rule name="Rule3"
> signaled_exception="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException">
>
>         <participant match="*.PRIMARY">
>             <throw_exception
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"
> target_context="PRIMARY"/>
>         </participant>
>
>         <participant match="SIGNALER">
>             <throw_exception
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"
> target_context="INIT_CONTEXT"/>
>         </participant>
>     </rule>
>
> *# Putting the pieces together...*
>
> In summary, the application works as follows:
>
>    1. A participant 'A' joins the guardian group with the MAIN context
>    active; a JoinException is signaled by the guardian; no exceptions are
>    delivered to the participant; and the participant reaches the PRIMARY
>    context.
>    2. A new participant 'B' joins the guardian group with the MAIN
>    context active; a JoinException is signaled by the guardian; the guardian
>    executes the recovery rule "Rule1"; a BackupJoinedException, with target
>    context PRIMARY, is delivered to participant A; a
>    PrimaryExistsException, with target context MAIN, is delivered to
>    participant B.
>    3. When participant A invokes *checkExceptionalStatus()*, the
>    BackupJoinedException is raised in it, and it starts sending updates to
>    the backup.
>    4. When participant B invokes *checkExceptionalStatus()*, the
>    PrimaryExistsException is raised in it, and it becomes a backup.
>
> After that, the primary sends updates to the backups, and the backups apply
> the updates received from the primary.
>
> If an internal error occurs in the primary, we have:
>
>    1. Participant 'A' fails, so a PrimaryFailedException is signaled
>    to the guardian.
>    2. The guardian executes the recovery rule "Rule2".
>    3. The guardian adds a PrimaryFailedException, with target context
>    INIT_CONTEXT, to participant 'A'.
>    4. The guardian adds a PrimaryFailedException, with target context
>    MAIN, to the first backup in the backup list (in this case,
>    participant 'B').
>    5. When participant 'A' invokes *checkExceptionalStatus()*, the
>    PrimaryFailedException is raised in it and propagated up to the init
>    context, which stops this participant.
>    6. When participant 'B' invokes *checkExceptionalStatus()*, the
>    PrimaryFailedException is raised in it, and the participant becomes the
>    primary.
>
> If an internal error occurs in the backup, we have:
>
>    1. Participant 'B' fails, so a BackupFailedException is signaled to
>    the guardian.
>    2. The guardian executes the recovery rule "Rule3".
>    3. The guardian adds a BackupFailedException, with target context
>    PRIMARY, to participant 'A'.
>    4. The guardian adds a BackupFailedException, with target context
>    INIT_CONTEXT, to participant 'B'.
>    5. When participant 'A' invokes *checkExceptionalStatus()*, the
>    BackupFailedException is raised in it, and it removes participant 'B'
>    from its backup list.
>    6. When participant 'B' invokes *checkExceptionalStatus()*, the
>    BackupFailedException is raised in it and propagated up to the init
>    context, which stops this participant.
>
> *# Concurrent Exceptions and the Resolution Tree*
>
> Because gthrow executes asynchronously, concurrent exceptions can occur.
>
> When concurrent exceptions occur, the guardian searches a resolution tree
> for the lowest common ancestor of the concurrent exceptions, and then
> applies the recovery rules for this resolved exception. If there isn't a
> lowest common ancestor, then the guardian applies the recovery rules for
> each exception sequentially.
>
> The resolution tree for the discussed scenario is:
>
> <resolution_trees>
>     <resolution_tree exception_level="1">
>         <exception
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryBackupFailedTogetherException">
>             <exception
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"/>
>             <exception
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"/>
>         </exception>
>     </resolution_tree>
> </resolution_trees>
>
> In this way, when a primary and a backup fail together, the
> PrimaryFailedException and BackupFailedException will be concurrent, and the
> resolved exception will be the PrimaryBackupFailedTogetherException.
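
The lowest-common-ancestor lookup can be sketched with a child-to-parent map. This is only an illustration of the idea, not the guardian's resolution code: ResolutionTreeSketch, addChild, and resolve are hypothetical names, and exceptions are identified by simple name instead of by class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of resolving two concurrent exceptions in a tree.
class ResolutionTreeSketch {
    // Maps each exception name to the name of its parent in the tree.
    private final Map<String, String> parentOf = new HashMap<>();

    void addChild(String parent, String child) {
        parentOf.put(child, parent);
    }

    // Path from the node up to the root, the node itself included.
    private List<String> pathToRoot(String node) {
        List<String> path = new ArrayList<>();
        for (String n = node; n != null; n = parentOf.get(n)) {
            path.add(n);
        }
        return path;
    }

    // Lowest common ancestor of a and b, or null if they share none.
    String resolve(String a, String b) {
        List<String> ancestorsOfA = pathToRoot(a);
        for (String n : pathToRoot(b)) {
            if (ancestorsOfA.contains(n)) {
                return n;
            }
        }
        return null;
    }
}
```

With the tree from the XML above, resolving PrimaryFailedException and BackupFailedException yields PrimaryBackupFailedTogetherException; a null result corresponds to the case where the rules are applied for each exception sequentially.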
>
> The recovery rule "Rule4" applies when this kind of exception is signaled:
>
>     <!-- The Primary and Backup fail together -->
>     <rule  name="Rule4"
> signaled_exception="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryBackupFailedTogetherException">
>
>         <participant match="*.PRIMARY">
>             <throw_exception
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
> target_context="INIT_CONTEXT"/>
>         </participant>
>
>          <!-- Backup signaler -->
>         <participant match="*.BACKUP,SIGNALER">
>             <throw_exception
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"
> target_context="INIT_CONTEXT"/>
>         </participant>
>
>         <!-- Excluding the backup signaler -->
>         <participant match="*.BACKUP,!SIGNALER">
>             <throw_exception
> class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
> target_context="MAIN">
>                 <affected_participants>FIRST</affected_participants>
>             </throw_exception>
>         </participant>
>     </rule>
>
> The guardian adds a PrimaryFailedException, with target context
> INIT_CONTEXT, to the participant that is in the PRIMARY context. Similarly,
> the guardian adds a BackupFailedException, with target context INIT_CONTEXT,
> to the participant that is in the BACKUP context, and has signaled the
> external exception BackupFailedException. A PrimaryFailedException, with
> target context MAIN, is added to the first backup in the backup list that
> has not signaled any exception.
>
> This action ends the execution of the participants that have failed, and
> chooses a backup to become the new primary server.
>
> *# Ideas to improve the model implementation*
>
> Although the implementation is working, I think that some modifications
> could be made in order to bring the model closer to Tuscany SCA.
>
>    1. As was suggested previously, I think it could be a good idea to use
>    the recovery rules and the resolution tree as policies, instead of
>    properties in the guardian component.
>    2. Instead of using org.apache.tuscany.sca.guardian.GuardianImpl as
>    the class of an implementation.java component, it might be better to
>    define a new implementation type, like implementation.guardian, that has
>    org.apache.tuscany.sca.guardian.Guardian as its service interface, and
>    allows recovery-rules and resolution-tree as policies.
>    3. Since we know every participant needs to have a guardian member
>    associated with it, the guardian members could be created automatically
>    by the runtime. In this way, the user only needs to define a component
>    of type implementation.guardian and have a reference to it; in the
>    background, the runtime creates one guardian member for each participant
>    and does the proper bindings between the components.
>
> That's all for now. Let me know what you think. If you need some more
> explanation, ask me. :)
>
> *# Links*
>
> [1]
> http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/src/test/java/org/apache/tuscany/sca/guardian/itests/primaryBackup/common/NodeImpl.java
> [2]
> http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/src/main/resources/primaryNbackups-concurrent.composite
> [3]
> http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/sequenceDiagram-externalException.jpg
> [4]
> http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/src/main/resources/recoveryrules_nbackpus_concurrent.xml
>
> --
> Douglas Siqueira Leite
> Graduate student at University of Campinas (Unicamp), Brazil
>
>


-- 
Douglas Siqueira Leite
Graduate student at University of Campinas (Unicamp), Brazil