Posted to dev@uima.apache.org by Bhavani Iyer <bh...@gmail.com> on 2008/08/27 21:18:15 UTC

Using Delta CAS serialization in UIMA-AS

Now that we have Delta CAS support, can we use it in UIMA-AS?
I'm planning to make the modifications to use Delta CAS as follows.
Please add your comments/suggestions.

*Additional Properties in the ProcessCAS message:*
 There will be an additional property, *AcceptsDeltaCas*, in the ProcessCAS
request message sent by a UIMA-AS client (including a UIMA-AS aggregate that
calls a remote delegate) to a service, specifying that the client accepts a
Delta CAS in the reply. The boolean property will default to true.

 A new boolean property in the ProcessCAS reply message, *SentDeltaCas*, is
required to indicate that the service sent a Delta CAS in the reply.

 These two properties will enable older UIMA-AS clients and services to work
with newer clients and services.

*Parallel Step handling*
The parallel step contract requires that remote delegates called in
parallel only create new FSs.

Currently, since the complete CAS is serialized and sent in the reply
message, pre-existing FSs are ignored during deserialization.

With Delta CAS, it's now possible to enforce the parallel step contract by
specifying the option to disallow pre-existing FSs when deserializing.
Should we do this?

Note that there could be a mix of older and newer services called
in the parallel step. The older services will reply with the complete CAS.
In this case, pre-existing FSs will be ignored, as they are currently.
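
The difference between ignoring and disallowing pre-existing FSs can be
illustrated with a small Java sketch. It is a simplified model, not the UIMA
deserializer: the class ParallelStepMerge, the PreexistingPolicy enum, and the
use of plain integer ids with a merge point are all assumptions made for
illustration. Ids at or below the merge point stand for FSs that existed
before the parallel step started.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of merging a delegate's reply into the common CAS.
final class ParallelStepMerge {

    // How to treat FSs in the reply that already existed before the step.
    enum PreexistingPolicy { ALLOW, IGNORE, DISALLOW }

    // Returns the FS ids from the reply that are accepted into the common CAS.
    // Assumption: ids <= mergePoint existed before the parallel step.
    static List<Integer> accept(List<Integer> replyFsIds, int mergePoint,
                                PreexistingPolicy policy) {
        List<Integer> accepted = new ArrayList<>();
        for (int id : replyFsIds) {
            boolean preexisting = id <= mergePoint;
            if (!preexisting || policy == PreexistingPolicy.ALLOW) {
                accepted.add(id);
            } else if (policy == PreexistingPolicy.DISALLOW) {
                // Enforce the contract: delegates may only create new FSs.
                throw new IllegalStateException(
                    "Delegate modified pre-existing FS " + id);
            }
            // IGNORE: drop the pre-existing FS silently (today's behavior
            // when a complete CAS comes back from an older service).
        }
        return accepted;
    }
}
```

Under this model, a complete-CAS reply from an older service would be merged
with IGNORE, while a Delta CAS reply from a newer service could be merged with
DISALLOW so that a contract violation surfaces as an error.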

*Disabling Delta CAS reply*
It may be necessary to require a service to always serialize
and reply with the complete CAS. This may be required for debugging
or to work around Delta CAS limitations. To specify this, a parameter
is required in the deployment descriptor.

Thanks,
Bhavani

Re: Using Delta CAS serialization in UIMA-AS

Posted by Bhavani Iyer <bh...@gmail.com>.
On Thu, Aug 28, 2008 at 5:28 PM, Adam Lally <al...@alum.rpi.edu> wrote:

> On Thu, Aug 28, 2008 at 4:25 PM, Bhavani Iyer <bh...@gmail.com> wrote:
> >> How about this idea instead:  If you detect that an Array or List has
> >> been modified, then you will need to
> >> iterate through everything in the CAS in order to find what FS
> >> referred to that array or list.  Less efficient than
> >> the delta, for sure, but at least you're saving on the amount of data
> >> that you have to returned.  And this seems
> >> much better than requiring the user to set a special property without
> >> which the service would be broken.
> >>
> >
> >   The concern was that this would be an expensive operation.
> >
> >   Another idea is to save the link between non-shared Array/List FS and
> >   the encompassing FS in the XmiSerializationSharedData on deserialization
> >   and when serializing in Delta CAS format use that map to find and
> >   enqueue the encompassing FS.
> >
>
> Yes, that's much better than my suggestion.  Thanks!
>
> Does this mean we don't need a deployment descriptor element for
> disabling delta CAS?  It sounds that way to me.


   We don't need the descriptor element. For debugging purposes,
   we could add a system property to turn off Delta CAS.
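
A debugging switch of this kind could look roughly like the Java sketch
below. The property name is invented for illustration (the thread does not
name one), and DeltaCasSwitch is a hypothetical helper, not part of UIMA-AS.

```java
// Hypothetical helper that reads a JVM system property as a debug switch.
final class DeltaCasSwitch {

    // Assumed property name; the thread does not specify one.
    static final String DISABLE_PROPERTY = "example.uima.as.disableDeltaCas";

    // True only when the property is set to "true" (case-insensitive),
    // e.g. by starting the JVM with -Dexample.uima.as.disableDeltaCas=true.
    static boolean deltaCasDisabled() {
        return Boolean.getBoolean(DISABLE_PROPERTY);
    }
}
```

Because Boolean.getBoolean returns false when the property is absent, Delta
CAS would stay enabled unless someone explicitly turns it off.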


>
>  -Adam
>

Re: Using Delta CAS serialization in UIMA-AS

Posted by Adam Lally <al...@alum.rpi.edu>.
On Thu, Aug 28, 2008 at 4:25 PM, Bhavani Iyer <bh...@gmail.com> wrote:
>> How about this idea instead:  If you detect that an Array or List has
>> been modified, then you will need to
>> iterate through everything in the CAS in order to find what FS
>> referred to that array or list.  Less efficient than
>> the delta, for sure, but at least you're saving on the amount of data
>> that you have to returned.  And this seems
>> much better than requiring the user to set a special property without
>> which the service would be broken.
>>
>
>   The concern was that this would be an expensive operation.
>
>   Another idea  is to save the link between non-shared Array/List FS and
> the
>   encompassing FS in the XmiSerializationSharedData on deserialization  and
>   when serializing in Delta CAS format use that map to find and enqueue the
> encompassing FS.
>

Yes, that's much better than my suggestion.  Thanks!

Does this mean we don't need a deployment descriptor element for
disabling delta CAS?  It sounds that way to me.

  -Adam

Re: Using Delta CAS serialization in UIMA-AS

Posted by Bhavani Iyer <bh...@gmail.com>.
On Thu, Aug 28, 2008 at 1:32 PM, Adam Lally <al...@alum.rpi.edu> wrote:

>
>
> >    Well, not sure I can come up with other scenarios but the current
> Delta
> > CAS implementation
> >     has a limitation with serializing updates to pre-existing Array/List
> > FSs that are only referenced as
> >     non-shared feature values of pre-existing FS and no other feature of
> > the rereferencing FS is modified.
> >     The referencing FS is not marked as modified. To do that would
> require
> > going back to find the referencing FS
> >     in the CAS.  (Any suggestions on how to find the referencing FS
> > efficiently ?)
> >
> >     So currently Delta CAS serialization will have missed the change to
> the
> > Array/List FS.
> >     An annotator that does this type of update will have to be deployed
> > with Delta CAS response disabled.
> >
>
> How about this idea instead:  If you detect that an Array or List has
> been modified, then you will need to
> iterate through everything in the CAS in order to find what FS
> referred to that array or list.  Less efficient than
> the delta, for sure, but at least you're saving on the amount of data
> that you have to returned.  And this seems
> much better than requiring the user to set a special property without
> which the service would be broken.
>

   The concern was that this would be an expensive operation.

   Another idea is to save the link between each non-shared Array/List FS
   and its encompassing FS in the XmiSerializationSharedData on
   deserialization and, when serializing in Delta CAS format, use that map
   to find and enqueue the encompassing FS.

   I'll update the Delta CAS impl as above to fix this limitation.
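
The owner-map idea can be sketched in a few lines of Java. This is only a
model of the technique, not the XmiSerializationSharedData code:
NonSharedOwnerIndex and its methods are hypothetical names, FSs are
represented by integer ids, and a non-shared array or list is assumed to be
written out as part of its encompassing FS.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical index from a non-shared array/list FS to the FS containing it.
final class NonSharedOwnerIndex {

    private final Map<Integer, Integer> ownerOf = new HashMap<>();

    // Called while deserializing the incoming CAS: remember which FS
    // holds this non-shared array or list as a feature value.
    void recordOwner(int arrayOrListFsId, int encompassingFsId) {
        ownerOf.put(arrayOrListFsId, encompassingFsId);
    }

    // Called while serializing the delta: a modified non-shared array/list
    // is replaced by its encompassing FS; any other FS is enqueued as-is.
    // The lookup is a map access, so no scan of the whole CAS is needed.
    Set<Integer> fsToSerialize(Iterable<Integer> modifiedFsIds) {
        Set<Integer> queue = new LinkedHashSet<>();
        for (int id : modifiedFsIds) {
            queue.add(ownerOf.getOrDefault(id, id));
        }
        return queue;
    }
}
```

The cost moves to deserialization time, where the link is already known,
instead of a search through the CAS each time an array or list changes.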


>  -Adam
>

Re: Using Delta CAS serialization in UIMA-AS

Posted by Burn Lewis <bu...@gmail.com>.
On Thu, Aug 28, 2008 at 1:32 PM, Adam Lally <al...@alum.rpi.edu> wrote:

> On Thu, Aug 28, 2008 at 10:36 AM, Bhavani Iyer <bh...@gmail.com>
> wrote:
> > On Wed, Aug 27, 2008 at 6:41 PM, Adam Lally <al...@alum.rpi.edu> wrote:
> >> I agree that if we have the contract and can enforce it, we should.
> >> But alternatively, could we relax this parallel step contract when
> >> delta CAS is in use?  The prior implementation of parallel step had no
> >> good way of recognizing when a modification of a pre-existing FS had
> >> taken place.  But with delta CAS, now it does, so it could apply those
> >> changes back to the common CAS.  Of course, if two services modify the
> >> same feature in different ways, then we'd have a conflict.  We could
> >> detect this and throw an exception.  This would allow the use of
> >> updates of different parts of a CAS in a parallel step, and only fail
> >> when common data is modified.  So it allows more flexibility at the
> >> cost of giving the user more ways to get themselves into trouble.
> >> Maybe it's not worth it.
> >>
> >
> >     This is obviously not supported currently. The Delta CAS support
> >    could be modified to detect and disallow updates of same FS by
> >    more that one component but may complicate error handling and
> recovery.
> >
>
> Not necessary for the first release, certainly.


Not sure that this relaxed contract is useful ... the current
"no-modifications" one is simple and clear.


>
> >
> >    Well, not sure I can come up with other scenarios but the current
> Delta
> > CAS implementation
> >     has a limitation with serializing updates to pre-existing Array/List
> > FSs that are only referenced as
> >     non-shared feature values of pre-existing FS and no other feature of
> > the rereferencing FS is modified.
> >     The referencing FS is not marked as modified. To do that would
> require
> > going back to find the referencing FS
> >     in the CAS.  (Any suggestions on how to find the referencing FS
> > efficiently ?)
> >
> >     So currently Delta CAS serialization will have missed the change to
> the
> > Array/List FS.
> >     An annotator that does this type of update will have to be deployed
> > with Delta CAS response disabled.
> >
>
> How about this idea instead:  If you detect that an Array or List has
> been modified, then you will need to
> iterate through everything in the CAS in order to find what FS
> referred to that array or list.  Less efficient than
> the delta, for sure, but at least you're saving on the amount of data
> that you have to returned.  And this seems
> much better than requiring the user to set a special property without
> which the service would be broken.
>
>  -Adam
>
I think Bhavani has a way to associate these arrays with the FS they're in,
so this can be fixed cheaply.

- Burn.

Re: Using Delta CAS serialization in UIMA-AS

Posted by Adam Lally <al...@alum.rpi.edu>.
On Thu, Aug 28, 2008 at 10:36 AM, Bhavani Iyer <bh...@gmail.com> wrote:
> On Wed, Aug 27, 2008 at 6:41 PM, Adam Lally <al...@alum.rpi.edu> wrote:
>> I agree that if we have the contract and can enforce it, we should.
>> But alternatively, could we relax this parallel step contract when
>> delta CAS is in use?  The prior implementation of parallel step had no
>> good way of recognizing when a modification of a pre-existing FS had
>> taken place.  But with delta CAS, now it does, so it could apply those
>> changes back to the common CAS.  Of course, if two services modify the
>> same feature in different ways, then we'd have a conflict.  We could
>> detect this and throw an exception.  This would allow the use of
>> updates of different parts of a CAS in a parallel step, and only fail
>> when common data is modified.  So it allows more flexibility at the
>> cost of giving the user more ways to get themselves into trouble.
>> Maybe it's not worth it.
>>
>
>     This is obviously not supported currently. The Delta CAS support
>    could be modified to detect and disallow updates of same FS by
>    more that one component but may complicate error handling and recovery.
>

Not necessary for the first release, certainly.

>
>    Well, not sure I can come up with other scenarios but the current Delta
> CAS implementation
>     has a limitation with serializing updates to pre-existing Array/List
> FSs that are only referenced as
>     non-shared feature values of pre-existing FS and no other feature of
> the rereferencing FS is modified.
>     The referencing FS is not marked as modified. To do that would require
> going back to find the referencing FS
>     in the CAS.  (Any suggestions on how to find the referencing FS
> efficiently ?)
>
>     So currently Delta CAS serialization will have missed the change to the
> Array/List FS.
>     An annotator that does this type of update will have to be deployed
> with Delta CAS response disabled.
>

How about this idea instead: if you detect that an Array or List has
been modified, then you iterate through everything in the CAS to find
which FS referred to that array or list. Less efficient than the delta,
for sure, but at least you're saving on the amount of data that you have
to return. And this seems much better than requiring the user to set a
special property without which the service would be broken.

  -Adam

Re: Using Delta CAS serialization in UIMA-AS

Posted by Bhavani Iyer <bh...@gmail.com>.
On Wed, Aug 27, 2008 at 6:41 PM, Adam Lally <al...@alum.rpi.edu> wrote:

> On Wed, Aug 27, 2008 at 4:47 PM, Marshall Schor <ms...@schor.com> wrote:
> > Bhavani Iyer wrote:
> >> Now that we have Delta CAS support, can we use it in UIMA-AS ?
> >> I'm planning to make the modifications to use Delta CAS as follows.
> >> Please add your comments/suggestions.
> >>
> >> *Additional Properties in the ProcessCAS message:*
> >>
> > For these new things, is there anything in the proposed OASIS spec which
> > we could be following?
>
> The OASIS spec just says that a service may reply with a delta, it
> doesn't provide for a client requesting whether it wants a delta or
> not.  I don't think it prohibits a service from having an additional
> variation of the ProcessCAS operation that takes additional arguments
> such as Bhavani has suggested.  Those arguments would just need to be
> optional so that a client that was unaware of them could still
> interact with the service.


    Yes, the additional message properties are optional.

>
>
> >>  There will be an additional property, *AcceptsDeltaCas*, in the
> ProcessCAS
> >> request message sent by a UIMA-AS client (including a UIMA-AS aggregate
> that
> >> calls a remote delegate)  to a service specifying that the client
> accepts a
> >> Delta CAS in the reply. The boolean property will default to true.
> >>
> > Not sure what is meant by "default".  Can you elaborate?
> > If an older client (pre delta cas) connects to a newer service - this
> > probably should "work" (unless there are other things preventing it).
>
> I share Marshall's confusion.  From the service's perspective, I think
> the default for AcceptsDeltaCas is false.  If it is not set, the
> service won't reply with a delta, thus supporting older clients.
> Bhavani might have meant a different meaning for "default" - that a
> new client will always send a value of true unless otherwise
> configured (which I think is fine).


    That's exactly what I meant.

> >> *Parallel Step handling*
> >> The parallel step contract requires that the remote delegates called in
> >> parallel only
> >> create new FSs.
> >>
> >> Currently, since the complete CAS is serialized and sent in the reply
> >> message,
> >> pre-existing FSs are ignored during deserialization.
> >>
> >> With Delta CAS, its now possible to enforce the parallel step contract
> by
> >> specifying the option to disallow pre-existing FS when deserializing.
> >> Should we do this ?
> >>
> > I think yes, because it helps produce results for users which are more
> > dependable.
>
> I agree that if we have the contract and can enforce it, we should.
> But alternatively, could we relax this parallel step contract when
> delta CAS is in use?  The prior implementation of parallel step had no
> good way of recognizing when a modification of a pre-existing FS had
> taken place.  But with delta CAS, now it does, so it could apply those
> changes back to the common CAS.  Of course, if two services modify the
> same feature in different ways, then we'd have a conflict.  We could
> detect this and throw an exception.  This would allow the use of
> updates of different parts of a CAS in a parallel step, and only fail
> when common data is modified.  So it allows more flexibility at the
> cost of giving the user more ways to get themselves into trouble.
> Maybe it's not worth it.
>

     This is obviously not supported currently. The Delta CAS support
    could be modified to detect and disallow updates of the same FS by
    more than one component, but that may complicate error handling and
    recovery.


>
> >> Note that there could be mix of older and newer services called
> >> in the parallel step. The older serivces will reply with the complete
> CAS.
> >> In this
> >> case, pre-existing FSs will be ignored as they are being done currently.
> >>  *
> >> Disabling Delta CAS reply*
> >> It may be necessary to require a service to always serialize
> >> and reply with the complete CAS. This may be required for debugging
> >> or to work around Delta CAS limitations. To specify this, a parameter
> >> is required in the deployment descriptor.
> >>
> > Not sure the deployment descriptor is the best place for this.  We don't
> > put other "debugging" info here.  What other choices exist for
> > specifying this? JVM System property? others?
> >
> Bhavani, could you say more about in what scenarios you imagine
> someone would need to disable the delta cas response?  If it's really
> only for debugging, then I agree with Marshall - make it a system
> property if you need it.  If there might actually be a time when delta
> CAS needs to be turned off in a production system, then it belongs in
> the deployment descriptor.
>

    Well, I'm not sure I can come up with other scenarios, but the current
    Delta CAS implementation has a limitation with serializing updates to
    pre-existing Array/List FSs that are only referenced as non-shared
    feature values of a pre-existing FS when no other feature of the
    referencing FS is modified. The referencing FS is not marked as
    modified. To do that would require going back to find the referencing FS
    in the CAS. (Any suggestions on how to find the referencing FS
    efficiently?)

    So currently Delta CAS serialization will miss the change to the
    Array/List FS. An annotator that does this type of update will have to
    be deployed with Delta CAS response disabled.

    I guess this could be a system property.




>
>  -Adam
>

Re: Using Delta CAS serialization in UIMA-AS

Posted by Adam Lally <al...@alum.rpi.edu>.
On Wed, Aug 27, 2008 at 4:47 PM, Marshall Schor <ms...@schor.com> wrote:
> Bhavani Iyer wrote:
>> Now that we have Delta CAS support, can we use it in UIMA-AS ?
>> I'm planning to make the modifications to use Delta CAS as follows.
>> Please add your comments/suggestions.
>>
>> *Additional Properties in the ProcessCAS message:*
>>
> For these new things, is there anything in the proposed OASIS spec which
> we could be following?

The OASIS spec just says that a service may reply with a delta; it
doesn't provide for a client requesting whether it wants a delta or
not.  I don't think it prohibits a service from having an additional
variation of the ProcessCAS operation that takes additional arguments
such as Bhavani has suggested.  Those arguments would just need to be
optional so that a client that was unaware of them could still
interact with the service.

>>  There will be an additional property, *AcceptsDeltaCas*, in the ProcessCAS
>> request message sent by a UIMA-AS client (including a UIMA-AS aggregate that
>> calls a remote delegate)  to a service specifying that the client accepts a
>> Delta CAS in the reply. The boolean property will default to true.
>>
> Not sure what is meant by "default".  Can you elaborate?
> If an older client (pre delta cas) connects to a newer service - this
> probably should "work" (unless there are other things preventing it).

I share Marshall's confusion.  From the service's perspective, I think
the default for AcceptsDeltaCas is false.  If it is not set, the
service won't reply with a delta, thus supporting older clients.
Bhavani might have meant a different meaning for "default" - that a
new client will always send a value of true unless otherwise
configured (which I think is fine).

>> *Parallel Step handling*
>> The parallel step contract requires that the remote delegates called in
>> parallel only
>> create new FSs.
>>
>> Currently, since the complete CAS is serialized and sent in the reply
>> message,
>> pre-existing FSs are ignored during deserialization.
>>
>> With Delta CAS, its now possible to enforce the parallel step contract by
>> specifying the option to disallow pre-existing FS when deserializing.
>> Should we do this ?
>>
> I think yes, because it helps produce results for users which are more
> dependable.

I agree that if we have the contract and can enforce it, we should.
But alternatively, could we relax this parallel step contract when
delta CAS is in use?  The prior implementation of parallel step had no
good way of recognizing when a modification of a pre-existing FS had
taken place.  But with delta CAS, now it does, so it could apply those
changes back to the common CAS.  Of course, if two services modify the
same feature in different ways, then we'd have a conflict.  We could
detect this and throw an exception.  This would allow the use of
updates of different parts of a CAS in a parallel step, and only fail
when common data is modified.  So it allows more flexibility at the
cost of giving the user more ways to get themselves into trouble.
Maybe it's not worth it.
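
A rough Java sketch of the conflict detection described above follows. It
is an illustration under stated assumptions, not a proposal for the actual
merge code: ParallelUpdateMerger is a hypothetical class, FSs are identified
by integer ids, and feature values are non-null strings. Two delegates
writing the same value to the same feature are not treated as a conflict.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical merger that applies delegates' updates to a common CAS
// and fails when two updates to the same feature disagree.
final class ParallelUpdateMerger {

    // Key is "fsId:feature"; value is the update already applied.
    private final Map<String, String> applied = new HashMap<>();

    void apply(String delegate, int fsId, String feature, String newValue) {
        String key = fsId + ":" + feature;
        String previous = applied.putIfAbsent(key, newValue);
        if (previous != null && !previous.equals(newValue)) {
            throw new IllegalStateException("Conflict from " + delegate
                + " on feature " + feature + " of FS " + fsId
                + ": " + previous + " vs " + newValue);
        }
    }

    // Number of distinct (FS, feature) pairs updated so far.
    int updateCount() {
        return applied.size();
    }
}
```

Updates to different FSs, or to different features of the same FS, pass
through; only a disagreement on common data raises the exception, which is
where the error handling and recovery questions would begin.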

>> Note that there could be mix of older and newer services called
>> in the parallel step. The older serivces will reply with the complete CAS.
>> In this
>> case, pre-existing FSs will be ignored as they are being done currently.
>>  *
>> Disabling Delta CAS reply*
>> It may be necessary to require a service to always serialize
>> and reply with the complete CAS. This may be required for debugging
>> or to work around Delta CAS limitations. To specify this, a parameter
>> is required in the deployment descriptor.
>>
> Not sure the deployment descriptor is the best place for this.  We don't
> put other "debugging" info here.  What other choices exist for
> specifying this? JVM System property? others?
>
Bhavani, could you say more about in what scenarios you imagine
someone would need to disable the delta cas response?  If it's really
only for debugging, then I agree with Marshall - make it a system
property if you need it.  If there might actually be a time when delta
CAS needs to be turned off in a production system, then it belongs in
the deployment descriptor.

  -Adam

Re: Using Delta CAS serialization in UIMA-AS

Posted by Bhavani Iyer <bh...@gmail.com>.
On Wed, Aug 27, 2008 at 4:47 PM, Marshall Schor <ms...@schor.com> wrote:

>
>
> Bhavani Iyer wrote:
> > Now that we have Delta CAS support, can we use it in UIMA-AS ?
> > I'm planning to make the modifications to use Delta CAS as follows.
> > Please add your comments/suggestions.
> >
> > *Additional Properties in the ProcessCAS message:*
> >  There will be an additional property, *AcceptsDeltaCas*, in the
> ProcessCAS
> > request message sent by a UIMA-AS client (including a UIMA-AS aggregate
> that
> > calls a remote delegate)  to a service specifying that the client accepts
> a
> > Delta CAS in the reply. The boolean property will default to true.
> >
>
> Not sure what is meant by "default".  Can you elaborate?
> If an older client (pre delta cas) connects to a newer service - this
> probably should "work" (unless there are other things preventing it).


   Newer clients will by default have the property AcceptsDeltaCas set to
   'true' in the request message. This default setting implies that newer
   services will by default reply to newer clients with a Delta CAS.

   If we want to support newer clients requiring a complete CAS in the
   reply, the value of this property would have to be set to false or not
   set at all. If we want to allow this, we would have to provide a
   mechanism to specify it.

   Newer services will check the property AcceptsDeltaCas and, if it is
   present and set to true, will send a reply message containing a Delta CAS
   with the new property SentDeltaCas set to true. A newer service may ignore
   this property and reply with a complete CAS if the service has been
   configured to always reply with a complete CAS (see Disabling Delta CAS
   below). In this case the property SentDeltaCas will be set to false (or
   we can omit it from the reply message).

   Newer clients will check the property SentDeltaCas. A Delta CAS
   deserialization is done if this property is present and set to true in
   the reply message.
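
The handshake described above can be written down as a short Java sketch.
The property names AcceptsDeltaCas and SentDeltaCas come from the proposal;
everything else (the DeltaCasNegotiation class, the use of a plain Map for
message properties, the deltaDisabled flag) is an assumption for
illustration and not the UIMA-AS message API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the request/reply property checks.
final class DeltaCasNegotiation {

    static final String ACCEPTS_DELTA_CAS = "AcceptsDeltaCas";
    static final String SENT_DELTA_CAS = "SentDeltaCas";

    // Service side: send a delta only if the client asked for one and the
    // service is not configured to always reply with a complete CAS.
    // An absent property (older client) is treated as false.
    static boolean shouldSendDelta(Map<String, Object> request,
                                   boolean deltaDisabled) {
        return !deltaDisabled
            && Boolean.TRUE.equals(request.get(ACCEPTS_DELTA_CAS));
    }

    // Service side: properties to attach to the reply message.
    static Map<String, Object> replyProperties(Map<String, Object> request,
                                               boolean deltaDisabled) {
        Map<String, Object> reply = new HashMap<>();
        reply.put(SENT_DELTA_CAS, shouldSendDelta(request, deltaDisabled));
        return reply;
    }

    // Client side: an absent property (older service) means complete CAS.
    static boolean replyIsDelta(Map<String, Object> reply) {
        return Boolean.TRUE.equals(reply.get(SENT_DELTA_CAS));
    }
}
```

Treating a missing property as false on both sides is what lets any mix of
older and newer clients and services interoperate: only a newer client
talking to a newer service ever exchanges a Delta CAS.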


> > *Parallel Step handling*
> > With Delta CAS, its now possible to enforce the parallel step contract by
> > specifying the option to disallow pre-existing FS when deserializing.
> > Should we do this ?
> >
> I think yes, because it helps produce results for users which are more
> dependable.


    OK.


Re: Using Delta CAS serialization in UIMA-AS

Posted by Marshall Schor <ms...@schor.com>.

Bhavani Iyer wrote:
> Now that we have Delta CAS support, can we use it in UIMA-AS ?
> I'm planning to make the modifications to use Delta CAS as follows.
> Please add your comments/suggestions.
>
> *Additional Properties in the ProcessCAS message:*
>   
For these new things, is there anything in the proposed OASIS spec which
we could be following?
>  There will be an additional property, *AcceptsDeltaCas*, in the ProcessCAS
> request message sent by a UIMA-AS client (including a UIMA-AS aggregate that
> calls a remote delegate)  to a service specifying that the client accepts a
> Delta CAS in the reply. The boolean property will default to true.
>   
Not sure what is meant by "default".  Can you elaborate?  
If an older client (pre delta cas) connects to a newer service - this
probably should "work" (unless there are other things preventing it). 
>  A new boolean property in the ProcessCAS reply message, *SentDeltaCas*,  is
> required which indicates that the service sent a Delta CAS in the reply.
>
>  These two properties will enable older UIMA-AS clients and services to work
> with  newer client and service.
>
> *Parallel Step handling*
> The parallel step contract requires that the remote delegates called in
> parallel only
> create new FSs.
>
> Currently, since the complete CAS is serialized and sent in the reply
> message,
> pre-existing FSs are ignored during deserialization.
>
> With Delta CAS, its now possible to enforce the parallel step contract by
> specifying the option to disallow pre-existing FS when deserializing.
> Should we do this ?
>   
I think yes, because it helps produce results for users which are more
dependable.
> Note that there could be mix of older and newer services called
> in the parallel step. The older serivces will reply with the complete CAS.
> In this
> case, pre-existing FSs will be ignored as they are being done currently.
>  *
> Disabling Delta CAS reply*
> It may be necessary to require a service to always serialize
> and reply with the complete CAS. This may be required for debugging
> or to work around Delta CAS limitations. To specify this, a parameter
> is required in the deployment descriptor.
>   
Not sure the deployment descriptor is the best place for this.  We don't
put other "debugging" info here.  What other choices exist for
specifying this? JVM System property? others?

Thanks for making progress here :-)  -Marshall
> Thanks,
> Bhavani
>
>