Posted to dev@uima.apache.org by Thilo Goetz <tw...@gmx.de> on 2007/05/25 11:12:53 UTC

Singleton FSs, again

I would like to revive the discussion that started with
http://www.mail-archive.com/uima-dev@incubator.apache.org/msg01299.html

Let me recap the current status of the proposal.  The purpose is
for UIMA users/annotators to be able to access specific FSs by some
identifier.  This can already be done "manually" with the tools available
in the CAS, but it is cumbersome and not standardized.  There are users
who positively need this capability for 2.2, and I'm wondering how to 
best provide it to them.  The best solution for me of course would be if
I could build it into the core.

Technically, the proposal consists of a new built-in type and new built-in
index as follows.

- a type uima.cas.FsVariable that inherits from uima.cas.TOP with features
  name:String, type:String and value:TOP.

- a built-in set index over FsVariable, sorted by the name feature.

The APIs to define and access these critters would look as follows.

// Declare a new global variable/singleton FS
declareFsVariable(String name, Type type)

// Check if a variable of that name exists
isFsVariable(String name):boolean

// Get the declared type of a variable
getFsVariableType(String name):Type

// Get all variables of a given type
listFsVariables(Type type):List

There are several ways I can implement this.

1. Make the necessary type and index built-in to UIMA.
  a) Define the APIs directly on the CAS.
  b) Define the APIs external to the CAS, having to pass the CAS as
     additional argument.

2. Provide type and index definition XML for users to include in their
   descriptors and an external library that implements the APIs.  This
   could go in the sandbox.

Obviously, I think that 1a is the way to go.  As I have to do something
in the 2.2 time frame, I would appreciate your input within the next few
days.

--Thilo

Re: Singleton FSs, again

Posted by Thilo Goetz <tw...@gmx.de>.
Adam Lally wrote:
>> That approach is too brittle for my taste.  An annotator writer would
>> declare a type that is meant to be a singleton, but there's no way to
>> enforce this.  One careless annotator that creates a second instance of
>> such a type, and the whole analysis chain stops working.  With my
>> approach, at least the bar is a bit higher.
>>
> 
> Well, with FsVariables, one careless annotator can still set the
> variable to a new value and break downstream annotators.  There can
> even be name conflicts with two annotators trying to use the same
> variable name for different things.
> 
> It's OK with me to not implement my suggestion because it encourages
> annotator developers to rely on undeclared assumptions (that only a
> single instance of a type exists) - basically the same criticism I had
> of the FsVariable proposal.  In that case let's leave things the way
> they are.  I don't think this is a pressing problem that needs to be
> addressed.
> 
> -Adam

Right, it is important to me, though.  I'll put it in the sandbox as an
external tool that people can optionally use.  It won't be pretty, as
annotators that want to use the facility need to declare the necessary
type and index, but I guess that's ok.  We'll see if people want this
or not.

--Thilo


Re: Singleton FSs, again

Posted by Adam Lally <al...@alum.rpi.edu>.
> That approach is too brittle for my taste.  An annotator writer would
> declare a type that is meant to be a singleton, but there's no way to
> enforce this.  One careless annotator that creates a second instance of
> such a type, and the whole analysis chain stops working.  With my approach,
> at least the bar is a bit higher.
>

Well, with FsVariables, one careless annotator can still set the
variable to a new value and break downstream annotators.  There can
even be name conflicts with two annotators trying to use the same
variable name for different things.

It's OK with me to not implement my suggestion because it encourages
annotator developers to rely on undeclared assumptions (that only a
single instance of a type exists) - basically the same criticism I had
of the FsVariable proposal.  In that case let's leave things the way
they are.  I don't think this is a pressing problem that needs to be
addressed.

-Adam

Re: Singleton FSs, again

Posted by Thilo Goetz <tw...@gmx.de>.
Adam Lally wrote:
> On 5/25/07, Thilo Goetz <tw...@gmx.de> wrote:
>> I would like to revive the discussion that started with
>> http://www.mail-archive.com/uima-dev@incubator.apache.org/msg01299.html
>>
> 
> I still have the same concern about this that I posted to the previous
> thread:
> 
>    I think Michael is onto the same point that concerns me.  To use this
>    feature, components have to agree on what variable names they are
>    going to use.  So we're creating another kind of dependency that I
>    believe should be documented in the capabilities.  Sure, people could
>    build this themselves already, but if we make it built-in then we're
>    strongly encouraging its use and should consider all the implications.
>    If we had more expressive capability spec (so you could say I
>    create/require an instance of type FsVariable with name="Foo") then
>    that might be a way to go.
> 
> Thilo replied:
>> I guess we could do that in addition. How would you imagine this would
>> work?
> 
>> I'm mainly addressing a practical problem here, and want to give people
>> a viable alternative to modifying the DocumentAnnotation. I think this
>> approach is fairly forward-compatible as well, in the sense that it can
>> later be strengthened with descriptor-based integrity constraints.
> 
> 
> Actually I'm not happy about making the capabilities more complicated
> to handle this.  I'm not sure the benefits of global variables
> outweigh either (a) making the capabilities more complicated or (b)
> adding/encouraging another set of implicit agreements between
> annotators that aren't declared anywhere.
> 
> Let's go back and think about the DocumentAnnotation use case.  Users
> can already declare their own document metadata type and add it to the
> indexes.  Now that we have default bag indexes this is easy to do even
> if their document metadata type does not extend annotation.
> 
> I think this is most of the way towards addressing the issue.  What
> remains are (a) providing convenient access to a single indexed
> object, without going through an iterator, and (b) enforcing that
> there is only ever a singleton instance of a particular type.
> 
> Another suggestion for addressing these issues:
> void CAS.indexSingleton(FeatureStructure aFS) throws CASException
> FeatureStructure CAS.getSingleton(Type aType) throws CASException
> 
> The former is defined to throw an exception if the index over
> aFS.getType() is non-empty (for this view - we can have a separate
> "singleton" for each index repository - I think that is what we want
> for DocumentAnnotation), and otherwise to add aFS to the indexes.
> 
> The latter is defined to throw an exception if there is not exactly
> one instance of aType in the indexes for this view, and otherwise to
> return the one instance.
> 
> 
> 
> I like this better since it doesn't introduce yet another "name space"
> that annotators have to agree on amongst each other.

That approach is too brittle for my taste.  An annotator writer would
declare a type that is meant to be a singleton, but there's no way to
enforce this.  One careless annotator that creates a second instance of
such a type, and the whole analysis chain stops working.  With my approach,
at least the bar is a bit higher.

--Thilo

> 
> -Adam


Re: Singleton FSs, again

Posted by Adam Lally <al...@alum.rpi.edu>.
On 5/25/07, Thilo Goetz <tw...@gmx.de> wrote:
> I would like to revive the discussion that started with
> http://www.mail-archive.com/uima-dev@incubator.apache.org/msg01299.html
>

I still have the same concern about this that I posted to the previous thread:

    I think Michael is onto the same point that concerns me.  To use this
    feature, components have to agree on what variable names they are
    going to use.  So we're creating another kind of dependency that I
    believe should be documented in the capabilities.  Sure, people could
    build this themselves already, but if we make it built-in then we're
    strongly encouraging its use and should consider all the implications.
    If we had more expressive capability spec (so you could say I
    create/require an instance of type FsVariable with name="Foo") then
    that might be a way to go.

Thilo replied:
>I guess we could do that in addition. How would you imagine this would work?

>I'm mainly addressing a practical problem here, and want to give people
>a viable alternative to modifying the DocumentAnnotation. I think this
>approach is fairly forward-compatible as well, in the sense that it can
>later be strengthened with descriptor-based integrity constraints.


Actually I'm not happy about making the capabilities more complicated
to handle this.  I'm not sure the benefits of global variables
outweigh either (a) making the capabilities more complicated or (b)
adding/encouraging another set of implicit agreements between
annotators that aren't declared anywhere.

Let's go back and think about the DocumentAnnotation use case.  Users
can already declare their own document metadata type and add it to the
indexes.  Now that we have default bag indexes this is easy to do even
if their document metadata type does not extend annotation.

I think this is most of the way towards addressing the issue.  What
remains are (a) providing convenient access to a single indexed
object, without going through an iterator, and (b) enforcing that
there is only ever a singleton instance of a particular type.

Another suggestion for addressing these issues:
void CAS.indexSingleton(FeatureStructure aFS) throws CASException
FeatureStructure CAS.getSingleton(Type aType) throws CASException

The former is defined to throw an exception if the index over
aFS.getType() is non-empty (for this view - we can have a separate
"singleton" for each index repository - I think that is what we want
for DocumentAnnotation), and otherwise to add aFS to the indexes.

The latter is defined to throw an exception if there is not exactly
one instance of aType in the indexes for this view, and otherwise to
return the one instance.
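
The two rules above can be sketched in a few lines of Java.  This is only
a stand-in model of a single view, not UIMA code: the class
SingletonViewModel, its nested Fs class, the String type names and the
unchecked exception are all assumptions made for illustration (the
proposal itself uses FeatureStructure, Type and CASException).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of one view's indexes; not part of the UIMA API. */
final class SingletonViewModel {

    /** Stand-in for a feature structure; only its type name matters here. */
    static final class Fs {
        final String typeName;
        Fs(String typeName) { this.typeName = typeName; }
    }

    // Indexed instances per type name, for this one view only.
    private final Map<String, List<Fs>> indexed = new HashMap<>();

    /** Ordinary indexing: no singleton check at all. */
    void addToIndexes(Fs fs) {
        indexed.computeIfAbsent(fs.typeName, k -> new ArrayList<>()).add(fs);
    }

    /** Throws if an instance of the FS's type is already indexed, else indexes it. */
    void indexSingleton(Fs fs) {
        if (!instancesOf(fs.typeName).isEmpty()) {
            throw new IllegalStateException("Already indexed: " + fs.typeName);
        }
        addToIndexes(fs);
    }

    /** Throws unless exactly one instance of the type is indexed, else returns it. */
    Fs getSingleton(String typeName) {
        List<Fs> found = instancesOf(typeName);
        if (found.size() != 1) {
            throw new IllegalStateException(
                "Expected exactly one " + typeName + ", found " + found.size());
        }
        return found.get(0);
    }

    private List<Fs> instancesOf(String typeName) {
        return indexed.getOrDefault(typeName, Collections.<Fs>emptyList());
    }
}
```

In this model a plain addToIndexes call bypasses the check made by
indexSingleton, so a second instance indexed that way makes every later
getSingleton call for that type throw.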



I like this better since it doesn't introduce yet another "name space"
that annotators have to agree on amongst each other.

-Adam

Re: Singleton FSs, again

Posted by Thilo Goetz <tw...@gmx.de>.
Eddie Epstein wrote:
> On 5/25/07, Thilo Goetz <tw...@gmx.de> wrote:
>> Technically, the proposal consists of a new built-in type and new built-in
>> index as follows.
>>
>> - a type uima.cas.FsVariable that inherits from uima.cas.TOP with features
>>   name:String, type:String and value:TOP.
> 
> The feature called "value", as type TOP, can only hold a reference to
> another FS. So, it is not possible to create an FsVariable with a
> double valued feature, or with an Integer feature, etc. As you say, it
> is an "FS variable".
> 
> I'm still a little fuzzy about the scope of what can be done with this
> proposal. Looking back at the previous discussion, Adam said:
> 
>> I think one use case is the "singleton" use case.  You could define a
>> "global variable" called myapp.documentMetadata and set its value to
>> an instance of myTypeSystem.DocumentMetadata.  Then all your
>> annotators could access it by getting the value of this global
>> variable.
> 
> The application has to create a custom type,
> myTypeSystem.DocumentMetadata, which is good because it is documented
> in the descriptors. So the FsVariable is a mechanism to get to the
> single instance of a custom type.
> 
> I admit that the alternative of creating a custom set index for a type
> is a bit much for most users.

Yes, that is *the* use case.  People want to create document metadata.  They
often do this by adding new features to DocumentAnnotation, which leads to
problems, mainly in conjunction with the JCas: everybody creates a different
cover class for the DocumentAnnotation, but only one of those cover classes
is actually loaded.  This problem will not go away until we have a separate
class loader for each annotator.  We're one step closer to at least being
able to have that with Marshall's ongoing work on JCas class loading.

> 
> Jorn said:
>> Imagine an Annotator which is a spam filter, it has to
>> put a tag to the CAS which say spam or no_spam.
>>
>> The document language is also an example for a global variable.
> 
> These examples would not be covered because the FsVariable can only
> point to another FS, not hold an arbitrary value; correct?
> 

True, but Joern didn't really say he was talking about string values.
You could also imagine having a DocumentLanguage FS that holds one or
more string valued features.

>> - a built-in set index over FsVariable, sorted by the name feature.
>>
>> The APIs to define and access these critters would look as follows.
>>
>> // Declare a new global variable/singleton FS
>> declareFsVariable(String name, Type type)
> 
> What happens when a variable is declared?

First we check if an FsVariable object with the given name already exists
in the FsVariable index.  If it does, we throw an exception.  If not, we
create a new FsVariable with the name feature set to the name parameter,
the type feature set to type.getName() and value set to null.  This we
put in the index.

> 
>>
>> // Check if a variable of that name exists
>> isFsVariable(String name):boolean
>>
>> // Get the declared type of a variable
>> getFsVariableType(String name):Type
> 
> This just returns the String value for Type, yes?

Not sure what you mean.  It checks if a variable with that name exists, and
if yes, returns the Type object corresponding to the type feature of the
FsVariable.  If no such variable exists, either return null or throw an
exception (I'd favor the latter in this case).

> 
>>
>> // Get all variables of a given type
>> listFsVariables(Type type):List
> 
> What exactly is the List returned?

A list of Strings, containing the names of all FsVariables declared for
the input type.

And finally, looks like I missed a couple of pretty crucial methods.

// Retrieve a certain FsVariable value.  May return null.
getVariableValue(String name):FeatureStructure

// Set a variable value.
setVariable(String name, FeatureStructure fs):void

Those also would throw an exception if the variable did not exist or, in
the case of setVariable, if the FS was of the wrong type.
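
Taken together, the semantics described in this message can be sketched
as a small Java model.  This is not the UIMA API and not a proposed
implementation: the class FsVariableRegistry, its nested Fs class, the
use of String type names in place of Type objects, the exact-match type
check (a real version would presumably accept subtypes) and the chosen
exception classes are all assumptions for illustration.  A TreeMap keyed
on the name stands in for the set index sorted by the name feature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/** Hypothetical model of the FsVariable facility; not part of the UIMA API. */
final class FsVariableRegistry {

    /** Stand-in for a feature structure; only its type name matters here. */
    static final class Fs {
        final String typeName;
        Fs(String typeName) { this.typeName = typeName; }
    }

    /** Models one FsVariable: a declared type name and a possibly-null value. */
    private static final class Variable {
        final String typeName;
        Fs value; // stays null until setVariable is called
        Variable(String typeName) { this.typeName = typeName; }
    }

    // Unique, name-sorted entries, standing in for the set index over FsVariable.
    private final Map<String, Variable> index = new TreeMap<>();

    /** Throws if the name is taken; otherwise stores a variable with a null value. */
    void declareFsVariable(String name, String typeName) {
        if (index.containsKey(name)) {
            throw new IllegalStateException("Already declared: " + name);
        }
        index.put(name, new Variable(typeName));
    }

    boolean isFsVariable(String name) {
        return index.containsKey(name);
    }

    /** Returns the declared type name; throws if the variable does not exist. */
    String getFsVariableType(String name) {
        return lookup(name).typeName;
    }

    /** Returns the names of all variables declared for the given type name. */
    List<String> listFsVariables(String typeName) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Variable> entry : index.entrySet()) {
            if (entry.getValue().typeName.equals(typeName)) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    /** May return null; throws if the variable does not exist. */
    Fs getVariableValue(String name) {
        return lookup(name).value;
    }

    /** Throws if the variable does not exist or the FS has a different type. */
    void setVariable(String name, Fs fs) {
        Variable variable = lookup(name);
        if (fs != null && !fs.typeName.equals(variable.typeName)) {
            throw new IllegalArgumentException(
                "Expected " + variable.typeName + " but got " + fs.typeName);
        }
        variable.value = fs;
    }

    private Variable lookup(String name) {
        Variable variable = index.get(name);
        if (variable == null) {
            throw new NoSuchElementException("No such variable: " + name);
        }
        return variable;
    }
}
```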

> 
> Thanks,
> Eddie


Re: Singleton FSs, again

Posted by Eddie Epstein <ea...@gmail.com>.
On 5/25/07, Thilo Goetz <tw...@gmx.de> wrote:
> Technically, the proposal consists of a new built-in type and new built-in
> index as follows.
>
> - a type uima.cas.FsVariable that inherits from uima.cas.TOP with features
>   name:String, type:String and value:TOP.

The feature called "value", as type TOP, can only hold a reference to
another FS. So, it is not possible to create an FsVariable with a
double valued feature, or with an Integer feature, etc. As you say, it
is an "FS variable".

I'm still a little fuzzy about the scope of what can be done with this
proposal. Looking back at the previous discussion, Adam said:

>I think one use case is the "singleton" use case.  You could define a
>"global variable" called myapp.documentMetadata and set its value to
>an instance of myTypeSystem.DocumentMetadata.  Then all your
>annotators could access it by getting the value of this global
>variable.

The application has to create a custom type,
myTypeSystem.DocumentMetadata, which is good because it is documented
in the descriptors. So the FsVariable is a mechanism to get to the
single instance of a custom type.

I admit that the alternative of creating a custom set index for a type
is a bit much for most users.

Jorn said:
>Imagine an Annotator which is a spam filter, it has to
>put a tag to the CAS which say spam or no_spam.
>
>The document language is also an example for a global variable.

These examples would not be covered because the FsVariable can only
point to another FS, not hold an arbitrary value; correct?

> - a built-in set index over FsVariable, sorted by the name feature.
>
> The APIs to define and access these critters would look as follows.
>
> // Declare a new global variable/singleton FS
> declareFsVariable(String name, Type type)

What happens when a variable is declared?

>
> // Check if a variable of that name exists
> isFsVariable(String name):boolean
>
> // Get the declared type of a variable
> getFsVariableType(String name):Type

This just returns the String value for Type, yes?

>
> // Get all variables of a given type
> listFsVariables(Type type):List

What exactly is the List returned?

Thanks,
Eddie