Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2008/01/21 22:13:10 UTC

Re: capabilityLanguageFlow - computeResultSpec - Question on ResultSpecification

I'm doing a redesign for the result spec area to improve performance.

The basic idea is to put a hasBeenChanged flag into the result spec 
object; when the flag is "false", users of the result spec can avoid 
recomputing things derived from it.
Why not use "equals"? Because a single result spec object is shared 
among multiple users, and when it is updated, it is updated in place 
(so there is no other object to compare it to).
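A minimal sketch of the hasBeenChanged idea, assuming a single consumer. SharedSpec and CachingConsumer are hypothetical classes, not the UIMA ResultSpecification API. Because the spec is mutated in place, there is no old copy to compare with equals(); instead every real change raises a flag, and the consumer recomputes its derived value only when the flag is set.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a shared, mutable result spec (not the UIMA API).
class SharedSpec {
    private final Set<String> types = new HashSet<>();
    // Starts true so the first consumer always computes.
    private boolean hasBeenChanged = true;

    void addType(String type) {
        // Updated in place: there is no older object to compare against,
        // so the change is recorded in the flag instead.
        if (types.add(type)) {
            hasBeenChanged = true;
        }
    }

    Set<String> types() {
        return types;
    }

    boolean hasBeenChanged() {
        return hasBeenChanged;
    }

    void clearChanged() {
        hasBeenChanged = false;
    }
}

// Hypothetical consumer that caches a value derived from the spec.
class CachingConsumer {
    private final SharedSpec spec;
    private int cachedTypeCount;
    int recomputations = 0;

    CachingConsumer(SharedSpec spec) {
        this.spec = spec;
    }

    int typeCount() {
        if (spec.hasBeenChanged()) {
            // Stand-in for an expensive recomputation.
            cachedTypeCount = spec.types().size();
            recomputations++;
            spec.clearChanged();
        }
        return cachedTypeCount;
    }
}
```

With several consumers, one of them clearing a single shared flag would hide the change from the others, so who resets the flag, and when, is a design decision this sketch does not settle.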
Looking at the ResultSpec object: it has a HashMap that stores the 
Types and Features (TypeOrFeature objects) as the keys; the values are 
HashSets holding the languages for which these types and features are in 
the result spec. (There is a special hash set containing just the 
default language, UNSPECIFIED_LANGUAGE = "x-unspecified".)
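A simplified sketch of the storage just described. It assumes String type/feature names as keys in place of TypeOrFeature objects, purely to keep the example self-contained; LanguageMapSketch and its methods are hypothetical. Note that every entry gets its own freshly allocated set, even when that set holds only "x-unspecified".

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model: type/feature name -> languages it is in the result spec for.
class LanguageMapSketch {
    static final String UNSPECIFIED_LANGUAGE = "x-unspecified";

    private final Map<String, Set<String>> languagesByTypeOrFeature = new HashMap<>();

    void putDefault(String typeOrFeature) {
        // A new set per entry, even though it only ever holds the default language.
        Set<String> langs = new HashSet<>();
        langs.add(UNSPECIFIED_LANGUAGE);
        languagesByTypeOrFeature.put(typeOrFeature, langs);
    }

    Set<String> languagesFor(String typeOrFeature) {
        Set<String> langs = languagesByTypeOrFeature.get(typeOrFeature);
        // Read-only view; an absent entry has no languages.
        return langs == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(langs);
    }
}
```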

I'm going to try to make the default-language hash set a constant and 
create just one instance of it; this should improve performance, 
especially when languages are not being used.
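One way a shared default set could work, sketched under two assumptions: the constant is immutable, and an entry that currently points at it gets its own private copy before a real language is added (copy-on-write, a technique chosen here for illustration). SharedDefaultSketch and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SharedDefaultSketch {
    static final String UNSPECIFIED_LANGUAGE = "x-unspecified";

    // One immutable instance, shared by every entry whose only language is the default.
    static final Set<String> DEFAULT_LANGUAGE_SET = Collections.singleton(UNSPECIFIED_LANGUAGE);

    private final Map<String, Set<String>> languagesByTypeOrFeature = new HashMap<>();

    void putDefault(String typeOrFeature) {
        // No allocation: the entry just points at the shared constant.
        languagesByTypeOrFeature.put(typeOrFeature, DEFAULT_LANGUAGE_SET);
    }

    void addLanguage(String typeOrFeature, String language) {
        Set<String> current = languagesByTypeOrFeature.get(typeOrFeature);
        if (current == null || current == DEFAULT_LANGUAGE_SET) {
            // Copy-on-write: never mutate the shared constant (it is immutable anyway).
            Set<String> copy = new HashSet<>();
            if (current != null) {
                copy.addAll(current);
            }
            copy.add(language);
            languagesByTypeOrFeature.put(typeOrFeature, copy);
        } else {
            // Already a private, mutable set owned by this entry.
            current.add(language);
        }
    }

    boolean sharesDefault(String typeOrFeature) {
        return languagesByTypeOrFeature.get(typeOrFeature) == DEFAULT_LANGUAGE_SET;
    }

    Set<String> languagesFor(String typeOrFeature) {
        Set<String> langs = languagesByTypeOrFeature.get(typeOrFeature);
        return langs == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(langs);
    }
}
```

When no languages are used, every entry shares the one constant and no per-entry sets are allocated; the cost of a private set is paid only by entries that actually gain a language.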

There are 2 kinds of methods to add types/features to a result spec:  
ones with language(s) and ones without. 

    The ones without reset any language spec associated with the type or
    feature(s) to the UNSPECIFIED_LANGUAGE.

    The ones with a language sometimes "replace" the language
    associated with the type/feature, and other times "add" the
    language (assuming the type/feature is already an entry in the
    hashMap of types and features).
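The difference between "replace" and "add" for a type/feature that is already present, modeled on a plain map. ReplaceVsAddSketch and its method names are hypothetical, not the UIMA API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ReplaceVsAddSketch {
    static final String UNSPECIFIED_LANGUAGE = "x-unspecified";
    private final Map<String, Set<String>> languagesByTypeOrFeature = new HashMap<>();

    // "Replace": whatever languages the entry had are discarded.
    void replaceLanguages(String typeOrFeature, String... languages) {
        languagesByTypeOrFeature.put(typeOrFeature, new HashSet<>(Arrays.asList(languages)));
    }

    // "Add": union with the existing languages (creates the entry if absent).
    void addLanguages(String typeOrFeature, String... languages) {
        languagesByTypeOrFeature
                .computeIfAbsent(typeOrFeature, k -> new HashSet<>())
                .addAll(Arrays.asList(languages));
    }

    // Methods without a language argument reset the entry to the default language.
    void resetToUnspecified(String typeOrFeature) {
        replaceLanguages(typeOrFeature, UNSPECIFIED_LANGUAGE);
    }

    Set<String> languagesFor(String typeOrFeature) {
        return languagesByTypeOrFeature.get(typeOrFeature);
    }
}
```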

    Methods which replace any existing languages:

        setResultTypesAndFeatures(array of TypeOrFeature)             << replaces with x-unspecified
        setResultTypesAndFeatures(array of TypeOrFeature, languages)  << replaces with languages
        addResultTypeOrFeature(1-TypeOrFeature)                       << replaces with x-unspecified
        addResultTypeOrFeature(1-TypeOrFeature, languages)            << replaces with languages
        addResultType(String, boolean)                                << replaces with x-unspecified
        addResultFeature(1-feature, languages)                        << replaces with languages

    Methods which add to existing languages:

        addResultType(1-type, boolean, languages)                     << adds languages
        addResultFeature(1-feature)                                   << adds x-unspecified

The "set..." methods essentially clear the result spec and fill it with 
completely new information, so it is reasonable that they replace any 
existing language information.
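A sketch of the "set..." behavior on the same kind of simplified map (SetAllSketch is hypothetical): the whole spec is cleared before the new types/features are stored, so no earlier entry or language can survive.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SetAllSketch {
    static final String UNSPECIFIED_LANGUAGE = "x-unspecified";
    private final Map<String, Set<String>> languagesByTypeOrFeature = new HashMap<>();

    // Variant without languages: every new entry maps to x-unspecified.
    void setAll(String[] typesOrFeatures) {
        setAll(typesOrFeatures, new String[] {UNSPECIFIED_LANGUAGE});
    }

    // Clears the whole spec, then stores the new entries with the given languages.
    void setAll(String[] typesOrFeatures, String[] languages) {
        languagesByTypeOrFeature.clear();
        for (String t : typesOrFeatures) {
            // Each entry gets its own set so later changes to one entry don't leak into others.
            languagesByTypeOrFeature.put(t, new HashSet<>(Arrays.asList(languages)));
        }
    }

    Set<String> languagesFor(String typeOrFeature) {
        return languagesByTypeOrFeature.get(typeOrFeature);
    }
}
```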

The addResult... methods, when used to add a type or feature that is 
already present, are inconsistent: some add to the existing languages 
and others replace them. This behavior is documented in the JavaDocs 
for the class.

The JavaDocs describe the behavior for adding a Feature by name as the 
reverse of the behavior for adding a Type by name: in one case, passing 
languages is treated as a replace, in the other as an add. This seems 
likely to be a bug in the JavaDocs. The code for addResultFeature does 
the reverse of what its JavaDocs say: it "adds" the languages if they 
are specified, but "replaces" the existing languages (with x-unspecified) 
if no languages are specified in the method call.

Does anyone know what the "correct" behavior of these methods is 
supposed to be?

-Marshall



 


Posted by Marshall Schor <ms...@schor.com>.
I'll fix the Javadocs to correspond to what the code does. This will 
have the result that
   addResultFeature(1-feature, languages) will *add* to the existing languages, while
   addResultFeature(1-feature) will *replace* all existing languages with x-unspecified.
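A model of the by-name behavior once the Javadocs match the code, for an entry already in the spec. ByNameModel and its addType/addFeature methods are hypothetical stand-ins for addResultType/addResultFeature (the boolean allAnnotatorFeatures argument is omitted). Going by the method list in the first message, the by-name Type methods already follow this rule, so after the fix both follow one rule: with languages adds, without languages replaces with x-unspecified.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ByNameModel {
    static final String UNSPECIFIED_LANGUAGE = "x-unspecified";
    private final Map<String, Set<String>> languagesByName = new HashMap<>();

    // Types and features by name share the same two rules.
    void addType(String typeName) { resetToUnspecified(typeName); }
    void addType(String typeName, String[] languages) { addLanguages(typeName, languages); }
    void addFeature(String featureName) { resetToUnspecified(featureName); }
    void addFeature(String featureName, String[] languages) { addLanguages(featureName, languages); }

    // No languages given: replace all existing languages with x-unspecified.
    private void resetToUnspecified(String name) {
        Set<String> langs = new HashSet<>();
        langs.add(UNSPECIFIED_LANGUAGE);
        languagesByName.put(name, langs);
    }

    // Languages given: add them to whatever languages the entry already has.
    private void addLanguages(String name, String[] languages) {
        languagesByName
                .computeIfAbsent(name, k -> new HashSet<>())
                .addAll(Arrays.asList(languages));
    }

    Set<String> languagesFor(String name) {
        return languagesByName.get(name);
    }
}
```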

-Marshall

