Posted to user@uima.apache.org by Mario Juric <ma...@cactusglobal.com> on 2020/11/02 21:16:24 UTC

Odd select coveredBy behaviour

Hi,

I am migrating some code to the new UIMA v3 select API, and I am seeing some odd behaviour. My reference implementation is the good old JCasUtil.selectCovered, which I am trying to replace first, and I thought the following line should do it:

jCas.select(annotationType).coveredBy(annotation)

This works fine as long as annotation is of annotationType, but I am seeing different output when annotation is of a different Annotation subtype. More specifically, I have a unit test (see bottom) where annotationType is the Annotation class and annotation is an instance of a direct subtype of Annotation, which was added to the CAS index prior to the call. In this case, annotations that have the exact same bounds as annotation are not selected; only those that are completely enclosed get selected (begin > annotation.getBegin() and end < annotation.getEnd()). JCasUtil.selectCovered includes the missing annotations.

None of the available select configurations seems to address this, and superficially stepping through the code didn’t help me much, since it’s not trivial to get into the details of the underlying API, so I thought I might get a faster answer here.

Cheers
Mario


@Test
public void verify_selectCovered() throws CASException, ResourceInitializationException {
    JCas jCas = JCasFactory.createJCas();
    Annotation[] fixture = new Annotation[] {
            new Annotation(jCas, 5, 10),
            new Annotation(jCas, 5, 15),
            new Annotation(jCas, 0, 10),
            new Annotation(jCas, 0, 15),
            new Annotation(jCas, 5, 7),
            new Annotation(jCas, 8, 10),
            new Annotation(jCas, 6, 9),
            new Annotation(jCas, 5, 10)
    };
    Stream.of(fixture).forEach(Annotation::addToIndexes);

    assertEquals(4, JCasUtil.selectCovered(jCas, Annotation.class, fixture[0]).size());

    List<Annotation> selection1 = jCas.select(Annotation.class)
            .coveredBy(fixture[0])
            .collect(Collectors.toList());

    assertEquals(4, selection1.size());

    SubType subType = new SubType(jCas, 5, 10);
    subType.addToIndexes();

    assertEquals(5, JCasUtil.selectCovered(jCas, Annotation.class, subType).size());

    List<Annotation> selection2 = jCas.select(Annotation.class)
            .coveredBy(subType)
            .collect(Collectors.toList());

    assertEquals(5, selection2.size()); // Fails!
}


________________________________
Disclaimer:
This email and any files transmitted with it are confidential and directed solely for the use of the intended addressee or addressees and may contain information that is legally privileged, confidential, and exempt from disclosure. If you have received this email in error, please notify the sender by telephone, fax, or return email and immediately delete this email and any files transmitted along with it. Unintended recipients are not authorized to disclose, disseminate, distribute, copy or take any action in reliance on information contained in this email and/or any files attached thereto, in any manner other than to notify the sender; any unauthorized use is subject to legal prosecution.


Re: Odd select coveredBy behaviour

Posted by Mario Juric <ma...@cactusglobal.com>.
Hi,

I tried enabling type priorities, but not surprisingly it made no difference, since I haven’t defined any type priorities. If it’s true that the behaviour of coveredBy is undefined without type priorities, so that it doesn’t include annotations with the same bounds, then it isn’t useful in our case, since we do not intend to specify type priorities unless we find a use case for them.

Cheers
Mario

> On 3 Nov 2020, at 12.14, Richard Eckart de Castilho <re...@apache.org> wrote:
>
> On 3. Nov 2020, at 12:10, Raffaella Ventaglio <ra...@celi.it> wrote:
>>
>> Hi, unfortunately I can't try your example at the moment, but if `JCasUtil.selectCovered` behaves like `AnnotationIndex.subiterator`, then having an arbitrary ordering between your annotation types could impact the definition of "covered by".
>
> uimaFIT's JCasUtil.selectCovered ignores type priorities. It only takes into account the offsets.
>
> Similarly, the UIMAv3 SelectFS by default does ignore type priorities (same behavior as uimaFIT), but there is an option to enable type prios (jcas.select(...).typePriority()...).
>
> -- Richard



Re: Odd select coveredBy behaviour

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 3. Nov 2020, at 12:10, Raffaella Ventaglio <ra...@celi.it> wrote:
> 
> Hi, unfortunately I can't try your example at the moment, but if `JCasUtil.selectCovered` behaves like `AnnotationIndex.subiterator`, then having an arbitrary ordering between your annotation types could impact the definition of "covered by".

uimaFIT's JCasUtil.selectCovered ignores type priorities. It only takes into account the offsets.

Similarly, the UIMAv3 SelectFS by default does ignore type priorities (same behavior as uimaFIT), but there is an option to enable type prios (jcas.select(...).typePriority()...).

-- Richard
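
As a rough illustration of the offset-only rule described above, the sketch below models annotations as plain begin/end spans and selects everything that lies inside the covering span, excluding the covering object itself. The `Span` class and the `coveredBy` and `fixture` helpers are hypothetical stand-ins written for this example; they are not UIMA or uimaFIT code. The offsets mirror the fixture from the unit test at the start of this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of offset-only "covered by" selection (not UIMA code).
final class CoveredByOffsetsDemo {

    // Stand-in for an annotation: a type label plus begin/end offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Keep every span inside [cover.begin, cover.end] except the cover itself.
    // Types and type priorities play no role in this rule.
    static List<Span> coveredBy(List<Span> index, Span cover) {
        List<Span> result = new ArrayList<>();
        for (Span s : index) {
            if (s != cover && s.begin >= cover.begin && s.end <= cover.end) {
                result.add(s);
            }
        }
        return result;
    }

    // Same offsets as the fixture in the unit test from the thread.
    static List<Span> fixture() {
        int[][] offsets = {{5, 10}, {5, 15}, {0, 10}, {0, 15}, {5, 7}, {8, 10}, {6, 9}, {5, 10}};
        List<Span> index = new ArrayList<>();
        for (int[] o : offsets) {
            index.add(new Span("Annotation", o[0], o[1]));
        }
        return index;
    }

    public static void main(String[] args) {
        List<Span> index = fixture();
        System.out.println(coveredBy(index, index.get(0)).size()); // prints 4

        Span sub = new Span("SubType", 5, 10);
        index.add(sub);
        System.out.println(coveredBy(index, sub).size()); // prints 5
    }
}
```

Under this model, the two same-span (5, 10) annotations are both returned when the cover is an object of a different type, which matches the counts the original test expects from JCasUtil.selectCovered.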

Re: Odd select coveredBy behaviour

Posted by Raffaella Ventaglio <ra...@celi.it>.
On 03/11/20 11:19, Mario Juric wrote:
> Thanks Raffaella.
>
> We haven’t had any need for type priorities yet, so we don’t use this feature at all. I am not sure how the problem I am describing, where these annotations are not included in the selection, can be caused by arbitrary ordering.

Hi, unfortunately I can't try your example at the moment, but if
`JCasUtil.selectCovered` behaves like `AnnotationIndex.subiterator`,
then having an arbitrary ordering between your annotation types could
impact the definition of "covered by".

I am referring to this passage in the /subiterator/ JavaDoc:

For annotations x, y, `x < y` here is to be interpreted as "x comes before
y in the index", according to the rules defined in the description
of this class
<https://uima.apache.org/d/uimaj-current/apidocs/org/apache/uima/cas/text/AnnotationIndex.html>.

This definition implies that annotations `b` that have the same span
as `annot` may or may not be returned by the subiterator. This is
determined by the type priorities; the subiterator will only return such
an annotation `b` if the type of `annot` precedes the type of `b` in the
type priorities definition. If you have not specified the priority, or
if `annot` and `b` are of the same type, then the behavior is undefined.
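
To make the quoted passage concrete, here is a minimal sketch of an index-order-based subiterator. It assumes the ordering documented for AnnotationIndex (begin ascending, end descending, then type priority) and returns only the annotations that come after `annot` in that order and lie within its span. The `Ann` class, the `indexOrder` comparator, and the `subiterator` method are hypothetical names for this example and are not the UIMA implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of an index-order-based subiterator (not UIMA code).
final class SubiteratorOrderDemo {

    static final class Ann {
        final String type;
        final int begin;
        final int end;

        Ann(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Begin ascending, end descending, then position in the priority list.
    // Same span and same type compare as equal; in this model the stable sort
    // keeps insertion order, whereas the JavaDoc calls that case undefined.
    static Comparator<Ann> indexOrder(List<String> typePriorities) {
        return (a, b) -> {
            if (a.begin != b.begin) {
                return Integer.compare(a.begin, b.begin);
            }
            if (a.end != b.end) {
                return Integer.compare(b.end, a.end);
            }
            return Integer.compare(typePriorities.indexOf(a.type), typePriorities.indexOf(b.type));
        };
    }

    // Return the annotations that follow annot in index order and fit in its span.
    static List<Ann> subiterator(List<Ann> annotations, Ann annot, List<String> typePriorities) {
        List<Ann> sorted = new ArrayList<>(annotations);
        sorted.sort(indexOrder(typePriorities));
        List<Ann> result = new ArrayList<>();
        for (int i = sorted.indexOf(annot) + 1; i < sorted.size(); i++) {
            Ann b = sorted.get(i);
            if (b.begin >= annot.begin && b.end <= annot.end) {
                result.add(b);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Ann annot = new Ann("SubType", 5, 10);
        List<Ann> anns = Arrays.asList(
                new Ann("Annotation", 5, 10), new Ann("Annotation", 6, 9), annot);

        // SubType precedes Annotation: the same-span Annotation is returned.
        System.out.println(subiterator(anns, annot, Arrays.asList("SubType", "Annotation")).size()); // prints 2
        // Annotation precedes SubType: the same-span Annotation is skipped.
        System.out.println(subiterator(anns, annot, Arrays.asList("Annotation", "SubType")).size()); // prints 1
    }
}
```

In this model, whether the same-span annotation is returned depends only on which type comes first in the priority list, which is the effect the JavaDoc describes.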


Bye,

/Raf/




-- 
*Raffaella Ventaglio*
Senior Software Architect

-- 



*CELI srl*
via San Quintino, 31 - Torino  
<https://www.google.com/maps/place/Via+S.+Quintino,+31,+10121+Torino+TO/@45.0668691,7.6684529,17z/data=%213m1%214b1%214m5%213m4%211s0x47886d13c6b49f81:0x2b74ae2a12fca9de%218m2%213d45.0668653%214d7.6706416>
Torino IT – 10121 
<https://www.google.com/maps/place/Via+S.+Quintino,+31,+10121+Torino+TO/@45.0668691,7.6684529,17z/data=%213m1%214b1%214m5%213m4%211s0x47886d13c6b49f81:0x2b74ae2a12fca9de%218m2%213d45.0668653%214d7.6706416>

*
*
*T  *+39 011 5627115
*W  *www.celi.it <https://www.celi.it/>

Re: Odd select coveredBy behaviour

Posted by Mario Juric <ma...@cactusglobal.com>.
Thanks Raffaella.

We haven’t had any need for type priorities yet, so we don’t use this feature at all. I am not sure how the problem I am describing, where these annotations are not included in the selection, can be caused by arbitrary ordering.

Cheers
Mario





Re: Odd select coveredBy behaviour

Posted by Raffaella Ventaglio <ra...@celi.it>.
Hi Mario,
Have you defined the TypePriority[0] for your /SubType/ Annotation?

As per the /AnnotationIndex/ documentation[1] this property impacts the 
ordering of different annotation types with an equal span coverage:

  * Annotations whose start offsets are equal and whose end offsets are
    equal are sorted based on TypePriorities
    <https://uima.apache.org/d/uimaj-current/apidocs/org/apache/uima/resource/metadata/TypePriorities.html>
    if type priorities are specified. (Type priorities specification is
    an optional element of the component descriptor.) When type
    priorities are in use, if `a.start = b.start`, `a.end = b.end`, and
    the type of `a` is defined before the type of `b` in the type
    priorities, then `a < b`.
  * If none of the above rules apply, then the ordering is arbitrary.
    This will occur if you have two annotations of the exact same type
    that also have the same span. It will also occur if you have not
    defined any type priority between two annotations that have the same
    span.


Hope this helps.

Bye,
/Raf/


[0] 
https://uima.apache.org/d/uimaj-current/apidocs/org/apache/uima/resource/metadata/TypePriorities.html

[1] 
https://uima.apache.org/d/uimaj-current/apidocs/org/apache/uima/cas/text/AnnotationIndex.html




Re: Odd select coveredBy behaviour

Posted by Mario Juric <ma...@cactusglobal.com>.
Hi Richard,

I also just tried to add the same test to uimaFIT JCasUtilTest, replacing SubType with Token as well, but keeping the code that creates the JCas using JCasFactory. In this project it fails as well. So, I guess the problem must somehow be related to the uimaFIT use. Maybe you could confirm it fails for you in this case as well.

Cheers
Mario

On 3 Nov 2020, at 13.52, Mario Juric <ma...@cactusglobal.com> wrote:

Hi Richard,

I was in the middle of reporting the issue in Jira when I received your mail. The test fails in my project, but it passes when I do as you do. I don’t know what the difference is, except that we use uimaFIT in our project. I also tried defining SubType manually outside the XML type description, and in that case it worked in my project as well, e.g. I just added this static class to the test class.


public static class SubType extends Annotation {

  public SubType(JCas jcas, int begin, int end) {
    super(jcas, begin, end);
  }

}

I actually don’t have a clue why I am seeing this difference in behaviour.

Cheers
Mario


On 3 Nov 2020, at 13.03, Richard Eckart de Castilho <re...@apache.org> wrote:


On 3. Nov 2020, at 11:42, Richard Eckart de Castilho <re...@apache.org> wrote:

I checked out the latest from master and installed it, but the unit test still fails in the same way.

roger, I'll check it out.

If you want to do me a favor, please open an issue on Jira and put your test case code there.

I removed the uimaFIT references from your code and dropped it into the
org.apache.uima.cas.impl.SelectFsTest test class in uimaj-core of the
master branch. Instead of the SubType type, I used the "x.y.z.Token" type
which is already available in the uimaj-core test code (inherits from Annotation).

For me, the test runs...

Did I accidentally mutilate your test?

-- Richard


@Test
public void verify_selectCovered() throws Exception {
    JCas jCas = cas.getJCas();
    Annotation[] fixture = new Annotation[] {
            new Annotation(jCas, 5, 10),
            new Annotation(jCas, 5, 15),
            new Annotation(jCas, 0, 10),
            new Annotation(jCas, 0, 15),
            new Annotation(jCas, 5, 7),
            new Annotation(jCas, 8, 10),
            new Annotation(jCas, 6, 9),
            new Annotation(jCas, 5, 10)
    };
    Stream.of(fixture).forEach(Annotation::addToIndexes);

    List<Annotation> selection1 = jCas.select(Annotation.class)
            .coveredBy(fixture[0])
            .collect(Collectors.toList());

    assertEquals(4, selection1.size());

    Token subType = new Token(jCas, 5, 10);
    subType.addToIndexes();

    List<Annotation> selection2 = jCas.select(Annotation.class)
            .coveredBy(subType)
            .collect(Collectors.toList());

    assertEquals(5, selection2.size()); // Fails!
}





Re: Odd select coveredBy behaviour

Posted by Mario Juric <ma...@cactusglobal.com>.
Awesome Richard,

All our tests pass with this PR. So far I have only migrated one utility class, which is used in several places, as a trial balloon for comparing performance and behaviour with JCasUtil. I tried the uimaFIT benchmark you mentioned, and judging from its output, there is generally little or no performance gain with the new select API; as you said, some operations seem to be slower, e.g. selectCovered and selectOverlapping. However, there might be some gains if intermediate collection aggregations are avoided by making better use of streams.

I can see that selectAt seems to be implemented more efficiently in the new API, using binary search, so I tried adding a simple benchmark for it, which appears to confirm this. I also tried changing the timer to use CPU time, which should yield more accurate timing.
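
For readers unfamiliar with the two ideas mentioned here, the following sketch shows a binary-search lookup of all spans at an exact (begin, end) position in a list sorted in index order, timed with per-thread CPU time. It is an illustration only: `SelectAtSketch`, `countAt`, `cpuNanos`, and `sortedFixture` are hypothetical names, the spans are plain `int[]` pairs, and none of this is the actual selectAt implementation or the uimaFIT benchmark code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not UIMA code: spans are int[]{begin, end}.
final class SelectAtSketch {

    // Negative if span sorts before (begin, end) in index order
    // (begin ascending, then end descending); zero if the offsets match.
    static int compare(int[] span, int begin, int end) {
        if (span[0] != begin) {
            return Integer.compare(span[0], begin);
        }
        return Integer.compare(end, span[1]);
    }

    // Binary search for the first span at (begin, end), then count the run of matches.
    static int countAt(List<int[]> sorted, int begin, int end) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(sorted.get(mid), begin, end) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        int count = 0;
        while (lo < sorted.size() && compare(sorted.get(lo), begin, end) == 0) {
            count++;
            lo++;
        }
        return count;
    }

    // CPU time of the current thread in nanoseconds; falls back to the
    // wall clock when the JVM does not support or enable CPU timing.
    static long cpuNanos() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean.isCurrentThreadCpuTimeSupported() && bean.isThreadCpuTimeEnabled()) {
            return bean.getCurrentThreadCpuTime();
        }
        return System.nanoTime();
    }

    // The thread's test offsets, already in index order.
    static List<int[]> sortedFixture() {
        int[][] data = {{0, 15}, {0, 10}, {5, 15}, {5, 10}, {5, 10}, {5, 7}, {6, 9}, {8, 10}};
        List<int[]> spans = new ArrayList<>();
        for (int[] d : data) {
            spans.add(d);
        }
        return spans;
    }

    public static void main(String[] args) {
        List<int[]> spans = sortedFixture();
        long start = cpuNanos();
        int hits = countAt(spans, 5, 10);
        long elapsed = cpuNanos() - start;
        System.out.println("spans at (5, 10): " + hits);
        System.out.println("elapsed ns: " + elapsed);
    }
}
```

A lookup like this needs O(log n) comparisons plus the size of the matching run, instead of a scan over all spans. On a list this small the measured time is mostly noise, so a real benchmark would use far more data and repeated runs.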

Cheers
Mario

> On 3 Nov 2020, at 22.40, Richard Eckart de Castilho <re...@apache.org> wrote:
>
> On 3. Nov 2020, at 15:01, Mario Juric <ma...@cactusglobal.com> wrote:
>>
>> I created the Jira issue with a small Maven project that reproduces the issue. However, it seems to be related to the use of uimaFIT, since removing the uimaFIT references and running the test in uimaj-core didn’t reproduce the problem.
>
> With a bit of refactoring of your test, I could reproduce the issue without any uimaFIT whatsoever.
>
> There is a PR for this now with a bunch of fixes: https://github.com/apache/uima-uimaj/pull/81
>
> -- Richard




Re: Odd select coveredBy behaviour

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 3. Nov 2020, at 15:01, Mario Juric <ma...@cactusglobal.com> wrote:
> 
> I created the Jira issue with a small Maven project that reproduces the issue. However, it seems to be related to the use of uimaFIT, since removing the uimaFIT references and running the test in uimaj-core didn’t reproduce the problem.

With a bit of refactoring of your test, I could reproduce the issue without any uimaFIT whatsoever. 

There is a PR for this now with a bunch of fixes: https://github.com/apache/uima-uimaj/pull/81

-- Richard

Re: Odd select coveredBy behaviour

Posted by Mario Juric <ma...@cactusglobal.com>.
Hi Richard,

I created the Jira issue with a small Maven project that reproduces the issue. However, it seems to be related to the use of uimaFIT, since removing the uimaFIT references and running the test in uimaj-core didn’t reproduce the problem.

Cheers
Mario

> On 3 Nov 2020, at 14.00, Richard Eckart de Castilho <re...@apache.org> wrote:
>
> Maybe you can attach a full Maven project which reproduces the issue on your machine to the Jira.
>
> If I can reproduce it locally, I can look into fixing the issue.
>
> -- Richard
>
>> On 3. Nov 2020, at 13:52, Mario Juric <ma...@cactusglobal.com> wrote:
>>
>> Hi Richard,
>>
>> I was in the middle of reporting the issue in Jira when I received your mail. The test fails in my project, but it passes when I set it up the way you did. I don’t know what the difference is, except that we use uimaFIT in our project. I also tried defining SubType manually instead of through the XML type description, and in that case the test passed in my project as well; I just added this static class to the test class:
>>
>>
>> public static class SubType extends Annotation {
>>
>> public SubType(JCas jcas, int begin, int end) {
>>   super(jcas, begin, end);
>> }
>>
>> }
>>
>> I actually don’t have a clue why I am seeing this difference in behaviour.
>>
>> Cheers
>> Mario
>




Re: Odd select coveredBy behaviour

Posted by Richard Eckart de Castilho <re...@apache.org>.
Maybe you can attach a full Maven project which reproduces the issue on your machine to the Jira.

If I can reproduce it locally, I can look into fixing the issue.

-- Richard

> On 3. Nov 2020, at 13:52, Mario Juric <ma...@cactusglobal.com> wrote:
> 
> Hi Richard,
> 
> I was in the middle of reporting the issue in Jira when I received your mail. The test fails in my project, but it passes when I set it up the way you did. I don’t know what the difference is, except that we use uimaFIT in our project. I also tried defining SubType manually instead of through the XML type description, and in that case the test passed in my project as well; I just added this static class to the test class:
> 
> 
> public static class SubType extends Annotation {
> 
>  public SubType(JCas jcas, int begin, int end) {
>    super(jcas, begin, end);
>  }
> 
> }
> 
> I actually don’t have a clue why I am seeing this difference in behaviour.
> 
> Cheers
> Mario


Re: Odd select coveredBy behaviour

Posted by Mario Juric <ma...@cactusglobal.com>.
Hi Richard,

I was in the middle of reporting the issue in Jira when I received your mail. The test fails in my project, but it passes when I set it up the way you did. I don’t know what the difference is, except that we use uimaFIT in our project. I also tried defining SubType manually instead of through the XML type description, and in that case the test passed in my project as well; I just added this static class to the test class:


public static class SubType extends Annotation {

  public SubType(JCas jcas, int begin, int end) {
    super(jcas, begin, end);
  }

}

I actually don’t have a clue why I am seeing this difference in behaviour.

Cheers
Mario


> On 3 Nov 2020, at 13.03, Richard Eckart de Castilho <re...@apache.org> wrote:
>
> On 3. Nov 2020, at 11:42, Richard Eckart de Castilho <re...@apache.org> wrote:
>>
>>> I checked out the latest from master and installed it, but the unit test still fails in the same way.
>>
>> roger, I'll check it out.
>>
>> If you want to do me a favor, please open an issue on Jira and put your test case code there.
>
> I removed the uimaFIT references from your code and dropped it into the
> org.apache.uima.cas.impl.SelectFsTest test class in uimaj-core of the
> master branch. Instead of the SubType type, I used the "x.y.z.Token" type
> which is already available in the uimaj-core test code (inherits from Annotation).
>
> For me, the test runs...
>
> Did I accidentally mutilate your test?
>
> -- Richard
>
>
>  @Test
>  public void verify_selectCovered() throws Exception {
>    JCas jCas = cas.getJCas();
>     Annotation[] fixture = new Annotation[] {
>             new Annotation(jCas, 5, 10),
>             new Annotation(jCas, 5, 15),
>             new Annotation(jCas, 0, 10),
>             new Annotation(jCas, 0, 15),
>             new Annotation(jCas, 5, 7),
>             new Annotation(jCas, 8, 10),
>             new Annotation(jCas, 6, 9),
>             new Annotation(jCas, 5, 10)
>     };
>     Stream.of(fixture).forEach(Annotation::addToIndexes);
>
>     List<Annotation> selection1 = jCas.select(Annotation.class)
>             .coveredBy(fixture[0])
>             .collect(Collectors.toList());
>
>     assertEquals(4, selection1.size());
>
>     Token subType = new Token(jCas, 5, 10);
>     subType.addToIndexes();
>
>     List<Annotation> selection2 = jCas.select(Annotation.class)
>             .coveredBy(subType)
>             .collect(Collectors.toList());
>
>     assertEquals(5, selection2.size()); // Fails!
>  }




Re: Odd select coveredBy behaviour

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 3. Nov 2020, at 11:42, Richard Eckart de Castilho <re...@apache.org> wrote:
> 
>> I checked out the latest from master and installed it, but the unit test still fails in the same way.
> 
> roger, I'll check it out.
> 
> If you want to do me a favor, please open an issue on Jira and put your test case code there.

I removed the uimaFIT references from your code and dropped it into the
org.apache.uima.cas.impl.SelectFsTest test class in uimaj-core of the
master branch. Instead of the SubType type, I used the "x.y.z.Token" type
which is already available in the uimaj-core test code (inherits from Annotation).

For me, the test runs... 

Did I accidentally mutilate your test?

-- Richard
 

  @Test
  public void verify_selectCovered() throws Exception {
    JCas jCas = cas.getJCas();
     Annotation[] fixture = new Annotation[] {
             new Annotation(jCas, 5, 10),
             new Annotation(jCas, 5, 15),
             new Annotation(jCas, 0, 10),
             new Annotation(jCas, 0, 15),
             new Annotation(jCas, 5, 7),
             new Annotation(jCas, 8, 10),
             new Annotation(jCas, 6, 9),
             new Annotation(jCas, 5, 10)
     };
     Stream.of(fixture).forEach(Annotation::addToIndexes);

     List<Annotation> selection1 = jCas.select(Annotation.class)
             .coveredBy(fixture[0])
             .collect(Collectors.toList());

     assertEquals(4, selection1.size());

     Token subType = new Token(jCas, 5, 10);
     subType.addToIndexes();

     List<Annotation> selection2 = jCas.select(Annotation.class)
             .coveredBy(subType)
             .collect(Collectors.toList());

     assertEquals(5, selection2.size()); // Fails!
  } 
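
The two expected counts in the test above (4 and 5) follow from treating the coveredBy bounds as inclusive and leaving out the covering annotation itself. The sketch below checks that arithmetic in plain Java with no UIMA dependency. The Span class and the coveredBy helper are hypothetical stand-ins, not UIMA API, and they say nothing about how SelectFS is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;

final class CoveredBySketch {

    // Plain stand-in for an annotation with begin/end offsets; NOT a UIMA class.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Inclusive "covered by": spans with exactly the same bounds as the covering
    // span are selected too. The covering span itself is skipped by identity.
    static List<Span> coveredBy(List<Span> index, Span covering) {
        List<Span> result = new ArrayList<>();
        for (Span s : index) {
            if (s != covering && s.begin >= covering.begin && s.end <= covering.end) {
                result.add(s);
            }
        }
        return result;
    }

    // Same offsets as the fixture array in the test.
    static List<Span> fixture() {
        int[][] bounds = { {5, 10}, {5, 15}, {0, 10}, {0, 15}, {5, 7}, {8, 10}, {6, 9}, {5, 10} };
        List<Span> spans = new ArrayList<>();
        for (int[] b : bounds) {
            spans.add(new Span(b[0], b[1]));
        }
        return spans;
    }

    public static void main(String[] args) {
        List<Span> index = fixture();
        System.out.println(coveredBy(index, index.get(0)).size()); // prints 4

        Span extra = new Span(5, 10); // plays the role of the Token/SubType instance
        index.add(extra);
        System.out.println(coveredBy(index, extra).size()); // prints 5
    }
}
```

With the first fixture element as the covering span, the matches are (5, 7), (8, 10), (6, 9) and the second (5, 10). With the additional span as the covering one, both (5, 10) fixture entries match, which gives 5.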

Re: Odd select coveredBy behaviour

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi,

> On 3. Nov 2020, at 11:40, Mario Juric <ma...@cactusglobal.com> wrote:
> 
> I checked out the latest from master and installed it, but the unit test still fails in the same way.

roger, I'll check it out.

If you want to do me a favor, please open an issue on Jira and put your test case code there.

Cheers,

-- Richard

Re: Odd select coveredBy behaviour

Posted by Mario Juric <ma...@cactusglobal.com>.
Hi Richard,

I checked out the latest from master and installed it, but the unit test still fails in the same way.

Cheers
Mario

> On 3 Nov 2020, at 11.07, Mario Juric <ma...@cactusglobal.com> wrote:
>
> Thanks for the info Richard.
>
> This is good to know, so we’ll stick with uimaFIT for now. Given some of the things I read about it, I expected SelectFS to be faster, but to be sure I also intended to run my own performance measurements as part of these initial steps. It’s nice that you already did this, which saves me the time.
>
> Still, I will check out the latest from master to see if it fixes the issue I encountered. I guess we would still want to remove any bugs, if this turns out to be the case.
>
> Cheers
> Mario
>
>
>> On 3 Nov 2020, at 10.30, Richard Eckart de Castilho <re...@apache.org> wrote:
>>
>> Btw...
>>
>>> On 2. Nov 2020, at 22:16, Mario Juric <ma...@cactusglobal.com> wrote:
>>>
>>> jCas.select(annotationType).coveredBy(annotation)
>>
>> I did some local speed measurements and in particular for the "coveredBy" selector,
>> SelectFS is presently a good deal slower than the uimaFIT equivalent. I didn't look
>> into speeding up SelectFS.coveredBy yet, but for the moment, you just might want to
>> stick with the uimaFIT version of selectCovered.
>>
>> There is a "benchmark" module in the uimaFIT source repo which contains a few tentative
>> performance measurements.
>>
>> ==================
>> JCas selectCovered (using uimaFIT JCasUtil.select and selectCovered)
>> ==================
>>
>>   new Benchmark("JCas selectCovered", template)
>>     .measure(() -> select(jcas, Sentence.class).forEach(s -> selectCovered(Token.class, s)))
>>     .run();
>>
>> Running benchmark... 10 100 1000 10000
>>
>> [     10/     20: min:    0 max:    2 median:    0 fail:    0 ]
>> [    100/     20: min:    0 max:    1 median:    0 fail:    0 ]
>> [   1000/     20: min:    4 max:   25 median:    5 fail:    0 ]
>> [  10000/     20: min:  170 max:  696 median:  193 fail:    0 ]
>>
>> The 10, 100, 1000, etc are the numbers of annotations in the CAS (randomly generated Tokens/Sentences).
>> The 20 indicates how often the given lambda was measured.
>>
>>
>> =====================
>> JCas select.coveredBy v3 (using SelectFS select and coveredBy)
>> =====================
>>
>>   new Benchmark("JCas select.coveredBy v3", template)
>>     .measure(() -> {
>>         jcas.select(Sentence.class).forEach(s -> jcas.select(Token.class).coveredBy(s).forEach(t -> {}));
>>     })
>>   .run();
>>
>> Running benchmark... 10 100 1000 10000
>>
>> [     10/     20: min:    0 max:    4 median:    0 fail:    0 ]
>> [    100/     20: min:    1 max:   45 median:    2 fail:    0 ]
>> [   1000/     20: min:   15 max:   87 median:   24 fail:    0 ]
>> [  10000/     20: min:  862 max: 2919 median: 1310 fail:    0 ]
>>
>>
>> These numbers are from my local branches. The code in the master branches might behave slightly differently.
>>
>> Cheers,
>>
>> -- Richard
>




Re: Odd select coveredBy behaviour

Posted by Mario Juric <ma...@cactusglobal.com>.
Thanks for the info Richard.

This is good to know, so we’ll stick with uimaFIT for now. Given some of the things I read about it, I expected SelectFS to be faster, but to be sure I also intended to run my own performance measurements as part of these initial steps. It’s nice that you already did this, which saves me the time.

Still, I will check out the latest from master to see if it fixes the issue I encountered. I guess we would still want to remove any bugs, if this turns out to be the case.

Cheers
Mario


> On 3 Nov 2020, at 10.30, Richard Eckart de Castilho <re...@apache.org> wrote:
>
> Btw...
>
>> On 2. Nov 2020, at 22:16, Mario Juric <ma...@cactusglobal.com> wrote:
>>
>> jCas.select(annotationType).coveredBy(annotation)
>
> I did some local speed measurements and in particular for the "coveredBy" selector,
> SelectFS is presently a good deal slower than the uimaFIT equivalent. I didn't look
> into speeding up SelectFS.coveredBy yet, but for the moment, you just might want to
> stick with the uimaFIT version of selectCovered.
>
> There is a "benchmark" module in the uimaFIT source repo which contains a few tentative
> performance measurements.
>
> ==================
> JCas selectCovered (using uimaFIT JCasUtil.select and selectCovered)
> ==================
>
>    new Benchmark("JCas selectCovered", template)
>      .measure(() -> select(jcas, Sentence.class).forEach(s -> selectCovered(Token.class, s)))
>      .run();
>
> Running benchmark... 10 100 1000 10000
>
> [     10/     20: min:    0 max:    2 median:    0 fail:    0 ]
> [    100/     20: min:    0 max:    1 median:    0 fail:    0 ]
> [   1000/     20: min:    4 max:   25 median:    5 fail:    0 ]
> [  10000/     20: min:  170 max:  696 median:  193 fail:    0 ]
>
> The 10, 100, 1000, etc are the numbers of annotations in the CAS (randomly generated Tokens/Sentences).
> The 20 indicates how often the given lambda was measured.
>
>
> =====================
> JCas select.coveredBy v3 (using SelectFS select and coveredBy)
> =====================
>
>    new Benchmark("JCas select.coveredBy v3", template)
>      .measure(() -> {
>          jcas.select(Sentence.class).forEach(s -> jcas.select(Token.class).coveredBy(s).forEach(t -> {}));
>      })
>    .run();
>
> Running benchmark... 10 100 1000 10000
>
> [     10/     20: min:    0 max:    4 median:    0 fail:    0 ]
> [    100/     20: min:    1 max:   45 median:    2 fail:    0 ]
> [   1000/     20: min:   15 max:   87 median:   24 fail:    0 ]
> [  10000/     20: min:  862 max: 2919 median: 1310 fail:    0 ]
>
>
> These numbers are from my local branches. The code in the master branches might behave slightly differently.
>
> Cheers,
>
> -- Richard




Re: Odd select coveredBy behaviour

Posted by Richard Eckart de Castilho <re...@apache.org>.
Btw...

> On 2. Nov 2020, at 22:16, Mario Juric <ma...@cactusglobal.com> wrote:
> 
> jCas.select(annotationType).coveredBy(annotation)

I did some local speed measurements and in particular for the "coveredBy" selector,
SelectFS is presently a good deal slower than the uimaFIT equivalent. I didn't look 
into speeding up SelectFS.coveredBy yet, but for the moment, you just might want to
stick with the uimaFIT version of selectCovered.

There is a "benchmark" module in the uimaFIT source repo which contains a few tentative
performance measurements.

==================
JCas selectCovered (using uimaFIT JCasUtil.select and selectCovered)
==================

    new Benchmark("JCas selectCovered", template)
      .measure(() -> select(jcas, Sentence.class).forEach(s -> selectCovered(Token.class, s)))
      .run();

Running benchmark... 10 100 1000 10000 

[     10/     20: min:    0 max:    2 median:    0 fail:    0 ]
[    100/     20: min:    0 max:    1 median:    0 fail:    0 ]
[   1000/     20: min:    4 max:   25 median:    5 fail:    0 ]
[  10000/     20: min:  170 max:  696 median:  193 fail:    0 ]

The 10, 100, 1000, etc are the numbers of annotations in the CAS (randomly generated Tokens/Sentences).
The 20 indicates how often the given lambda was measured.


=====================
JCas select.coveredBy v3 (using SelectFS select and coveredBy)
=====================

    new Benchmark("JCas select.coveredBy v3", template)
      .measure(() -> {
          jcas.select(Sentence.class).forEach(s -> jcas.select(Token.class).coveredBy(s).forEach(t -> {}));
      })
    .run();

Running benchmark... 10 100 1000 10000 

[     10/     20: min:    0 max:    4 median:    0 fail:    0 ]
[    100/     20: min:    1 max:   45 median:    2 fail:    0 ]
[   1000/     20: min:   15 max:   87 median:   24 fail:    0 ]
[  10000/     20: min:  862 max: 2919 median: 1310 fail:    0 ]


These numbers are from my local branches. The code in the master branches might behave slightly differently.
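
For anyone reading the tables above without the benchmark module at hand, here is a small self-contained sketch of how min, max and median over a fixed number of repetitions can be computed. MiniBenchmark, measure and summarize are hypothetical names and not the uimaFIT Benchmark class; the millisecond unit is also an assumption, since the output above does not state its unit.

```java
import java.util.Arrays;

final class MiniBenchmark {

    // Runs the task `repetitions` times and returns {min, max, median}
    // of the wall-clock durations in milliseconds (unit chosen for this sketch).
    static long[] measure(int repetitions, Runnable task) {
        long[] millis = new long[repetitions];
        for (int i = 0; i < repetitions; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return summarize(millis);
    }

    // Returns {min, max, median} of the samples without modifying the input array.
    static long[] summarize(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        long median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
        return new long[] { sorted[0], sorted[n - 1], median };
    }

    public static void main(String[] args) {
        long[] stats = measure(20, () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unexpected overflow");
            }
        });
        System.out.printf("min: %d max: %d median: %d%n", stats[0], stats[1], stats[2]);
    }
}
```

The median is reported alongside min and max because single slow runs, such as the max values above, can otherwise dominate an average.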

Cheers,

-- Richard

Re: Odd select coveredBy behaviour

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi Mario,

> On 2. Nov 2020, at 22:16, Mario Juric <ma...@cactusglobal.com> wrote:
> 
> In this case all annotations that have the exact same bounds as annotation are not selected, only those that are completely enclosed get selected (begin > annotation.getBegin() and end < annotation.getEnd()). The JCasUtil includes the missing annotations.

I have been looking into the SelectFS framework in the past weeks and fixed a number of issues. One or two fixes for specific edge cases are also going into uimaFIT.

Maybe you could try running your test against the current master branches of uima-uimaj and uima-uimafit
to check if the issue still persists?

You might also care to check out the dev-list discussion on annotation relationships which accompanied the fixes:

 https://lists.apache.org/thread.html/rff2b9882af077907ff1ad08e90f80a62a20efbfab587a08a5e2bc78c%40%3Cdev.uima.apache.org%3E

Cheers,

-- Richard