Posted to user@uima.apache.org by Richard Eckart de Castilho <re...@apache.org> on 2020/11/04 10:08:48 UTC

Annotation access speed comparison (sneak-peek, non-authoritative)

Hi all,

for those who are interested - here are a few results of benchmarking the access to annotations in the CAS using different approaches.

These were done on a MacBook Pro (i7, 2.2 GHz), basically under working conditions (many applications open, etc.).

Used versions:
- uimaj-core:   3.1.2-SNAPSHOT (commit 099a2e0a9) 
- uimafit-core: 3.1.1-SNAPSHOT (commit 72895b5c8)

The benchmarks basically fill a CAS with random annotations (Sentence and Token type, but they do not behave
like sentences/tokens usually would - i.e. they are generated at random positions and so may arbitrarily 
overlap with each other). All annotations start/end within a range of [0-130] and have a random length between
0 and 30. The benchmarks fill the CAS multiple times with increasing numbers of annotations and perform the
selections repeatedly. If you want more details, check out the uimafit-benchmark module and run them yourself ;)
The first timing is the cumulative time spent by the benchmark. The second timing is the longest duration of a
single execution.
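For illustration, the generation scheme described above can be sketched with plain int pairs instead of actual
CAS annotations (the real code lives in the uimafit-benchmark module and may differ in details; the class and
method names here are made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomSpanSketch {
    // Generates n random [begin, end] pairs roughly following the scheme
    // described above: begins in [0, 130], lengths in [0, 30], ends clamped
    // to 130 so both offsets stay within the [0, 130] range. Spans may
    // arbitrarily overlap, just like the benchmark annotations.
    static List<int[]> randomSpans(Random rnd, int n) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int begin = rnd.nextInt(131);                      // begin in [0, 130]
            int end = Math.min(130, begin + rnd.nextInt(31));  // length in [0, 30]
            spans.add(new int[] { begin, end });
        }
        return spans;
    }
}
```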


As for insights: 

* Don't look at the times in terms of absolute values - rather consider how the time of one approach
behaves relative to the time of another approach.

* I find it quite interesting that selects are slower when using JCAS.select(type) than when using
JCAS.getAnnotationIndex(type).select(). I would expect both to run at the same speed.

* Contrary to previous benchmark results, we can see that (J)CAS.select() is typically faster than
its uimaFIT counterpart, with a few interesting exceptions.

* Note that there is no CAS.select().overlapping() equivalent to JCasUtil.selectOverlapping (yet).


If you would like to see additional approaches measured or if you have ideas of how to improve the
informativeness or general setup of the benchmarks, let me know. For small changes, you could also
just open a PR on GitHub against uimaFIT master.

Cheers,

-- Richard


GROUP: select
=========================

Sorted by execution time:
  1136ms /    2ms -- JCAS.select(Token.class).forEach(x -> {})
  1231ms /    3ms -- JCasUtil.select(JCAS, Token.class).forEach(x -> {})
  2679ms /    4ms -- JCAS.select(TOP.class).forEach(x -> {})
  2703ms /    4ms -- JCAS.select().forEach(x -> {})
  3803ms /    6ms -- JCasUtil.select(JCAS, TOP.class).forEach(x -> {})
  3997ms /   16ms -- JCasUtil.selectAll(JCAS).forEach(x -> {})


GROUP: select covered by
=========================

Sorted by execution time:
    84ms /    5ms -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
   134ms /   11ms -- JCasUtil.selectCovered(Token.class, s).forEach(t -> {})
   159ms /   11ms -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
   836ms /   46ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
   842ms /   46ms -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})


GROUP: select covering
=========================

Sorted by execution time:
    98ms /    5ms -- JCAS.getAnnotationIndex(Token.class).select().covering(s).forEach(t -> {})
   109ms /    6ms -- CAS.getAnnotationIndex(getType(cas, TYPE_NAME_TOKEN)).select().covering(s).forEach(t -> {})
   157ms /    7ms -- CasUtil.selectCovering(tokenType, s).forEach(t -> {})
   170ms /   20ms -- JCasUtil.selectCovering(Token.class, s).forEach(t -> {})
   187ms /   14ms -- JCAS.select(Token.class).covering(s).forEach(t -> {})
   812ms /   47ms -- JCAS.select(Token.class).filter(t -> covering(t, s)).forEach(t -> {})
   862ms /   45ms -- CAS.getAnnotationIndex(getType(cas, TYPE_NAME_TOKEN)).stream().filter(t -> covering(t, s)).forEach(t -> {})
  1039ms /   65ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> covering(t, s)).forEach(t -> {})


GROUP: select at
=========================

Sorted by execution time:
    31ms /    2ms -- JCAS.select(Token.class).at(s).forEach(t -> {})
    65ms /    4ms -- JCAS.select(Token.class).at(s.getBegin(), s.getEnd()).forEach(t -> {})
   109ms /   29ms -- JCasUtil.selectAt(CAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
   880ms /   41ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> colocated(t, s)).forEach(t -> {})
   936ms /   47ms -- JCAS.select(Token.class).filter(t -> colocated(t, s)).forEach(t -> {})


GROUP: select overlapping
=========================

Sorted by execution time:
   238ms /   34ms -- JCasUtil.selectOverlapping(JCAS, Token.class, s).forEach(t -> {})
   354ms /   22ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> overlapping(t, s)).forEach(t -> {})
   381ms /   24ms -- CAS.select(Token.class).filter(t -> overlapping(t, s)).forEach(t -> {})


Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 6. Nov 2020, at 13:17, Mario Juric <ma...@cactusglobal.com> wrote:
> 
> I finally managed to clean up the messages after all, although my IDE sent me on a detour at first, which almost prompted me to give up. However, it should follow the guidelines now.

Excellent. I have pushed my additional changes onto your branch. I hope you agree.

A tip for the commit messages: consider leaving an empty line between the issue number/title and the actual change list:

```
[UIMA-6291] Improve uimaFIT benchmarking module 

- Added select and selectAt benchmarks using getAnnotationIndex approach.
```

The empty line separates the "title" of the commit message from the "body".

No need to update the commit messages again though. Just a pointer for future commits.

It looks like the mail address you use for your commits is not associated with your GitHub account. 
To have the commits appear in your GitHub profile, you might care to add the mail address as an alias to your GitHub profile.

> BTW Did you see my selectAt bug report? https://issues.apache.org/jira/browse/UIMA-6294

Now I did ;) Thanks! I'll have a look at it.

-- Richard

Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 11. Nov 2020, at 15:37, Mario Juric <ma...@cactusglobal.com> wrote:
> 
> Would it make sense to provide CAS files based on real documents for the benchmarks? I mean, we could run segmenters on some OpenAccess documents, map the annotations to those used by the tests, and store them somewhere as XMI or CAS binaries. The test could then load them during the initialization phase before benchmarking.

That would be an option. But I think I would rather factor out the initialization of the
CASes in the benchmark such that different initialization strategies can be plugged in.
Then I would implement a random CAS generator which generates CASes according to a few
rules such as:

- partition the CAS into sentences with a length between 20 and 200 with a kind of
  bell distribution of sentence lengths
- partition the sentences into tokens of 2 to 20 chars, also with a bell
  distribution

That would be a bit more realistic for a benchmark like "find the sentence covering
the current token".
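As a rough sketch of such a generator (plain offsets only, no actual CAS; the bell shape here just comes from a
clamped Gaussian, and all parameter choices are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BellPartitionSketch {
    // Draws a length from a rough bell curve centred between min and max,
    // clamped to [min, max]; the standard deviation is chosen so that
    // nearly all draws fall inside the range anyway.
    static int bellLength(Random rnd, int min, int max) {
        double mean = (min + max) / 2.0;
        double sd = (max - min) / 6.0;
        int len = (int) Math.round(mean + rnd.nextGaussian() * sd);
        return Math.max(min, Math.min(max, len));
    }

    // Partitions [0, docLength) into contiguous sentence spans with
    // bell-distributed lengths between 20 and 200 characters.
    static List<int[]> sentences(Random rnd, int docLength) {
        List<int[]> spans = new ArrayList<>();
        int begin = 0;
        while (begin < docLength) {
            int end = Math.min(docLength, begin + bellLength(rnd, 20, 200));
            spans.add(new int[] { begin, end });
            begin = end;
        }
        return spans;
    }
}
```

Tokens could then be generated the same way by partitioning each sentence span with bellLength(rnd, 2, 20).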

But actually, despite being a bit degenerate, I believe the current benchmark setup
could already be used to profile the code and see if/what could possibly be
improved (if anything).

Conceptually, both uimaFIT's selectCovered(Token, X) and SelectFS's select(Token).coveredBy(X) work in the same way:

1) seek to X in the annotation index using a binary search
2) search linearly backwards in the index to the first match by offset (to ignore type priorities)
3) iterate forward until the end of the selection window is reached
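The three steps can be sketched over a plain begin-sorted list of spans (no UIMA types involved; the real
implementations operate on the annotation index and additionally handle type priorities, so this is only an
illustration of the principle):

```java
import java.util.ArrayList;
import java.util.List;

public class CoveredBySketch {
    // Minimal stand-in for a UIMA annotation.
    record Span(int begin, int end) {}

    // Selects all spans fully covered by [begin, end] from a list sorted
    // by begin offset, following the three steps described above.
    static List<Span> coveredBy(List<Span> index, int begin, int end) {
        // 1) binary search: first position whose begin offset is >= begin
        int lo = 0, hi = index.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (index.get(mid).begin() < begin) lo = mid + 1; else hi = mid;
        }
        // 2) linear search backwards to the first match by offset
        //    (stands in for skipping over type priority ordering)
        while (lo > 0 && index.get(lo - 1).begin() == begin) lo--;
        // 3) iterate forward until past the end of the selection window
        List<Span> result = new ArrayList<>();
        for (int i = lo; i < index.size() && index.get(i).begin() <= end; i++) {
            if (index.get(i).end() <= end) result.add(index.get(i));
        }
        return result;
    }
}
```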

The main difference is that selectCovered already collects all annotations it finds into a list in step 3,
while SelectFS allows streaming over the annotations - although people will probably use SelectFS with .asArray()
or .asList(), which again makes it equivalent to the uimaFIT approach. That is why, in the benchmark, I put a .forEach()
at the end of the selection to make it a bit more comparable.
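The practical difference between streaming and materializing can be seen with plain Java streams (no UIMA
involved): when only the first match is needed, a lazy stream stops inspecting elements early, whereas building
the full result list up front inspects everything.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class StreamVsListSketch {
    // Counts how many elements the filter actually inspects before the
    // first match is found - a lazy stream short-circuits at findFirst().
    static int inspectedUntilFirstMatch(List<Integer> data, int target) {
        AtomicInteger inspected = new AtomicInteger();
        data.stream()
            .filter(x -> { inspected.incrementAndGet(); return x == target; })
            .findFirst();
        return inspected.get();
    }

    // The materializing variant filters the whole list first, so every
    // element is inspected even if only the first result is used.
    static int inspectedWhenMaterialized(List<Integer> data, int target) {
        AtomicInteger inspected = new AtomicInteger();
        data.stream()
            .filter(x -> { inspected.incrementAndGet(); return x == target; })
            .toList();
        return inspected.get();
    }
}
```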

-- Richard

Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Mario Juric <ma...@cactusglobal.com>.
Thanks Richard,

Would it make sense to provide CAS files based on real documents for the benchmarks? I mean, we could run segmenters on some OpenAccess documents, map the annotations to those used by the tests, and store them somewhere as XMI or CAS binaries. The test could then load them during the initialization phase before benchmarking.

Cheers
Mario

> On 11 Nov 2020, at 15.26, Richard Eckart de Castilho <re...@apache.org> wrote:
>
> Hi Mario,
>
>> On 11. Nov 2020, at 09:11, Mario Juric <ma...@cactusglobal.com> wrote:
>>
>> I ran the latest benchmarks, and they seem to confirm your initial conclusions, that the JCasUtil.selectCovered method performs better than the corresponding SelectFS coveredBy method. I somehow picked up the idea that the new select API should improve performance, which it does for other select calls, but it is not the case for coveredBy. I was wondering whether you have some ideas as to why the new API's performance isn't closer to the JCasUtil.selectCovered method?
>
> First, the benchmarks are not really very representative at the moment. They create annotation structures that are highly overlapping and very dense - something that you wouldn't normally see in real life. So some operations show up slower in the benchmarks than they would feel in real life. The benchmarks should probably be adjusted to test against different types of structures.
>
> The select methods of uimaFIT are very lightweight and much less flexible/configurable than SelectFS. If you look at cases where you have small CASes and a very high frequency of method calls, the uimaFIT methods are likely to be better because they have a lower setup cost.
>
> For larger CASes and lower call frequencies, the SelectFS can be better. In particular if you can iterate over the results (and maybe stop before having iterated over all results) instead of retrieving the results as a list. SelectFS tries hard to not calculate the full result list while uimaFIT will usually calculate and return the full result list.
>
> There may be more to it, but that's my insight/intuition for the moment.
>
> Cheers,
>
> -- Richard

________________________________
Disclaimer:
This email and any files transmitted with it are confidential and directed solely for the use of the intended addressee or addressees and may contain information that is legally privileged, confidential, and exempt from disclosure. If you have received this email in error, please notify the sender by telephone, fax, or return email and immediately delete this email and any files transmitted along with it. Unintended recipients are not authorized to disclose, disseminate, distribute, copy or take any action in reliance on information contained in this email and/or any files attached thereto, in any manner other than to notify the sender; any unauthorized use is subject to legal prosecution.


Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi Mario,

> On 11. Nov 2020, at 09:11, Mario Juric <ma...@cactusglobal.com> wrote:
> 
> I ran the latest benchmarks, and they seem to confirm your initial conclusions, that the JCasUtil.selectCovered method performs better than the corresponding SelectFS coveredBy method. I somehow picked up the idea that the new select API should improve performance, which it does for other select calls, but it is not the case for coveredBy. I was wondering whether you have some ideas as to why the new API's performance isn't closer to the JCasUtil.selectCovered method?

First, the benchmarks are not really very representative at the moment. They create annotation structures that are highly overlapping and very dense - something that you wouldn't normally see in real life. So some operations show up slower in the benchmarks than they would feel in real life. The benchmarks should probably be adjusted to test against different types of structures.

The select methods of uimaFIT are very lightweight and much less flexible/configurable than SelectFS. If you look at cases where you have small CASes and a very high frequency of method calls, the uimaFIT methods are likely to be better because they have a lower setup cost.

For larger CASes and lower call frequencies, the SelectFS can be better. In particular if you can iterate over the results (and maybe stop before having iterated over all results) instead of retrieving the results as a list. SelectFS tries hard to not calculate the full result list while uimaFIT will usually calculate and return the full result list.

There may be more to it, but that's my insight/intuition for the moment.

Cheers,

-- Richard

Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Mario Juric <ma...@cactusglobal.com>.
Hi Richard,

I ran the latest benchmarks, and they seem to confirm your initial conclusions, that the JCasUtil.selectCovered method performs better than the corresponding SelectFS coveredBy method. I somehow picked up the idea that the new select API should improve performance, which it does for other select calls, but it is not the case for coveredBy. I was wondering whether you have some ideas as to why the new API's performance isn't closer to the JCasUtil.selectCovered method?

Cheers
Mario




Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Mario Juric <ma...@cactusglobal.com>.
Hi Richard,

>
> It would be great if we could just do a PR with your changes, but the
> commit messages are supposed to have a specific format including a
> Jira issue number. So at least the commit messages would need to be
> rewritten or the PR would need to be squash-merged. The format of the
> commit messages should look as follows:
>
> ```
> [UIMA-6291] Improve uimaFIT benchmarking module
>
> - Remove unused dependency
> - ... other changes
> - ... other changes

I finally managed to clean up the messages after all, although my IDE sent me on a detour at first, which almost prompted me to give up. However, it should follow the guidelines now.


>
> That might be because the CASes are only generated to be at most 130 offset positions long, leading to a quite high density of annotations,
> and thus t-coveredBy-s may not be able to profit too much from its ability to seek fast and to abort early. Probably the benchmark
> should be adjusted to expand the offset space as more annotations get added, for a lower annotation density.
>
> ... or there is some measurement/calculation bug …


It is indeed strange that there isn't a larger difference. I am running it from my branch, which is based on an older version of the benchmarks, but with user time, and I see a factor of 11 difference, which is more like what I would expect.

BTW Did you see my selectAt bug report? https://issues.apache.org/jira/browse/UIMA-6294

Cheers
Mario




Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Mario Juric <ma...@cactusglobal.com>.
Thanks Richard,

I’ll fix the commit messages and do a regular PR later today, which will also include some extra variants that I added. You can then let me know what additional changes I need to make.

Cheers
Mario


> On 6 Nov 2020, at 01.36, Richard Eckart de Castilho <re...@apache.org> wrote:
>
> Hi,
>
> thanks for the changes. Measuring at a higher resolution makes sense and
> measuring user time instead of system time also makes sense.
>
> I tweaked the benchmark a bit more, added the option to have a max duration for each
> batch and a few other changes.
>
> It would be great if we could just do a PR with your changes, but the
> commit messages are supposed to have a specific format including a
> Jira issue number. So at least the commit messages would need to be
> rewritten or the PR would need to be squash-merged. The format of the
> commit messages should look as follows:
>
> ```
> [UIMA-6291] Improve uimaFIT benchmarking module
>
> - Remove unused dependency
> - ... other changes
> - ... other changes
> ```
>
>> On 4. Nov 2020, at 17:34, Mario Juric <ma...@cactusglobal.com> wrote:
>>
>> Sorted by execution time:
>> 19308000ns / 7667000ns -- WARM-UP
>> 97388000ns / 3862000ns -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
>> 114012000ns / 7585000ns -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
>> 139689000ns / 11349000ns -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
>> 158952000ns / 30894000ns -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
>> 189711000ns / 12981000ns -- selectCovered(Token.class, s).forEach(t -> {})
>> 233259000ns / 12051000ns -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
>> 737271000ns / 30204000ns -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
>> 846053000ns / 45272000ns -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})
>
> So I didn't commit my changes anywhere yet either because I made them on top of your branch for the moment ;)
>
> But here are a few runs with the setup I did. I tried to make it so that the order of the methods is more stable,
> but as you can see they still switch around a bit.
>
> == Run 1
> AVG: 0.432ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> {})
> AVG: 0.467ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
> AVG: 0.474ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
> AVG: 0.475ms (10 seconds total) -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.487ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.492ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.661ms (13 seconds total) -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
> AVG: 0.701ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})
>
> == Run 2
> AVG: 0.429ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> {})
> AVG: 0.463ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
> AVG: 0.465ms (10 seconds total) -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.475ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.477ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
> AVG: 0.479ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.654ms (13 seconds total) -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
> AVG: 0.700ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})
>
> == Run 3
> AVG: 0.431ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> {})
> AVG: 0.463ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
> AVG: 0.464ms (10 seconds total) -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.471ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
> AVG: 0.488ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.492ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.653ms (13 seconds total) -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
> AVG: 0.698ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})
>
> The choice for a specific approach really needs to look at the scalability. E.g.:
>
> ======================================================
> JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
> ======================================================
> Running benchmark... 10 100 1000 10000
> [     10/  10000 | min:      0.004 | max:      4.441 | median:      0.011 | cumulative:    211.239 | fail:    0]
> [    100/  10000 | min:      0.052 | max:      0.456 | median:      0.057 | cumulative:    681.855 | fail:    0]
> [   1000/   2493 | min:      1.723 | max:      5.437 | median:      1.891 | cumulative:   4927.217 | fail:    0 | time limit exceeded]
> [  10000/     37 | min:    127.892 | max:    146.936 | median:    135.230 | cumulative:   5024.777 | fail:    0 | time limit exceeded]
>
> vs.
>
> ======================================================================
> JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})
> ======================================================================
> Running benchmark... 10 100 1000 10000
> [     10/  10000 | min:      0.005 | max:      2.201 | median:      0.007 | cumulative:     90.482 | fail:    0]
> [    100/  10000 | min:      0.275 | max:      0.774 | median:      0.277 | cumulative:   2858.870 | fail:    0]
> [   1000/    190 | min:     25.235 | max:     28.488 | median:     26.280 | cumulative:   4996.646 | fail:    0 | time limit exceeded]
> [  10000/      2 | min:   2698.091 | max:   2705.892 | median:   2701.992 | cumulative:   5403.983 | fail:    0 | time limit exceeded]
>
> The micro-average which is currently in the first column of the summary report doesn't really reflect how the second approach here scales in comparison to the first.
>
> Btw. the time limit was configured to be ~5 secs for each batch (each row in the summary).
>
> Considering the results for t-coveredBy-s, I am slightly surprised that t-covering-s is, according to the results, not that much slower:
>
> =====================================================
> JCAS.select(Token.class).covering(s).forEach(t -> {})
> =====================================================
> Running benchmark... 10 100 1000 10000
> [     10/  10000 | min:      0.004 | max:      3.467 | median:      0.009 | cumulative:    167.222 | fail:    0]
> [    100/  10000 | min:      0.059 | max:      0.800 | median:      0.060 | cumulative:    638.144 | fail:    0]
> [   1000/   2254 | min:      1.995 | max:     18.008 | median:      2.194 | cumulative:   4978.578 | fail:    0 | time limit exceeded]
> [  10000/     33 | min:    150.032 | max:    158.695 | median:    155.340 | cumulative:   5121.281 | fail:    0 | time limit exceeded]
>
> That might be because the CASes are only generated to be at most 130 offset positions long, leading to a quite high density of annotations,
> and thus t-coveredBy-s may not be able to profit too much from its ability to seek fast and to abort early. Probably the benchmark
> should be adjusted to expand the offset space as more annotations get added, for a lower annotation density.
>
> ... or there is some measurement/calculation bug ...
>
> Cheers,
>
> -- Richard




Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi,

thanks for the changes. Measuring at a higher resolution makes sense and
measuring user time instead of system time also makes sense.

I tweaked the benchmark a bit more, added the option to have a max duration for each
batch and a few other changes.

It would be great if we could just do a PR with your changes, but the 
commit messages are supposed to have a specific format including a
Jira issue number. So at least the commit messages would need to be
rewritten or the PR would need to be squash-merged. The format of the
commit messages should look as follows:

```
[UIMA-6291] Improve uimaFIT benchmarking module

- Remove unused dependency
- ... other changes
- ... other changes
```

> On 4. Nov 2020, at 17:34, Mario Juric <ma...@cactusglobal.com> wrote:
> 
> Sorted by execution time:
> 19308000ns / 7667000ns -- WARM-UP
> 97388000ns / 3862000ns -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> 114012000ns / 7585000ns -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
> 139689000ns / 11349000ns -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> 158952000ns / 30894000ns -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
> 189711000ns / 12981000ns -- selectCovered(Token.class, s).forEach(t -> {})
> 233259000ns / 12051000ns -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
> 737271000ns / 30204000ns -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
> 846053000ns / 45272000ns -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})

So I didn't commit my changes anywhere yet either because I made them on top of your branch for the moment ;)

But here are a few runs with the setup I did. I tried to make it so that the order of the methods is more stable,
but as you can see they still switch around a bit.

== Run 1
AVG: 0.432ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> {})
AVG: 0.467ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
AVG: 0.474ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
AVG: 0.475ms (10 seconds total) -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.487ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.492ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.661ms (13 seconds total) -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
AVG: 0.701ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})

== Run 2
AVG: 0.429ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> {})
AVG: 0.463ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
AVG: 0.465ms (10 seconds total) -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.475ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.477ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
AVG: 0.479ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.654ms (13 seconds total) -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
AVG: 0.700ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})

== Run 3
AVG: 0.431ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> {})
AVG: 0.463ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
AVG: 0.464ms (10 seconds total) -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.471ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
AVG: 0.488ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.492ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.653ms (13 seconds total) -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
AVG: 0.698ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})

The choice for a specific approach really needs to look at the scalability. E.g.:

======================================================
JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
======================================================
Running benchmark... 10 100 1000 10000 
[     10/  10000 | min:      0.004 | max:      4.441 | median:      0.011 | cumulative:    211.239 | fail:    0]
[    100/  10000 | min:      0.052 | max:      0.456 | median:      0.057 | cumulative:    681.855 | fail:    0]
[   1000/   2493 | min:      1.723 | max:      5.437 | median:      1.891 | cumulative:   4927.217 | fail:    0 | time limit exceeded]
[  10000/     37 | min:    127.892 | max:    146.936 | median:    135.230 | cumulative:   5024.777 | fail:    0 | time limit exceeded]

vs.

======================================================================
JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})
======================================================================
Running benchmark... 10 100 1000 10000 
[     10/  10000 | min:      0.005 | max:      2.201 | median:      0.007 | cumulative:     90.482 | fail:    0]
[    100/  10000 | min:      0.275 | max:      0.774 | median:      0.277 | cumulative:   2858.870 | fail:    0]
[   1000/    190 | min:     25.235 | max:     28.488 | median:     26.280 | cumulative:   4996.646 | fail:    0 | time limit exceeded]
[  10000/      2 | min:   2698.091 | max:   2705.892 | median:   2701.992 | cumulative:   5403.983 | fail:    0 | time limit exceeded]

The micro-average, which is currently in the first column of the summary report, doesn't really reflect how the second approach scales in comparison to the first.
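To make the scaling difference concrete: the filter-based variants apply a coveredBy(t, s) predicate to every annotation, while the index-backed coveredBy(s) can exploit the begin-sorted index order to seek to the start of the covering annotation and abort early. Here is a minimal stdlib sketch of that difference (hypothetical Span type and plain int offsets — not the actual UIMA/uimaFIT code):

```java
import java.util.ArrayList;
import java.util.List;

public class CoveredBySketch {
    // Hypothetical stand-in for an annotation: just begin/end offsets.
    public record Span(int begin, int end) {}

    // The predicate the filter-based variants use: t lies within s.
    public static boolean coveredBy(Span t, Span s) {
        return t.begin() >= s.begin() && t.end() <= s.end();
    }

    // Linear scan: always touches every element, like select(...).filter(...).
    public static List<Span> byFilter(List<Span> all, Span s) {
        return all.stream().filter(t -> coveredBy(t, s)).toList();
    }

    // Scan over a begin-sorted list: skip ahead to s.begin and stop once the
    // begins pass s.end - roughly what the index-backed coveredBy(s) can do.
    public static List<Span> bySeek(List<Span> sortedByBegin, Span s) {
        List<Span> out = new ArrayList<>();
        for (Span t : sortedByBegin) {
            if (t.begin() < s.begin()) continue; // seek (binary search in a real index)
            if (t.begin() > s.end()) break;      // early abort
            if (t.end() <= s.end()) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> tokens = List.of(new Span(0, 3), new Span(5, 8),
                new Span(7, 12), new Span(20, 25));
        Span sentence = new Span(4, 15);
        // Both approaches select the same spans; only the work done differs.
        System.out.println(byFilter(tokens, sentence).equals(bySeek(tokens, sentence)));
    }
}
```

The real annotation index additionally handles type priorities and ordering among annotations with equal begins, which this sketch ignores.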

Btw. the time limit was configured to be ~5 secs for each batch (each row in the summary).

Considering the results for t-coveredBy-s, I am slightly surprised that, according to these results, t-covering-s is not that much slower:

=====================================================
JCAS.select(Token.class).covering(s).forEach(t -> {})
=====================================================
Running benchmark... 10 100 1000 10000 
[     10/  10000 | min:      0.004 | max:      3.467 | median:      0.009 | cumulative:    167.222 | fail:    0]
[    100/  10000 | min:      0.059 | max:      0.800 | median:      0.060 | cumulative:    638.144 | fail:    0]
[   1000/   2254 | min:      1.995 | max:     18.008 | median:      2.194 | cumulative:   4978.578 | fail:    0 | time limit exceeded]
[  10000/     33 | min:    150.032 | max:    158.695 | median:    155.340 | cumulative:   5121.281 | fail:    0 | time limit exceeded]

That might be because the CASes are only generated to be at most 130 offset positions long, leading to quite a high density of annotations;
thus t-coveredBy-s may not be able to profit much from its ability to seek fast and abort early. The benchmark should probably
be adjusted to expand the offset space as more annotations are added, yielding a lower annotation density.

... or there is some measurement/calculation bug ...
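A density-controlled generator along the lines suggested above could look like this (a hypothetical sketch with plain int offsets and an assumed TARGET_DENSITY constant — not the actual uimafit-benchmark code):

```java
import java.util.Random;

public class DensityScaledGenerator {
    static final int MAX_LENGTH = 30;          // matches the benchmark's max annotation length
    static final double TARGET_DENSITY = 10.0; // annotations per offset position (assumed value)

    // Instead of fixing the offset space at [0, 130], grow it with the
    // annotation count so the average annotation density stays constant.
    public static int[][] generate(int annotationCount, long seed) {
        Random rnd = new Random(seed);
        // Offset space scales linearly with the count at a fixed density.
        int offsetSpace = Math.max(130, (int) (annotationCount / TARGET_DENSITY));
        int[][] spans = new int[annotationCount][2];
        for (int i = 0; i < annotationCount; i++) {
            int begin = rnd.nextInt(offsetSpace);
            int end = begin + rnd.nextInt(MAX_LENGTH + 1); // random length in [0, 30]
            spans[i] = new int[] { begin, end };
        }
        return spans;
    }
}
```

With such a generator, the expected number of tokens covered by a given sentence would stay roughly constant across batch sizes, so the seek-and-abort advantage of the index-backed selects should become visible.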

Cheers,

-- Richard

Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Mario Juric <ma...@cactusglobal.com>.
I also added selectCovered benchmarks using s.getBegin() and s.getEnd(). Oddly, they seem to be even faster.

GROUP: select covered by
WARM-UP: Running benchmark... 10 100 1000
selectCovered(Token.class, s).forEach(t -> {}): Running benchmark... 10 100 1000
JCAS.select(Token.class).coveredBy(s).forEach(t -> {}): Running benchmark... 10 100 1000
JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {}): Running benchmark... 10 100 1000
JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {}): Running benchmark... 10 100 1000
JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {}): Running benchmark... 10 100 1000
JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {}): Running benchmark... 10 100 1000
JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {}): Running benchmark... 10 100 1000
selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {}): Running benchmark... 10 100 1000

Sorted by execution time:
19308000ns / 7667000ns -- WARM-UP
97388000ns / 3862000ns -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
114012000ns / 7585000ns -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
139689000ns / 11349000ns -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
158952000ns / 30894000ns -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
189711000ns / 12981000ns -- selectCovered(Token.class, s).forEach(t -> {})
233259000ns / 12051000ns -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
737271000ns / 30204000ns -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
846053000ns / 45272000ns -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})

Cheers
Mario




________________________________
Disclaimer:
This email and any files transmitted with it are confidential and directed solely for the use of the intended addressee or addressees and may contain information that is legally privileged, confidential, and exempt from disclosure. If you have received this email in error, please notify the sender by telephone, fax, or return email and immediately delete this email and any files transmitted along with it. Unintended recipients are not authorized to disclose, disseminate, distribute, copy or take any action in reliance on information contained in this email and/or any files attached thereto, in any manner other than to notify the sender; any unauthorized use is subject to legal prosecution.


Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Mario Juric <ma...@cactusglobal.com>.
Hi Richard,

This is really useful, and it makes me much more positive about the select API performance than the previous benchmarks suggested, but you need to know which calling approach works best, e.g. getting the annotation index first and calling select() on it performs better.

I added support for nanosecond CPU time to Benchmark and changed the SelectBenchmark to use the user timer, which I think is much more accurate regarding the actual time spent in the select operations. However, in this case I don’t think it really changes the overall conclusions compared to the normal system clock version, although I haven’t compared the results line by line. You can find the branch here:

https://github.com/mjunsilo/uima-uimafit/tree/mjuric/benchmark-cpu-time
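For reference, per-thread user CPU time in nanoseconds is available via the standard ThreadMXBean API; a minimal sketch of such a user timer could look like this (an illustration of the approach, not the code from the branch above):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class UserTimeTimer {
    // Measures the user CPU time of the current thread in nanoseconds.
    // Unlike wall-clock time, this excludes time the thread spends
    // descheduled or blocked, which can make micro-benchmarks less noisy.
    public static long measureUserTimeNanos(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("CPU time not supported on this JVM");
        }
        long before = bean.getCurrentThreadUserTime();
        task.run();
        return bean.getCurrentThreadUserTime() - before;
    }

    public static void main(String[] args) {
        long nanos = measureUserTimeNanos(() -> {
            long acc = 0;
            for (int i = 0; i < 1_000_000; i++) acc += i;
            if (acc < 0) System.out.println(acc); // keep the loop from being optimized away
        });
        System.out.println("user time: " + nanos + "ns");
    }
}
```

Note that the resolution of getCurrentThreadUserTime() is platform-dependent (often coarser than a nanosecond), so very short tasks may report 0.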

Do I need to create a Jira issue for this? Let me know first if there is any interest.

Cheers
Mario


On 4 Nov 2020, at 11.08, Richard Eckart de Castilho <re...@apache.org>> wrote:





GROUP: select
=========================

Sorted by execution time:
 1136ms /    2ms -- JCAS.select(Token.class).forEach(x -> {})
 1231ms /    3ms -- JCasUtil.select(JCAS, Token.class).forEach(x -> {})
 2679ms /    4ms -- JCAS.select(TOP.class).forEach(x -> {})
 2703ms /    4ms -- JCAS.select().forEach(x -> {})
 3803ms /    6ms -- JCasUtil.select(JCAS, TOP.class).forEach(x -> {})
 3997ms /   16ms -- JCasUtil.selectAll(JCAS).forEach(x -> {})


GROUP: select covered by
=========================

Sorted by execution time:
   84ms /    5ms -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
  134ms /   11ms -- JCasUtil.selectCovered(Token.class, s).forEach(t -> {})
  159ms /   11ms -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
  836ms /   46ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
  842ms /   46ms -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})


GROUP: select covering
=========================

Sorted by execution time:
   98ms /    5ms -- JCAS.getAnnotationIndex(Token.class).select().covering(s).forEach(t -> {})
  109ms /    6ms -- CAS.getAnnotationIndex(getType(cas, TYPE_NAME_TOKEN)).select().covering(s).forEach(t -> {})
  157ms /    7ms -- CasUtil.selectCovering(tokenType, s).forEach(t -> {})
  170ms /   20ms -- JCasUtil.selectCovering(Token.class, s).forEach(t -> {})
  187ms /   14ms -- JCAS.select(Token.class).covering(s).forEach(t -> {})
  812ms /   47ms -- JCAS.select(Token.class).filter(t -> covering(t, s)).forEach(t -> {})
  862ms /   45ms -- CAS.getAnnotationIndex(getType(cas, TYPE_NAME_TOKEN)).stream().filter(t -> covering(t, s)).forEach(t -> {})
 1039ms /   65ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> covering(t, s)).forEach(t -> {})


GROUP: select at
=========================

Sorted by execution time:
   31ms /    2ms -- JCAS.select(Token.class).at(s).forEach(t -> {})
   65ms /    4ms -- JCAS.select(Token.class).at(s.getBegin(), s.getEnd()).forEach(t -> {})
  109ms /   29ms -- JCasUtil.selectAt(CAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
  880ms /   41ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> colocated(t, s)).forEach(t -> {})
  936ms /   47ms -- JCAS.select(Token.class).filter(t -> colocated(t, s)).forEach(t -> {})


GROUP: select overlapping
=========================

Sorted by execution time:
  238ms /   34ms -- JCasUtil.selectOverlapping(JCAS, Token.class, s).forEach(t -> {})
  354ms /   22ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> overlapping(t, s)).forEach(t -> {})
  381ms /   24ms -- CAS.select(Token.class).filter(t -> overlapping(t, s)).forEach(t -> {})


