Posted to user@uima.apache.org by Richard Eckart de Castilho <re...@apache.org> on 2020/11/04 10:08:48 UTC

Annotation access speed comparison (sneak-peek, non-authoritative)

Hi all,

for those who are interested - here are a few results of benchmarking the access to annotations in the CAS using different approaches.

These were done on a MacBook Pro (i7, 2.2 GHz), basically under working conditions (many applications open, etc.).

Used versions:
- uimaj-core:   3.1.2-SNAPSHOT (commit 099a2e0a9) 
- uimafit-core: 3.1.1-SNAPSHOT (commit 72895b5c8)

The benchmarks basically fill a CAS with random annotations (Sentence and Token type, but they do not behave
like sentences/tokens usually would - i.e. they are generated at random positions and so may arbitrarily 
overlap with each other). All annotations start/end within a range of [0-130] and have a random length between
0 and 30. The benchmarks fill the CAS multiple times with increasing numbers of annotations and perform the
selections repeatedly. If you want more details, check out the uimafit-benchmark module and run them yourself ;)
The first timing is the cumulative time spent by the benchmark. The second timing is the longest duration of a
single execution.
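For illustration, the generation scheme described above can be sketched with plain int pairs instead of actual
CAS annotations (the real code lives in the uimafit-benchmark module and may differ in details; the class and
method names here are made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomSpanSketch {
    // Generates n random [begin, end] pairs roughly following the scheme
    // described above: begins in [0, 130], lengths in [0, 30], ends clamped
    // to 130 so both offsets stay within the [0, 130] range. Spans may
    // arbitrarily overlap, just like the benchmark annotations.
    static List<int[]> randomSpans(Random rnd, int n) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int begin = rnd.nextInt(131);                      // begin in [0, 130]
            int end = Math.min(130, begin + rnd.nextInt(31));  // length in [0, 30]
            spans.add(new int[] { begin, end });
        }
        return spans;
    }
}
```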


As for insights: 

* Don't look at the times in terms of absolute values - rather consider how the time of one approach
behaves relative to the time of another approach.

* I find it quite interesting that selects are slower when using JCAS.select(type) than when using
JCAS.getAnnotationIndex(type).select(). I would expect both to run at the same speed.

* Contrary to previous benchmark results, we can see that (J)CAS.select() is typically faster than
its uimaFIT counterpart, with a few interesting exceptions.

* Note that there is no CAS.select().overlapping() equivalent to JCasUtil.selectOverlapping (yet).


If you would like to see additional approaches measured or if you have ideas of how to improve the
informativeness or general setup of the benchmarks, let me know. For small changes, you could also
just open a PR on GitHub against uimaFIT master.

Cheers,

-- Richard


GROUP: select
=========================

Sorted by execution time:
  1136ms /    2ms -- JCAS.select(Token.class).forEach(x -> {})
  1231ms /    3ms -- JCasUtil.select(JCAS, Token.class).forEach(x -> {})
  2679ms /    4ms -- JCAS.select(TOP.class).forEach(x -> {})
  2703ms /    4ms -- JCAS.select().forEach(x -> {})
  3803ms /    6ms -- JCasUtil.select(JCAS, TOP.class).forEach(x -> {})
  3997ms /   16ms -- JCasUtil.selectAll(JCAS).forEach(x -> {})


GROUP: select covered by
=========================

Sorted by execution time:
    84ms /    5ms -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
   134ms /   11ms -- JCasUtil.selectCovered(Token.class, s).forEach(t -> {})
   159ms /   11ms -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
   836ms /   46ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
   842ms /   46ms -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})


GROUP: select covering
=========================

Sorted by execution time:
    98ms /    5ms -- JCAS.getAnnotationIndex(Token.class).select().covering(s).forEach(t -> {})
   109ms /    6ms -- CAS.getAnnotationIndex(getType(cas, TYPE_NAME_TOKEN)).select().covering(s).forEach(t -> {})
   157ms /    7ms -- CasUtil.selectCovering(tokenType, s).forEach(t -> {})
   170ms /   20ms -- JCasUtil.selectCovering(Token.class, s).forEach(t -> {})
   187ms /   14ms -- JCAS.select(Token.class).covering(s).forEach(t -> {})
   812ms /   47ms -- JCAS.select(Token.class).filter(t -> covering(t, s)).forEach(t -> {})
   862ms /   45ms -- CAS.getAnnotationIndex(getType(cas, TYPE_NAME_TOKEN)).stream().filter(t -> covering(t, s)).forEach(t -> {})
  1039ms /   65ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> covering(t, s)).forEach(t -> {})


GROUP: select at
=========================

Sorted by execution time:
    31ms /    2ms -- JCAS.select(Token.class).at(s).forEach(t -> {})
    65ms /    4ms -- JCAS.select(Token.class).at(s.getBegin(), s.getEnd()).forEach(t -> {})
   109ms /   29ms -- JCasUtil.selectAt(CAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
   880ms /   41ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> colocated(t, s)).forEach(t -> {})
   936ms /   47ms -- JCAS.select(Token.class).filter(t -> colocated(t, s)).forEach(t -> {})


GROUP: select overlapping
=========================

Sorted by execution time:
   238ms /   34ms -- JCasUtil.selectOverlapping(JCAS, Token.class, s).forEach(t -> {})
   354ms /   22ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> overlapping(t, s)).forEach(t -> {})
   381ms /   24ms -- CAS.select(Token.class).filter(t -> overlapping(t, s)).forEach(t -> {})


Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 6. Nov 2020, at 13:17, Mario Juric <ma...@cactusglobal.com> wrote:
> 
> I finally managed to clean up the messages after all, although my IDE sent me on a detour at first, which almost prompted me to give up. However, it should follow the guidelines now.

Excellent. I have pushed my additional changes onto your branch. I hope you agree.

A tip for the commit messages: consider leaving an empty line between the issue number/title and the actual change list:

```
[UIMA-6291] Improve uimaFIT benchmarking module 

- Added select and selectAt benchmarks using getAnnotationIndex approach.
```

The empty line separates the "title" of the commit message from the "body".

No need to update the commit messages again though. Just a pointer for future commits.

It looks like the mail address you use for your commits is not associated with your GitHub account. 
To have the commits appear in your GitHub profile, you might care to add the mail address as an alias to your GitHub profile.

> BTW Did you see my selectAt bug report? https://issues.apache.org/jira/browse/UIMA-6294

Now I did ;) Thanks! I'll have a look at it.

-- Richard

Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Richard Eckart de Castilho <re...@apache.org>.
On 11. Nov 2020, at 15:37, Mario Juric <ma...@cactusglobal.com> wrote:
> 
> Would it make sense to provide CAS files based on real documents for the benchmarks? I mean, we could run segmenters on some OpenAccess documents, map the annotations to those used by the tests, and store them somewhere as XMI or CAS binaries. The test could then load them during the initialization phase before benchmarking.

That would be an option. But I think I would rather factor out the initialization of the
CASes in the benchmark such that different initialization strategies can be plugged in.
Then I would implement a random CAS generator which generates CASes according to a few
rules such as:

- partition the CAS into sentences with a length between 20 and 200 with a kind of
  bell distribution of sentence lengths
- partition the sentences into tokens of 2 to 20 chars, also with a bell
  distribution

That would be a bit more realistic for a benchmark like "find the sentence covering
the current token".
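As a rough sketch of such a generator (plain offsets only, no actual CAS; the bell shape here just comes from a
clamped Gaussian, and all parameter choices are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BellPartitionSketch {
    // Draws a length from a rough bell curve centred between min and max,
    // clamped to [min, max]; the standard deviation is chosen so that
    // nearly all draws fall inside the range anyway.
    static int bellLength(Random rnd, int min, int max) {
        double mean = (min + max) / 2.0;
        double sd = (max - min) / 6.0;
        int len = (int) Math.round(mean + rnd.nextGaussian() * sd);
        return Math.max(min, Math.min(max, len));
    }

    // Partitions [0, docLength) into contiguous sentence spans with
    // bell-distributed lengths between 20 and 200 characters.
    static List<int[]> sentences(Random rnd, int docLength) {
        List<int[]> spans = new ArrayList<>();
        int begin = 0;
        while (begin < docLength) {
            int end = Math.min(docLength, begin + bellLength(rnd, 20, 200));
            spans.add(new int[] { begin, end });
            begin = end;
        }
        return spans;
    }
}
```

Tokens could then be generated the same way by partitioning each sentence span with bellLength(rnd, 2, 20).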

But actually, despite being a bit degenerate, I believe the current benchmark setup
could already be used to profile the code and see if/what could possibly be
improved (if anything).

Conceptually, both uimaFIT's selectCovered(Token, X) and SelectFS's select(Token).coveredBy(X) work in the same way:

1) seek to X in the annotation index using a binary search
2) search linearly backwards in the index to the first match by offset (to ignore type priorities)
3) iterate forward until the end of the selection window is reached
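The three steps can be sketched over a plain begin-sorted list of spans (no UIMA types involved; the real
implementations operate on the annotation index and additionally handle type priorities, so this is only an
illustration of the principle):

```java
import java.util.ArrayList;
import java.util.List;

public class CoveredBySketch {
    // Minimal stand-in for a UIMA annotation.
    record Span(int begin, int end) {}

    // Selects all spans fully covered by [begin, end] from a list sorted
    // by begin offset, following the three steps described above.
    static List<Span> coveredBy(List<Span> index, int begin, int end) {
        // 1) binary search: first position whose begin offset is >= begin
        int lo = 0, hi = index.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (index.get(mid).begin() < begin) lo = mid + 1; else hi = mid;
        }
        // 2) linear search backwards to the first match by offset
        //    (stands in for skipping over type priority ordering)
        while (lo > 0 && index.get(lo - 1).begin() == begin) lo--;
        // 3) iterate forward until past the end of the selection window
        List<Span> result = new ArrayList<>();
        for (int i = lo; i < index.size() && index.get(i).begin() <= end; i++) {
            if (index.get(i).end() <= end) result.add(index.get(i));
        }
        return result;
    }
}
```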

The main difference is that selectCovered already collects all annotations it finds into a list in step 3,
while SelectFS allows streaming over the annotations - although people will probably use SelectFS with .asArray()
or .asList(), which again makes it equivalent to the uimaFIT approach. That is why, in the benchmark, I put a .forEach()
at the end of the selection to make it a bit more comparable.
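The practical difference between streaming and materializing can be seen with plain Java streams (no UIMA
involved): when only the first match is needed, a lazy stream stops inspecting elements early, whereas building
the full result list up front inspects everything.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class StreamVsListSketch {
    // Counts how many elements the filter actually inspects before the
    // first match is found - a lazy stream short-circuits at findFirst().
    static int inspectedUntilFirstMatch(List<Integer> data, int target) {
        AtomicInteger inspected = new AtomicInteger();
        data.stream()
            .filter(x -> { inspected.incrementAndGet(); return x == target; })
            .findFirst();
        return inspected.get();
    }

    // The materializing variant filters the whole list first, so every
    // element is inspected even if only the first result is used.
    static int inspectedWhenMaterialized(List<Integer> data, int target) {
        AtomicInteger inspected = new AtomicInteger();
        data.stream()
            .filter(x -> { inspected.incrementAndGet(); return x == target; })
            .toList();
        return inspected.get();
    }
}
```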

-- Richard

Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Mario Juric <ma...@cactusglobal.com>.
Thanks Richard,

Would it make sense to provide CAS files based on real documents for the benchmarks? I mean, we could run segmenters on some OpenAccess documents, map the annotations to those used by the tests, and store them somewhere as XMI or CAS binaries. The test could then load them during the initialization phase before benchmarking.

Cheers
Mario

> On 11 Nov 2020, at 15.26, Richard Eckart de Castilho <re...@apache.org> wrote:
>
> Hi Mario,
>
>> On 11. Nov 2020, at 09:11, Mario Juric <ma...@cactusglobal.com> wrote:
>>
>> I ran the latest benchmarks, and they seem to confirm your initial conclusions, that the JCasUtil.selectCovered method performs better than the corresponding SelectFS coveredBy method. I somehow picked up the idea that the new select API should improve performance, which it does for other select calls, but it is not the case for coveredBy. I was wondering whether you have some ideas as to why the new API's performance isn't closer to the JCasUtil.selectCovered method?
>
> First, the benchmarks are not really very representative at the moment. They create annotation structures that are highly overlapping and very dense - something that you wouldn't normally see in real life. So some operations show up slower in the benchmarks than they would feel in real life. The benchmarks should probably be adjusted to test against different types of structures.
>
> The select methods of uimaFIT are very lightweight and much less flexible/configurable than SelectFS. If you look at cases where you have small CASes and a very high frequency of method calls, the uimaFIT methods are likely to be better because they have a lower setup cost.
>
> For larger CASes and lower call frequencies, the SelectFS can be better. In particular if you can iterate over the results (and maybe stop before having iterated over all results) instead of retrieving the results as a list. SelectFS tries hard to not calculate the full result list while uimaFIT will usually calculate and return the full result list.
>
> There may be more to it, but that's my insight/intuition for the moment.
>
> Cheers,
>
> -- Richard

________________________________
Disclaimer:
This email and any files transmitted with it are confidential and directed solely for the use of the intended addressee or addressees and may contain information that is legally privileged, confidential, and exempt from disclosure. If you have received this email in error, please notify the sender by telephone, fax, or return email and immediately delete this email and any files transmitted along with it. Unintended recipients are not authorized to disclose, disseminate, distribute, copy or take any action in reliance on information contained in this email and/or any files attached thereto, in any manner other than to notify the sender; any unauthorized use is subject to legal prosecution.


Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi Mario,

> On 11. Nov 2020, at 09:11, Mario Juric <ma...@cactusglobal.com> wrote:
> 
> I ran the latest benchmarks, and they seem to confirm your initial conclusions, that the JCasUtil.selectCovered method performs better than the corresponding SelectFS coveredBy method. I somehow picked up the idea that the new select API should improve performance, which it does for other select calls, but it is not the case for coveredBy. I was wondering whether you have some ideas as to why the new API's performance isn't closer to the JCasUtil.selectCovered method?

First, the benchmarks are not really very representative at the moment. They create annotation structures that are highly overlapping and very dense - something that you wouldn't normally see in real life. So some operations show up slower in the benchmarks than they would feel in real life. The benchmarks should probably be adjusted to test against different types of structures.

The select methods of uimaFIT are very lightweight and much less flexible/configurable than SelectFS. If you look at cases where you have small CASes and a very high frequency of method calls, the uimaFIT methods are likely to be better because they have a lower setup cost.

For larger CASes and lower call frequencies, the SelectFS can be better. In particular if you can iterate over the results (and maybe stop before having iterated over all results) instead of retrieving the results as a list. SelectFS tries hard to not calculate the full result list while uimaFIT will usually calculate and return the full result list.

There may be more to it, but that's my insight/intuition for the moment.

Cheers,

-- Richard

Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Mario Juric <ma...@cactusglobal.com>.
Hi Richard,

I ran the latest benchmarks, and they seem to confirm your initial conclusions, that the JCasUtil.selectCovered method performs better than the corresponding SelectFS coveredBy method. I somehow picked up the idea that the new select API should improve performance, which it does for other select calls, but it is not the case for coveredBy. I was wondering whether you have some ideas as to why the new API's performance isn't closer to the JCasUtil.selectCovered method?

Cheers
Mario




Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Mario Juric <ma...@cactusglobal.com>.
Hi Richard,

>
> It would be great if we could just do a PR with your changes, but the
> commit messages are supposed to have a specific format including a
> Jira issue number. So at least the commit messages would need to be
> rewritten or the PR would need to be squash-merged. The format of the
> commit messages should look as follows:
>
> ```
> [UIMA-6291] Improve uimaFIT benchmarking module
>
> - Remove unused dependency
> - ... other changes
> - ... other changes

I finally managed to clean up the messages after all, although my IDE sent me on a detour at first, which almost prompted me to give up. However, it should follow the guidelines now.


>
> That might be because the CASes are only generated to be at most 130 offset positions long, leading to a quite high density of annotations,
> and thus t-coveredBy-s may not be able to profit too much from its ability to seek fast and to abort early. Probably the benchmark
> should be adjusted to expand the offset space as more annotations get added, for a lower annotation density.
>
> ... or there is some measurement/calculation bug …


It is indeed strange that there isn't a larger difference. I am running it from my branch, which is based on an older version of the benchmarks, but with user time, and I see a factor of 11 difference, which is more like what I would expect.

BTW Did you see my selectAt bug report? https://issues.apache.org/jira/browse/UIMA-6294

Cheers
Mario




Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Mario Juric <ma...@cactusglobal.com>.
Thanks Richard,

I’ll fix the commit messages and do a regular PR later today, which will also include some extra variants that I added. You can then let me know what additional changes I need to make.

Cheers
Mario


> On 6 Nov 2020, at 01.36, Richard Eckart de Castilho <re...@apache.org> wrote:
>
> Hi,
>
> thanks for the changes. Measuring at a higher resolution makes sense and
> measuring user time instead of system time also makes sense.
>
> I tweaked the benchmark a bit more, added the option to have a max duration for each
> batch and a few other changes.
>
> It would be great if we could just do a PR with your changes, but the
> commit messages are supposed to have a specific format including a
> Jira issue number. So at least the commit messages would need to be
> rewritten or the PR would need to be squash-merged. The format of the
> commit messages should look as follows:
>
> ```
> [UIMA-6291] Improve uimaFIT benchmarking module
>
> - Remove unused dependency
> - ... other changes
> - ... other changes
> ```
>
>> On 4. Nov 2020, at 17:34, Mario Juric <ma...@cactusglobal.com> wrote:
>>
>> Sorted by execution time:
>> 19308000ns / 7667000ns -- WARM-UP
>> 97388000ns / 3862000ns -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
>> 114012000ns / 7585000ns -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
>> 139689000ns / 11349000ns -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
>> 158952000ns / 30894000ns -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
>> 189711000ns / 12981000ns -- selectCovered(Token.class, s).forEach(t -> {})
>> 233259000ns / 12051000ns -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
>> 737271000ns / 30204000ns -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
>> 846053000ns / 45272000ns -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})
>
> So I didn't commit my changes anywhere yet either because I made them on top of your branch for the moment ;)
>
> But here are a few runs with the setup I did. I tried to make it so that the order of the methods is more stable,
> but as you can see they still switch around a bit.
>
> == Run 1
> AVG: 0.432ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> {})
> AVG: 0.467ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
> AVG: 0.474ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
> AVG: 0.475ms (10 seconds total) -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.487ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.492ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.661ms (13 seconds total) -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
> AVG: 0.701ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})
>
> == Run 2
> AVG: 0.429ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> {})
> AVG: 0.463ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
> AVG: 0.465ms (10 seconds total) -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.475ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.477ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
> AVG: 0.479ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.654ms (13 seconds total) -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
> AVG: 0.700ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})
>
> == Run 3
> AVG: 0.431ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> {})
> AVG: 0.463ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
> AVG: 0.464ms (10 seconds total) -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.471ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
> AVG: 0.488ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.492ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> AVG: 0.653ms (13 seconds total) -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
> AVG: 0.698ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})
>
> The choice for a specific approach really needs to look at the scalability. E.g.:
>
> ======================================================
> JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
> ======================================================
> Running benchmark... 10 100 1000 10000
> [     10/  10000 | min:      0.004 | max:      4.441 | median:      0.011 | cumulative:    211.239 | fail:    0]
> [    100/  10000 | min:      0.052 | max:      0.456 | median:      0.057 | cumulative:    681.855 | fail:    0]
> [   1000/   2493 | min:      1.723 | max:      5.437 | median:      1.891 | cumulative:   4927.217 | fail:    0 | time limit exceeded]
> [  10000/     37 | min:    127.892 | max:    146.936 | median:    135.230 | cumulative:   5024.777 | fail:    0 | time limit exceeded]
>
> vs.
>
> ======================================================================
> JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})
> ======================================================================
> Running benchmark... 10 100 1000 10000
> [     10/  10000 | min:      0.005 | max:      2.201 | median:      0.007 | cumulative:     90.482 | fail:    0]
> [    100/  10000 | min:      0.275 | max:      0.774 | median:      0.277 | cumulative:   2858.870 | fail:    0]
> [   1000/    190 | min:     25.235 | max:     28.488 | median:     26.280 | cumulative:   4996.646 | fail:    0 | time limit exceeded]
> [  10000/      2 | min:   2698.091 | max:   2705.892 | median:   2701.992 | cumulative:   5403.983 | fail:    0 | time limit exceeded]
>
> The micro-average which is currently in the first column of the summary report doesn't really reflect how the second approach here scales in comparison to the first.
>
> Btw. the time limit was configured to be ~5 secs for each batch (each row in the summary).
>
> Considering the results for t-coveredBy-s, I am slightly surprised that t-covering-s is, according to the results, not that much slower:
>
> =====================================================
> JCAS.select(Token.class).covering(s).forEach(t -> {})
> =====================================================
> Running benchmark... 10 100 1000 10000
> [     10/  10000 | min:      0.004 | max:      3.467 | median:      0.009 | cumulative:    167.222 | fail:    0]
> [    100/  10000 | min:      0.059 | max:      0.800 | median:      0.060 | cumulative:    638.144 | fail:    0]
> [   1000/   2254 | min:      1.995 | max:     18.008 | median:      2.194 | cumulative:   4978.578 | fail:    0 | time limit exceeded]
> [  10000/     33 | min:    150.032 | max:    158.695 | median:    155.340 | cumulative:   5121.281 | fail:    0 | time limit exceeded]
>
> That might be because the CASes are only generated to be at most 130 offset positions long, leading to a quite high density of annotations,
> and thus t-coveredBy-s may not be able to profit too much from its ability to seek fast and to abort early. Probably the benchmark
> should be adjusted to expand the offset space as more annotations get added, for a lower annotation density.
>
> ... or there is some measurement/calculation bug ...
>
> Cheers,
>
> -- Richard




Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi,

thanks for the changes. Measuring at a higher resolution makes sense and
measuring user time instead of system time also makes sense.

I tweaked the benchmark a bit more, added the option to have a max duration for each
batch and a few other changes.

It would be great if we could just do a PR with your changes, but the 
commit messages are supposed to have a specific format including a
Jira issue number. So at least the commit messages would need to be
rewritten or the PR would need to be squash-merged. The format of the
commit messages should look as follows:

```
[UIMA-6291] Improve uimaFIT benchmarking module

- Remove unused dependency
- ... other changes
- ... other changes
```

> On 4. Nov 2020, at 17:34, Mario Juric <ma...@cactusglobal.com> wrote:
> 
> Sorted by execution time:
> 19308000ns / 7667000ns -- WARM-UP
> 97388000ns / 3862000ns -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> 114012000ns / 7585000ns -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
> 139689000ns / 11349000ns -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
> 158952000ns / 30894000ns -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
> 189711000ns / 12981000ns -- selectCovered(Token.class, s).forEach(t -> {})
> 233259000ns / 12051000ns -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
> 737271000ns / 30204000ns -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
> 846053000ns / 45272000ns -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})

So I didn't commit my changes anywhere yet either because I made them on top of your branch for the moment ;)

But here are a few runs with the setup I did. I tried to make it so that the order of the methods is more stable,
but as you can see they still switch around a bit.

== Run 1
AVG: 0.432ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> {})
AVG: 0.467ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
AVG: 0.474ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
AVG: 0.475ms (10 seconds total) -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.487ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.492ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.661ms (13 seconds total) -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
AVG: 0.701ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})

== Run 2
AVG: 0.429ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> {})
AVG: 0.463ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
AVG: 0.465ms (10 seconds total) -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.475ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.477ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
AVG: 0.479ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.654ms (13 seconds total) -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
AVG: 0.700ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})

== Run 3
AVG: 0.431ms (10 seconds total) -- selectCovered(Token.class, s).forEach(t -> {})
AVG: 0.463ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
AVG: 0.464ms (10 seconds total) -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.471ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
AVG: 0.488ms (10 seconds total) -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.492ms (10 seconds total) -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
AVG: 0.653ms (13 seconds total) -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
AVG: 0.698ms (14 seconds total) -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})

The choice for a specific approach really needs to look at the scalability. E.g.:

======================================================
JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
======================================================
Running benchmark... 10 100 1000 10000 
[     10/  10000 | min:      0.004 | max:      4.441 | median:      0.011 | cumulative:    211.239 | fail:    0]
[    100/  10000 | min:      0.052 | max:      0.456 | median:      0.057 | cumulative:    681.855 | fail:    0]
[   1000/   2493 | min:      1.723 | max:      5.437 | median:      1.891 | cumulative:   4927.217 | fail:    0 | time limit exceeded]
[  10000/     37 | min:    127.892 | max:    146.936 | median:    135.230 | cumulative:   5024.777 | fail:    0 | time limit exceeded]

vs.

======================================================================
JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})
======================================================================
Running benchmark... 10 100 1000 10000 
[     10/  10000 | min:      0.005 | max:      2.201 | median:      0.007 | cumulative:     90.482 | fail:    0]
[    100/  10000 | min:      0.275 | max:      0.774 | median:      0.277 | cumulative:   2858.870 | fail:    0]
[   1000/    190 | min:     25.235 | max:     28.488 | median:     26.280 | cumulative:   4996.646 | fail:    0 | time limit exceeded]
[  10000/      2 | min:   2698.091 | max:   2705.892 | median:   2701.992 | cumulative:   5403.983 | fail:    0 | time limit exceeded]

The micro-average, which is currently in the first column of the summary report, doesn't really reflect how the second approach scales in comparison to the first.
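To make the scaling difference concrete: the filter-based variants apply a coveredBy(t, s) predicate to every annotation, while the index-backed coveredBy(s) can exploit the begin-sorted index order to seek to the start of the covering annotation and abort early. Here is a minimal stdlib sketch of that difference (hypothetical Span type and plain int offsets — not the actual UIMA/uimaFIT code):

```java
import java.util.ArrayList;
import java.util.List;

public class CoveredBySketch {
    // Hypothetical stand-in for an annotation: just begin/end offsets.
    public record Span(int begin, int end) {}

    // The predicate the filter-based variants use: t lies within s.
    public static boolean coveredBy(Span t, Span s) {
        return t.begin() >= s.begin() && t.end() <= s.end();
    }

    // Linear scan: always touches every element, like select(...).filter(...).
    public static List<Span> byFilter(List<Span> all, Span s) {
        return all.stream().filter(t -> coveredBy(t, s)).toList();
    }

    // Scan over a begin-sorted list: skip ahead to s.begin and stop once the
    // begins pass s.end - roughly what the index-backed coveredBy(s) can do.
    public static List<Span> bySeek(List<Span> sortedByBegin, Span s) {
        List<Span> out = new ArrayList<>();
        for (Span t : sortedByBegin) {
            if (t.begin() < s.begin()) continue; // seek (binary search in a real index)
            if (t.begin() > s.end()) break;      // early abort
            if (t.end() <= s.end()) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Span> tokens = List.of(new Span(0, 3), new Span(5, 8),
                new Span(7, 12), new Span(20, 25));
        Span sentence = new Span(4, 15);
        // Both approaches select the same spans; only the work done differs.
        System.out.println(byFilter(tokens, sentence).equals(bySeek(tokens, sentence)));
    }
}
```

The real annotation index additionally handles type priorities and ordering among annotations with equal begins, which this sketch ignores.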

Btw. the time limit was configured to be ~5 secs for each batch (each row in the summary).

Considering the results for t-coveredBy-s, I am slightly surprised that, according to these results, t-covering-s is not that much slower:

=====================================================
JCAS.select(Token.class).covering(s).forEach(t -> {})
=====================================================
Running benchmark... 10 100 1000 10000 
[     10/  10000 | min:      0.004 | max:      3.467 | median:      0.009 | cumulative:    167.222 | fail:    0]
[    100/  10000 | min:      0.059 | max:      0.800 | median:      0.060 | cumulative:    638.144 | fail:    0]
[   1000/   2254 | min:      1.995 | max:     18.008 | median:      2.194 | cumulative:   4978.578 | fail:    0 | time limit exceeded]
[  10000/     33 | min:    150.032 | max:    158.695 | median:    155.340 | cumulative:   5121.281 | fail:    0 | time limit exceeded]

That might be because the CASes are only generated to be at most 130 offset positions long, leading to quite a high density of annotations;
thus t-coveredBy-s may not be able to profit much from its ability to seek fast and abort early. The benchmark should probably
be adjusted to expand the offset space as more annotations are added, yielding a lower annotation density.

... or there is some measurement/calculation bug ...
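A density-controlled generator along the lines suggested above could look like this (a hypothetical sketch with plain int offsets and an assumed TARGET_DENSITY constant — not the actual uimafit-benchmark code):

```java
import java.util.Random;

public class DensityScaledGenerator {
    static final int MAX_LENGTH = 30;          // matches the benchmark's max annotation length
    static final double TARGET_DENSITY = 10.0; // annotations per offset position (assumed value)

    // Instead of fixing the offset space at [0, 130], grow it with the
    // annotation count so the average annotation density stays constant.
    public static int[][] generate(int annotationCount, long seed) {
        Random rnd = new Random(seed);
        // Offset space scales linearly with the count at a fixed density.
        int offsetSpace = Math.max(130, (int) (annotationCount / TARGET_DENSITY));
        int[][] spans = new int[annotationCount][2];
        for (int i = 0; i < annotationCount; i++) {
            int begin = rnd.nextInt(offsetSpace);
            int end = begin + rnd.nextInt(MAX_LENGTH + 1); // random length in [0, 30]
            spans[i] = new int[] { begin, end };
        }
        return spans;
    }
}
```

With such a generator, the expected number of tokens covered by a given sentence would stay roughly constant across batch sizes, so the seek-and-abort advantage of the index-backed selects should become visible.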

Cheers,

-- Richard

Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Mario Juric <ma...@cactusglobal.com>.
I also added selectCovered benchmarks using s.getBegin() and s.getEnd(). Oddly, they seem to be even faster.

GROUP: select covered by
WARM-UP: Running benchmark... 10 100 1000
selectCovered(Token.class, s).forEach(t -> {}): Running benchmark... 10 100 1000
JCAS.select(Token.class).coveredBy(s).forEach(t -> {}): Running benchmark... 10 100 1000
JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {}): Running benchmark... 10 100 1000
JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {}): Running benchmark... 10 100 1000
JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {}): Running benchmark... 10 100 1000
JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {}): Running benchmark... 10 100 1000
JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {}): Running benchmark... 10 100 1000
selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {}): Running benchmark... 10 100 1000

Sorted by execution time:
19308000ns / 7667000ns -- WARM-UP
97388000ns / 3862000ns -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
114012000ns / 7585000ns -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
139689000ns / 11349000ns -- JCAS.select(Token.class).coveredBy(s.getBegin(), s.getEnd()).forEach(t -> {})
158952000ns / 30894000ns -- selectCovered(JCAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
189711000ns / 12981000ns -- selectCovered(Token.class, s).forEach(t -> {})
233259000ns / 12051000ns -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
737271000ns / 30204000ns -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
846053000ns / 45272000ns -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})

Cheers
Mario




________________________________
Disclaimer:
This email and any files transmitted with it are confidential and directed solely for the use of the intended addressee or addressees and may contain information that is legally privileged, confidential, and exempt from disclosure. If you have received this email in error, please notify the sender by telephone, fax, or return email and immediately delete this email and any files transmitted along with it. Unintended recipients are not authorized to disclose, disseminate, distribute, copy or take any action in reliance on information contained in this email and/or any files attached thereto, in any manner other than to notify the sender; any unauthorized use is subject to legal prosecution.


Re: Annotation access speed comparison (sneak-peek, non-authoritative)

Posted by Mario Juric <ma...@cactusglobal.com>.
Hi Richard,

This is really useful, and it makes me much more positive about the select API performance than the previous benchmarks suggested, but you need to know which calling approach works best, e.g. getting the annotation index first and calling select() on it performs better.

I added support for nanosecond CPU time to Benchmark and changed the SelectBenchmark to use the user timer, which I think is much more accurate regarding the actual time spent in the select operations. However, in this case I don’t think it really changes the overall conclusions compared to the normal system clock version, although I haven’t compared the results line by line. You can find the branch here:

https://github.com/mjunsilo/uima-uimafit/tree/mjuric/benchmark-cpu-time
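For reference, per-thread user CPU time in nanoseconds is available via the standard ThreadMXBean API; a minimal sketch of such a user timer could look like this (an illustration of the approach, not the code from the branch above):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class UserTimeTimer {
    // Measures the user CPU time of the current thread in nanoseconds.
    // Unlike wall-clock time, this excludes time the thread spends
    // descheduled or blocked, which can make micro-benchmarks less noisy.
    public static long measureUserTimeNanos(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("CPU time not supported on this JVM");
        }
        long before = bean.getCurrentThreadUserTime();
        task.run();
        return bean.getCurrentThreadUserTime() - before;
    }

    public static void main(String[] args) {
        long nanos = measureUserTimeNanos(() -> {
            long acc = 0;
            for (int i = 0; i < 1_000_000; i++) acc += i;
            if (acc < 0) System.out.println(acc); // keep the loop from being optimized away
        });
        System.out.println("user time: " + nanos + "ns");
    }
}
```

Note that the resolution of getCurrentThreadUserTime() is platform-dependent (often coarser than a nanosecond), so very short tasks may report 0.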

Do I need to create a Jira issue for this? Let me know first if there is any interest.

Cheers
Mario


On 4 Nov 2020, at 11.08, Richard Eckart de Castilho <re...@apache.org>> wrote:





GROUP: select
=========================

Sorted by execution time:
 1136ms /    2ms -- JCAS.select(Token.class).forEach(x -> {})
 1231ms /    3ms -- JCasUtil.select(JCAS, Token.class).forEach(x -> {})
 2679ms /    4ms -- JCAS.select(TOP.class).forEach(x -> {})
 2703ms /    4ms -- JCAS.select().forEach(x -> {})
 3803ms /    6ms -- JCasUtil.select(JCAS, TOP.class).forEach(x -> {})
 3997ms /   16ms -- JCasUtil.selectAll(JCAS).forEach(x -> {})


GROUP: select covered by
=========================

Sorted by execution time:
   84ms /    5ms -- JCAS.getAnnotationIndex(Token.class).select().coveredBy(s).forEach(t -> {})
  134ms /   11ms -- JCasUtil.selectCovered(Token.class, s).forEach(t -> {})
  159ms /   11ms -- JCAS.select(Token.class).coveredBy(s).forEach(t -> {})
  836ms /   46ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> coveredBy(t, s)).forEach(t -> {})
  842ms /   46ms -- JCAS.select(Token.class).filter(t -> coveredBy(t, s)).forEach(t -> {})


GROUP: select covering
=========================

Sorted by execution time:
   98ms /    5ms -- JCAS.getAnnotationIndex(Token.class).select().covering(s).forEach(t -> {})
  109ms /    6ms -- CAS.getAnnotationIndex(getType(cas, TYPE_NAME_TOKEN)).select().covering(s).forEach(t -> {})
  157ms /    7ms -- CasUtil.selectCovering(tokenType, s).forEach(t -> {})
  170ms /   20ms -- JCasUtil.selectCovering(Token.class, s).forEach(t -> {})
  187ms /   14ms -- JCAS.select(Token.class).covering(s).forEach(t -> {})
  812ms /   47ms -- JCAS.select(Token.class).filter(t -> covering(t, s)).forEach(t -> {})
  862ms /   45ms -- CAS.getAnnotationIndex(getType(cas, TYPE_NAME_TOKEN)).stream().filter(t -> covering(t, s)).forEach(t -> {})
 1039ms /   65ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> covering(t, s)).forEach(t -> {})


GROUP: select at
=========================

Sorted by execution time:
   31ms /    2ms -- JCAS.select(Token.class).at(s).forEach(t -> {})
   65ms /    4ms -- JCAS.select(Token.class).at(s.getBegin(), s.getEnd()).forEach(t -> {})
  109ms /   29ms -- JCasUtil.selectAt(CAS, Token.class, s.getBegin(), s.getEnd()).forEach(t -> {})
  880ms /   41ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> colocated(t, s)).forEach(t -> {})
  936ms /   47ms -- JCAS.select(Token.class).filter(t -> colocated(t, s)).forEach(t -> {})


GROUP: select overlapping
=========================

Sorted by execution time:
  238ms /   34ms -- JCasUtil.selectOverlapping(JCAS, Token.class, s).forEach(t -> {})
  354ms /   22ms -- JCAS.getAnnotationIndex(Token.class).stream().filter(t -> overlapping(t, s)).forEach(t -> {})
  381ms /   24ms -- CAS.select(Token.class).filter(t -> overlapping(t, s)).forEach(t -> {})


