Posted to dev@stanbol.apache.org by Joseph M'Bimbi-Bene <jb...@object-ive.com> on 2013/10/29 15:30:23 UTC

timeouts on integration tests

Hello everybody,

I'm having a problem with Stanbol when trying to enhance a lot of somewhat
"large" documents (40,000 to 60,000 characters).

Depending on the enhancement chain I use, I get timeouts sooner or
later. The timeout is left at its default value:
(langdetect + token + pos + sentence + dbPedia) = timeout after roughly the
10th enhancement request.
(langdetect + token + dbPedia) = timeout after about 10 minutes.
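For context, each "enhancement request" here is essentially a plain POST of
the document text to the enhancer endpoint, roughly like the following
sketch (the chain name, file name, and port are illustrative, not my actual
setup):

```shell
# Sketch of a single enhancement request against a local Stanbol launcher.
# "mychain" and large-document.txt are placeholders.
curl -X POST \
     -H "Content-Type: text/plain" \
     -H "Accept: application/rdf+xml" \
     --data-binary @large-document.txt \
     "http://localhost:8080/enhancer/chain/mychain"
```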

I monitored Stanbol in the first case (langdetect + token + pos + sentence
+ dbPedia) with the YourKit Java Profiler.

I noticed that, CPU-wise, the hotspots are:
  -opennlp.tools.util.BeamSearch.bestSequences(int, Object[], Object[],
double), with 11% of the time spent
  -opennlp.tools.util.Sequence.<init>(Sequence, String, double), with 2%

Memory-wise, the hotspot is:
  -opennlp.tools.util.BeamSearch.bestSequences(int, Object[], Object[],
double), with 12% of the space taken

I modified the following parameters in the
{stanbol-working-dir}\stanbol\config\org\apache\felix\eventadmin\impl\EventAdmin.config
file:
org.apache.felix.eventadmin.ThreadPoolSize="100"
org.apache.felix.eventadmin.CacheSize="2048"

My feeling was that this would only delay the timeouts.

Anyway, I noticed that a lot of threads were being created, immediately
going into the "waiting" state, and then dying after exactly 60 seconds,
which is the value of the "stanbol.maxEnhancementJobWaitTime" parameter.
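Assuming "stanbol.maxEnhancementJobWaitTime" is an ordinary JVM system
property expressed in milliseconds (I have not verified this), one could
raise it when starting the launcher, along these lines:

```shell
# Sketch: raise the enhancement-job wait time from the default 60s to 5 min.
# Assumes the property is read in milliseconds; the launcher jar name
# is illustrative and may differ per release.
java -Dstanbol.maxEnhancementJobWaitTime=300000 \
     -jar org.apache.stanbol.launchers.full-*.jar
```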

What other information can I provide?

Re: timeouts on integration tests

Posted by Joseph M'Bimbi-Bene <jb...@object-ive.com>.
Hello, thank you for your reply. I did what you suggested, and I still get
these timeouts. I will send you screenshots from YourKit so you can get
better insight, and I will also share the YourKit snapshots via Dropbox.



Re: timeouts on integration tests

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Joseph,

I am not sure that this indicates a bug in the EventJobManager. It
could just as well be that the requests are simply timing out because
the chain does not finish within the 60 seconds.

Possible reasons could be:

* Entityhub linking on a slow HDD (e.g. a laptop without an SSD) can be
slow. Especially if "Proper Noun Linking" is deactivated (meaning that
all nouns are matched against the vocabulary), processing large documents
will be time-consuming. Having a lot of concurrent requests will
increase the processing time further (as HDD IO is limited and does
not scale with concurrent requests).
* As ContentItems are kept in memory, heap size may also be the cause
of the issue. Concurrent requests with large documents require
additional memory, and if Stanbol runs into a low-memory situation,
processing times can increase dramatically.

I would suggest:

1. try to increase the heap (-Xmx parameter)
2. try to configure a chain without EntityLinking (e.g. langdetect
plus the OpenNLP engines) to check whether the EventJobManager
implementation is the cause of your problem.
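Suggestion 1 is just a restart of the launcher with a larger heap, for
example (the heap size and jar name below are illustrative; suggestion 2 is
done by configuring a new chain in the Felix web console rather than on the
command line):

```shell
# Sketch for suggestion 1: restart the Stanbol launcher with a 4 GB heap.
# The launcher jar name is illustrative and may differ per release.
java -Xmx4g -jar org.apache.stanbol.launchers.full-*.jar
```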

best
Rupert





-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: timeouts on integration tests

Posted by Joseph M'Bimbi-Bene <jb...@object-ive.com>.
Another interesting fact: looking at the "monitor usage", most of the
blocker threads (50% of the time) have the following stack trace:
  -java.util.Currency.getInstance(String, int, int)
    -org.apache.felix.eventadmin.impl.tasks.AsyncDeliverTasks$TaskExecuter.run()
      -EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run()
        -java.lang.Thread.run()

I have version 1.2.14 of felix.eventadmin, but there is no call to
Currency.getInstance in its source code.

