Posted to dev@stanbol.apache.org by harish suvarna <hs...@gmail.com> on 2013/03/02 13:23:41 UTC

Opennlp thread safety in Stanbol

The OpenNLP documentation says that the POS tagger, tokenizer, etc. are not
thread safe. A couple of Internet posts and OpenNLP discussion forums also
indicate this. How does Stanbol use OpenNLP in a thread-safe way? Do you use
Java synchronized blocks, thread-locals, or some other Java locking to make it
thread safe? I have not run into any thread-safety issues in Stanbol yet. An
OpenNLP developer says to create one instance of each OpenNLP component per thread.

http://grokbase.com/t/opennlp/dev/1176mzaen1/thread-safety-or-lack-thereof
-- 
Thanks
Harish

Re: Opennlp thread safety in Stanbol

Posted by harish suvarna <hs...@gmail.com>.
Got it Rupert. Thanks.
-harish


Re: Opennlp thread safety in Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.
On Sat, Mar 2, 2013 at 2:23 PM, harish suvarna <hs...@gmail.com> wrote:
> Rupert,
> Who creates the one instance per thread, specifically one OpenNLP
> tokenizer/POS tagger per thread? Is it
> commons.opennlp, or does Stanbol have its own code?

You must have misunderstood me.

* Models are singletons that are used by all threads. SentenceModel,
TokenizerModel, POSModel, ChunkerModel and TokenNameFinderModel are
all singletons. They need a lot of memory, so it is good to
have them as singletons.
* SentenceDetectors, Tokenizers, POSTaggers, Chunkers and
TokenNameFinders are created for each request (on top of the singleton
models). Those are lightweight components, so reusing them would not
bring much of an advantage.
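
The split described in the two points above can be sketched roughly as
follows. This is a minimal illustration only, not Stanbol or OpenNLP code:
SharedModel, PerRequestTagger and PerRequestDemo are hypothetical stand-ins
(a read-only model shared by all threads, and a cheap stateful component
created per request), and the tiny lexicon is made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a heavyweight model (such as a POS model).
// It is immutable after construction, so one instance can serve all threads.
final class SharedModel {
    private final Map<String, String> lexicon;

    SharedModel(Map<String, String> lexicon) {
        this.lexicon = Collections.unmodifiableMap(new HashMap<>(lexicon));
    }

    String lookup(String token) {
        return lexicon.getOrDefault(token, "UNK");
    }
}

// Hypothetical stand-in for a lightweight component (such as a tagger).
// It keeps mutable scratch state, so an instance must not be shared across threads.
final class PerRequestTagger {
    private final SharedModel model;
    private final List<String> scratch = new ArrayList<>();

    PerRequestTagger(SharedModel model) {
        this.model = model;
    }

    List<String> tag(String[] tokens) {
        scratch.clear();
        for (String token : tokens) {
            scratch.add(model.lookup(token));
        }
        return new ArrayList<>(scratch);
    }
}

final class PerRequestDemo {
    // Built once, then only read by the request threads.
    static final SharedModel MODEL = buildModel();

    private static SharedModel buildModel() {
        Map<String, String> lexicon = new HashMap<>();
        lexicon.put("Stanbol", "NNP");
        lexicon.put("uses", "VBZ");
        lexicon.put("OpenNLP", "NNP");
        return new SharedModel(lexicon);
    }

    // Each request builds its own tagger on top of the shared model.
    static List<String> handleRequest(String text) {
        PerRequestTagger tagger = new PerRequestTagger(MODEL);
        return tagger.tag(text.trim().split("\\s+"));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> results = new ArrayList<>();
            for (int i = 0; i < 8; i++) {
                Callable<List<String>> request = () -> handleRequest("Stanbol uses OpenNLP");
                results.add(pool.submit(request));
            }
            for (Future<List<String>> result : results) {
                System.out.println(result.get());
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

No locking is needed in this sketch because the only object shared between
threads is read-only, and every mutable object is confined to one request.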

The code for loading and managing the singleton models is part of the
org.apache.stanbol.commons.opennlp module (see
org.apache.stanbol.commons.opennlp.OpenNLP for details). But this
class is mainly about

* OSGi integration
* using the Stanbol DataFileProvider [1] infrastructure for loading model files.

and not about working around OpenNLP concurrency issues. Actually, the
way OpenNLP handles concurrency seems just fine to me. I had much
more trouble with concurrency when integrating Freeling [2] and
Talismane [3] with Stanbol.

best
Rupert

[1] http://stanbol.apache.org/docs/trunk/utils/datafileprovider
[2] https://github.com/insideout10/stanbol-freeling
[3] https://github.com/westei/stanbol-talismane


Re: Opennlp thread safety in Stanbol

Posted by harish suvarna <hs...@gmail.com>.
Rupert,
Who creates the one instance per thread, specifically one OpenNLP
tokenizer/POS tagger per thread? Is it
commons.opennlp, or does Stanbol have its own code?

-harish


Re: Opennlp thread safety in Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi

Stanbol uses a single instance of each model (e.g. POSModel). The models are
loaded and managed by the OpenNLP service (commons.opennlp module).
Stanbol does not reuse the OpenNLP Tagger, Finder, ... objects built on
top of the models (e.g. a POSTagger on top of the POSModel), so each request
creates a new instance. This is precisely because POSTaggers,
Tokenizers, ... are not thread safe (as stated by the documentation).
As the documentation also mentions that those objects are rather
lightweight, caching them in ResourcePools or ThreadLocal variables
was not considered.
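
For comparison only, the ThreadLocal caching that was not pursued might look
roughly like the sketch below. It is an assumption-laden illustration, not
Stanbol code: TinyModel, TinyTagger and ThreadLocalDemo are hypothetical
stand-ins, and the capitalization rule is a toy substitute for a real model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a shared, read-only model.
final class TinyModel {
    String lookup(String token) {
        boolean capitalized = !token.isEmpty() && Character.isUpperCase(token.charAt(0));
        return capitalized ? "NNP" : "NN";
    }
}

// Hypothetical stand-in for a non-thread-safe tagger with mutable state.
final class TinyTagger {
    private final TinyModel model;
    private final List<String> scratch = new ArrayList<>();

    TinyTagger(TinyModel model) {
        this.model = model;
    }

    List<String> tag(String[] tokens) {
        scratch.clear();
        for (String token : tokens) {
            scratch.add(model.lookup(token));
        }
        return new ArrayList<>(scratch);
    }
}

final class ThreadLocalDemo {
    static final TinyModel MODEL = new TinyModel();

    // One tagger per thread, created lazily on first use and then reused
    // by every later request that runs on the same thread.
    static final ThreadLocal<TinyTagger> TAGGER =
            ThreadLocal.withInitial(() -> new TinyTagger(MODEL));

    static List<String> handleRequest(String text) {
        return TAGGER.get().tag(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(handleRequest("Stanbol uses models"));
        // Repeated lookups on one thread return the same cached instance.
        System.out.println(TAGGER.get() == TAGGER.get());
    }
}
```

This saves one small allocation per request at the cost of keeping a tagger
alive for each pooled thread, which is little gain when the per-request
objects are as lightweight as described above.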

best
Rupert

--
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen