Posted to user@uima.apache.org by Shahim Essaid <sh...@essaid.com> on 2012/04/30 20:01:50 UTC

Synchronizing the initialization of a component in an aggregate

Hi All,

I am trying to write a primitive analysis engine that checks a database
schema against the type system and creates or updates it as needed. I need
to synchronize the initialization of this component so that only one
instance performs this task when multiple instances are being
instantiated.

What is the correct object to synchronize on?  Is the type system
object the correct one and does it maintain its identity throughout a
JVM run?  Is it a different object in the other aggregates even though
they use the same type system description?

I need to block all other threads in the other instances of the
current component until the database is updated.  I also need this
object to be specific to the current aggregate so that other
aggregates running in the same JVM can have their own synchronization
objects and database updates independent of each other. In other
words, I can't use a JVM-wide object.

Thank you,
Shahim

Re: Synchronizing the initialization of a component in an aggregate

Posted by Marshall Schor <ms...@schor.com>.


On 5/1/2012 3:03 PM, Shahim Essaid wrote:
> Hi Eddie,
>
> I am new to UIMA so I might have to reread the documentation. I was
> under the impression that I can use multiple threads as long as I use
> a CAS pool and a multithreaded AE. This is described here:
>
> http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.applications.multi_threaded
>
> Did I misunderstand this part of the documentation?  In my case I am
> running Core UIMA and I would benefit from multitasking.  (many of my
> annotators have a lot of idle time waiting for responses from remote
> servers)

The part of the documentation you're reading is in Chapter 3, which covers the
UIMA Application APIs.  While it is quite possible to run multiple UIMA
pipelines yourself by writing the appropriate application, most users don't do
this and instead use one of the two scale-out capabilities that come with UIMA
to handle the complexities involved.  This is what Eddie was referring to.

The first-generation scale-out for UIMA was the CPE (Collection Processing
Engine).  The second generation is UIMA-AS (AS stands for Asynchronous
Scaleout).  UIMA-AS is a separate download that adds to base UIMA a robust
scale-out mechanism that can use multiple cores in one node, as well as clusters
of multiple machines.

-Marshall Schor


Re: Synchronizing the initialization of a component in an aggregate

Posted by Marshall Schor <ms...@schor.com>.

On 5/1/2012 10:19 PM, Eddie Epstein wrote:
> Shahim,
>
> That reference shows user code implementing the multithreaded wrapper around
> core uima. Given that it would be [your] application code, you could stop after
> instantiating one analysis engine, inspect the typesystem and update the
> database schema before instantiating more AE. In this case it is all under
> the control of the application code.
>
> Note that UIMA-AS and the CPE do not synchronize access to an annotator
> from different threads; they run multiple instances of an annotator in
> different threads, assuming that the user specifies that the annotator is
> thread safe by declaring multipleDeploymentAllowed as true in the
> operationalProperties section of the annotator's descriptor.

Note that thread safe here means that it's OK to run multiple threads, each on
its own instance of the Annotator class.  In practice this means that you have
to think about and ensure thread safety only for "static" fields.  Normal
instance fields belong to a single instance, which is used by only one thread,
so no thread safety issues should arise for those.
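
A minimal sketch of how a static field could be used to run a one-time task
safely across instances, assuming each aggregate can be given some
distinguishing key (for example through a configuration parameter).  The class
SchemaInitGuard, the method runOncePerKey, and the key strings are hypothetical
names for illustration and are not part of UIMA.  Note also that static state is
per class loader, so this only coordinates instances whose classes were loaded
by the same class loader.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not a UIMA class): runs a task at most once per key. */
final class SchemaInitGuard {
    // Static, so shared by all annotator instances loaded by the same class loader.
    private static final ConcurrentHashMap<String, Object> LOCKS = new ConcurrentHashMap<>();
    private static final Set<String> DONE = ConcurrentHashMap.newKeySet();

    private SchemaInitGuard() {}

    /**
     * Runs the task unless an earlier call with the same key already completed it.
     * Callers using the same key block until the task has finished; callers
     * using a different key are not affected.
     *
     * @return true only for the caller that actually ran the task
     */
    static boolean runOncePerKey(String key, Runnable task) {
        // One lock object per key, created atomically on first use.
        Object lock = LOCKS.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            if (DONE.contains(key)) {
                return false;
            }
            // If the task throws, the key is not marked done and a later caller retries.
            task.run();
            DONE.add(key);
            return true;
        }
    }
}
```

An annotator could call SchemaInitGuard.runOncePerKey(key, task) from its
initialize method, passing the same key in every instance that belongs to one
aggregate and a different key for each other aggregate in the JVM.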

Typically, this setting (multipleDeploymentAllowed in the descriptor's
operational properties) might be set to false when you do not want an annotator
to be replicated because every CAS in the flow has to pass through the same
instance (it might be counting the CASes or accumulating some other statistics
over all the CASes).

-Marshall

Re: Synchronizing the initialization of a component in an aggregate

Posted by Eddie Epstein <ea...@gmail.com>.
Shahim,

That reference shows user code implementing the multithreaded wrapper around
core uima. Given that it would be [your] application code, you could stop after
instantiating one analysis engine, inspect the typesystem and update the
database schema before instantiating more AE. In this case it is all under
the control of the application code.
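
A minimal sketch of that ordering in application code, assuming nothing about
the real engine type: StagedStartup and createEngines are hypothetical names,
the factory stands in for whatever the application uses to produce an analysis
engine (for example UIMAFramework.produceAnalysisEngine), and the schema update
stands in for the code that inspects the type system and updates the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical application-side helper (not a UIMA class). */
final class StagedStartup {
    private StagedStartup() {}

    /**
     * Creates one engine, runs the schema update against it while no other
     * instance exists, and only then creates the remaining engines.
     */
    static <E> List<E> createEngines(int count, Supplier<E> engineFactory, Consumer<E> schemaUpdate) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be at least 1");
        }
        List<E> engines = new ArrayList<>(count);
        E first = engineFactory.get();
        // Runs on the calling thread before any worker threads are started,
        // so no synchronization is needed inside the annotator itself.
        schemaUpdate.accept(first);
        engines.add(first);
        for (int i = 1; i < count; i++) {
            engines.add(engineFactory.get());
        }
        return engines;
    }
}
```

Because each aggregate would be started by its own call, the database update of
one aggregate stays independent of the others without any JVM-wide lock.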

Note that UIMA-AS and the CPE do not synchronize access to an annotator
from different threads; they run multiple instances of an annotator in
different threads, assuming that the user specifies that the annotator is
thread safe by declaring multipleDeploymentAllowed as true in the
operationalProperties section of the annotator's descriptor.

Eddie


Re: Synchronizing the initialization of a component in an aggregate

Posted by Shahim Essaid <sh...@essaid.com>.
Hi Eddie,

I am new to UIMA so I might have to reread the documentation. I was
under the impression that I can use multiple threads as long as I use
a CAS pool and a multithreaded AE. This is described here:

http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.applications.multi_threaded

Did I misunderstand this part of the documentation?  In my case I am
running Core UIMA and I would benefit from multitasking.  (many of my
annotators have a lot of idle time waiting for responses from remote
servers)

This is my first time using UIMA and I have spent too much time trying
to persist the annotations in a database because I am frequently
changing my type system and experimenting with various analysis
approaches. Keeping the schema and INSERT statements in sync with the
type system was time-consuming and error-prone.  I would like to
automate the persistence of the annotations based on the current type
system if possible. I was looking at Liquibase as an API that could be
used during the initialization of my pipelines to update the database
schema, and I would then write a generic JDBC-based annotator to insert
the CAS contents into the database. Any thoughts?
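
To make the idea concrete, here is a rough sketch of deriving a CREATE TABLE
statement from one type's features.  Everything in it is an assumption for
illustration: the class TypeToDdl, the mapping from UIMA primitive range types
to SQL column types, the f_ column prefix (used to avoid SQL reserved words
such as "end"), and treating any other range type as a numeric reference.  It
takes the features as a plain map rather than reading them from a type system.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch: builds DDL text from a type name and its features. */
final class TypeToDdl {
    // Assumed mapping from UIMA primitive range types to SQL column types.
    private static final Map<String, String> SQL_TYPES = Map.of(
            "uima.cas.Integer", "INTEGER",
            "uima.cas.Long", "BIGINT",
            "uima.cas.Float", "REAL",
            "uima.cas.Double", "DOUBLE PRECISION",
            "uima.cas.Boolean", "BOOLEAN",
            "uima.cas.String", "VARCHAR(4000)");

    private TypeToDdl() {}

    /**
     * @param typeName fully qualified type name, e.g. "org.example.Mention"
     * @param features feature name mapped to range type name, in column order
     */
    static String createTable(String typeName, Map<String, String> features) {
        StringJoiner columns = new StringJoiner(", ", "(", ")");
        columns.add("id BIGINT PRIMARY KEY");
        for (Map.Entry<String, String> feature : features.entrySet()) {
            String sqlType = SQL_TYPES.get(feature.getValue());
            // Assumption: a non-primitive range is stored as the id of another row.
            columns.add("f_" + feature.getKey() + " " + (sqlType != null ? sqlType : "BIGINT"));
        }
        // Dots are not valid in unquoted SQL identifiers, so use underscores.
        return "CREATE TABLE " + typeName.replace('.', '_') + " " + columns;
    }
}
```

The same feature map could also drive a generated INSERT statement, so that the
schema and the insert code cannot drift apart.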

Best,
Shahim




Re: Synchronizing the initialization of a component in an aggregate

Posted by Eddie Epstein <ea...@gmail.com>.
Core UIMA is a single-threaded framework. There are two multi-threaded
deployment wrappers, UIMA-AS and the older CPE. Are you using one of
these wrappers?

Eddie


