Posted to user@uima.apache.org by Jörn Kottmann <ko...@gmail.com> on 2009/06/19 15:56:59 UTC

UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Hello everyone,

I have already been using UIMA AS for tagging text with a custom AAE,
though I did not scale the AAE because I ran into a few issues back then and
had no time to solve them.

Now I tried again to scale the AAE and failed again. The AAE gets a
document id which is sent to it via the uimaj-as-camel component. A CAS
multiplier then fetches the actual document out of a database, and that is
also the component which causes trouble.

Because the AAE is not thread safe, UIMA AS must scale it by creating
multiple instances of it.
After reading through the UIMA AS documentation I came up with this
deployment descriptor:
            ...
            <analysisEngine key="TextAnalysis" async="false">
                <scaleout numberOfInstances="8" />

                <delegates>
                    <analysisEngine key="HBaseCasMultiplier">
                        <casMultiplier poolSize="8"/>
                    </analysisEngine>
                </delegates>
            </analysisEngine>
            ...

I must admit the documentation confused me a bit about the meaning of
the async attribute.
Is it correct that async=false means that UIMA AS creates multiple
instances, each of which is called
from one worker thread? And would async=true then mean that one AE is
called by multiple threads?

If numberOfInstances is larger than 1 I always get this exception:
Caused by: org.apache.uima.UIMARuntimeException: The method CasManager.defineCasPool() was called twice by the same Analysis Engine (/HBaseCasMultiplier/).
    at org.apache.uima.resource.impl.CasManager_impl.defineCasPool(CasManager_impl.java:181)
    at org.apache.uima.resource.impl.CasManager_impl.defineCasPool(CasManager_impl.java:161)
    at org.apache.uima.aae.EECasManager_impl.defineCasPool(EECasManager_impl.java:75)
    at org.apache.uima.impl.UimaContext_ImplBase.getEmptyCas(UimaContext_ImplBase.java:565)
    at org.apache.uima.analysis_component.CasMultiplier_ImplBase.getEmptyCAS(CasMultiplier_ImplBase.java:109)
    at dk.infopaq.nlp.repository.connector.HBaseReadCasMultiplier.hasNext(HBaseReadCasMultiplier.java:107)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl$AnalysisComponentCasIterator.hasNext(PrimitiveAnalysisEngine_impl.java:563)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:566)
    ... 20 more


A while back I had a problem which resulted in the same exception message,
but it was solved by updating UIMA to the then-current 2.3.0-SNAPSHOT:
http://www.mail-archive.com/uima-user@incubator.apache.org/msg02054.html

The version I am using is 2.3.0-SNAPSHOT from mid-May.

Thanks,
Jörn

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jörn Kottmann <ko...@gmail.com>.
Jörn Kottmann wrote:
> One more thing, is it possible that my two AAE instances share
> the same Cas Manager instance ?
I spent a little more time in the code and I think the following test
represents what is happening in UIMA AS.

      // with 2 simultaneous AEs
      segmenterDesc = UIMAFramework.getXMLParser()
              .parseAnalysisEngineDescription(
                      new XMLInputSource(JUnitExtension
                              .getFile("TextAnalysisEngineImplTest/NewlineSegmenter.xml")));
     
      ResourceManager rsrcMgr = UIMAFramework.newDefaultResourceManager();
     
      Map<String, Object> params = new HashMap<String, Object>();
      params.put(AnalysisEngine.PARAM_NUM_SIMULTANEOUS_REQUESTS, 2);
     
      AnalysisEngine ae1 =
              UIMAFramework.produceAnalysisEngine(segmenterDesc, rsrcMgr, params);
      AnalysisEngine ae2 =
              UIMAFramework.produceAnalysisEngine(segmenterDesc, rsrcMgr, params);
     
      // start with testing first ae
      CAS cas1 = ae1.newCAS();
      cas1.setDocumentText("Line one\nLine two\nLine three");
      CasIterator iter1 = ae1.processAndOutputNewCASes(cas1);
      assertTrue(iter1.hasNext());
      CAS outCas1 = iter1.next();
      assertEquals("Line one", outCas1.getDocumentText());
     
      // now test second ae
      CAS cas2 = ae2.newCAS();
      cas2.setDocumentText("Line one\nLine two\nLine three");
      CasIterator iter2 = ae2.processAndOutputNewCASes(cas2);
      assertTrue(iter2.hasNext());
      CAS outCas2 = iter2.next();
      assertEquals("Line one", outCas2.getDocumentText());
      outCas2.release();
      assertTrue(iter2.hasNext());
      outCas2 = iter2.next();
      assertEquals("Line two", outCas2.getDocumentText());
      outCas2.release();
      assertTrue(iter2.hasNext());
      outCas2 = iter2.next();
      assertEquals("Line three", outCas2.getDocumentText());
      outCas2.release();
      assertFalse(iter2.hasNext());
     
      // continue testing first ae
      outCas1.release();
      assertTrue(iter1.hasNext());
      outCas1 = iter1.next();
      assertEquals("Line two", outCas1.getDocumentText());
      outCas1.release();
      assertTrue(iter1.hasNext());
      outCas1 = iter1.next();
      assertEquals("Line three", outCas1.getDocumentText());
      outCas1.release();
      assertFalse(iter1.hasNext());

In this sample the resource manager is shared between the two instances;
that is the reason the code later runs into the exception when it tries to
define a CAS pool with the same name a second time.

I am not sure if our API allows sharing a resource manager.

If each AE has its own resource manager the test runs through.
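
The failure mode described above can be modelled in plain Java without UIMA
on the classpath. This is only a sketch under assumptions: PoolRegistry and
defineOnce are hypothetical stand-ins for a shared CAS manager, not UIMA
classes, and the real check inside CasManager_impl may well look different.

```java
import java.util.HashSet;
import java.util.Set;

public class SharedRegistryDemo {

    /** Hypothetical stand-in for a CAS manager: remembers which names already own a pool. */
    static class PoolRegistry {
        private final Set<String> definedPools = new HashSet<>();

        /** Throws if a pool was already defined under this name. */
        void defineOnce(String requesterName) {
            if (!definedPools.add(requesterName)) {
                throw new IllegalStateException("pool defined twice by " + requesterName);
            }
        }
    }

    /** Returns true if both engine instances could define their pool. */
    static boolean bothInstancesStart(PoolRegistry first, PoolRegistry second) {
        try {
            // Both instances come from the same descriptor, so they use the same name.
            first.defineOnce("/HBaseCasMultiplier/");
            second.defineOnce("/HBaseCasMultiplier/");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        PoolRegistry shared = new PoolRegistry();
        System.out.println("shared registry:     " + bothInstancesStart(shared, shared));
        System.out.println("separate registries: "
                + bothInstancesStart(new PoolRegistry(), new PoolRegistry()));
    }
}
```

With one registry passed twice, the second definition is rejected; with one
registry per instance, both succeed. That mirrors the shared versus separate
resource manager cases in the test.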

Jörn

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jörn Kottmann <ko...@gmail.com>.
Jaroslaw Cwiklik wrote:
> Jorn, hopefully the last changes to the uima core have fixed a problem you
> were having while deploying       multiple instances of a Uima Aggregate
> with an embedded CM. Thanks for finding this.
> Can you please confirm that the code fix has resolved your problem.
I no longer encounter the issue, thanks for your help.

Jörn

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jaroslaw Cwiklik <ui...@gmail.com>.
Jorn, hopefully the latest changes to the uima core have fixed the problem you
were having while deploying multiple instances of a Uima Aggregate
with an embedded CM. Thanks for finding this.
Can you please confirm that the code fix has resolved your problem?

Thanks, Jerry


On Thu, Jun 25, 2009 at 9:34 AM, Jaroslaw Cwiklik <ui...@gmail.com> wrote:

> PARAM_NUM_SIMULTANEOUS_REQUESTS is not used directly by Uima AS. The scale
> out for Uima AS service is controlled by settings in the deployment
> descriptor. I believe the PARAM_NUM_SIMULTANEOUS_REQUESTS is used in either
> SOAP based or Vinci based service wrappers.
> -jerry
>
>
> On Thu, Jun 25, 2009 at 6:23 AM, Jörn Kottmann <ko...@gmail.com> wrote:
>
>> Jaroslaw Cwiklik wrote:
>>
>>> We should probably add a new test case to the uima core test suite as
>>> well.
>>> I'll use your example code from a previous thread for that.
>>> -jerry
>>>
>>
>> I derived the code from our test code. You can just
>> add it to AnalysisEngine_ImplTest.testProcessAndOutputNewCASes.
>>
>> Can you please comment on if PARAM_NUM_SIMULTANEOUS_REQUESTS
>> is necessary for the UIMA AS code which creates the multiple AE instances
>> ?
>>
>> Jörn
>>
>
>

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jaroslaw Cwiklik <ui...@gmail.com>.
PARAM_NUM_SIMULTANEOUS_REQUESTS is not used directly by Uima AS. The scale
out for Uima AS service is controlled by settings in the deployment
descriptor. I believe the PARAM_NUM_SIMULTANEOUS_REQUESTS is used in either
SOAP based or Vinci based service wrappers.
-jerry

On Thu, Jun 25, 2009 at 6:23 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> Jaroslaw Cwiklik wrote:
>
>> We should probably add a new test case to the uima core test suite as
>> well.
>> I'll use your example code from a previous thread for that.
>> -jerry
>>
>
> I derived the code from our test code. You can just
> add it to AnalysisEngine_ImplTest.testProcessAndOutputNewCASes.
>
> Can you please comment on if PARAM_NUM_SIMULTANEOUS_REQUESTS
> is necessary for the UIMA AS code which creates the multiple AE instances ?
>
> Jörn
>

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jörn Kottmann <ko...@gmail.com>.
Jaroslaw Cwiklik wrote:
> We should probably add a new test case to the uima core test suite as well.
> I'll use your example code from a previous thread for that.
> -jerry

I derived the code from our test code. You can just
add it to AnalysisEngine_ImplTest.testProcessAndOutputNewCASes.

Can you please comment on whether PARAM_NUM_SIMULTANEOUS_REQUESTS
is necessary for the UIMA AS code which creates the multiple AE instances?

Jörn

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jaroslaw Cwiklik <ui...@gmail.com>.
We should probably add a new test case to the uima core test suite as well.
I'll use your example code from a previous thread for that.
-jerry

On Wed, Jun 24, 2009 at 5:10 PM, Jaroslaw Cwiklik <ui...@gmail.com> wrote:

> Extended tests are in the Apache Uima project called uimaj-as-activemq.
> Look for:
> TestUimaASExtended.java
>
> The jUnit testcase is called: testScaledSyncAggregateProcess()
> it uses this descriptor:
>
>   <deployment protocol="jms" provider="activemq">
>     <casPool numberOfCASes="5"/>
>     <service>
>       <inputQueue endpoint="TopLevelTaeQueue"
> brokerURL="tcp://localhost:8118" prefetch="1"/>
>       <topDescriptor>
>         <import
> location="../descriptors/analysis_engine/SimpleTestAggregate.xml"/>
>       </topDescriptor>
>       <analysisEngine async="false">
>           <scaleout numberOfInstances="5"/>
>       </analysisEngine>
>
>     </service>
>   </deployment>
>
> -jerry
>
>
> On Wed, Jun 24, 2009 at 4:44 PM, Jörn Kottmann <ko...@gmail.com> wrote:
>
>> Jaroslaw Cwiklik wrote:
>>
>>> Yes, I've added a new test case called:
>>> testScaledSyncAggregateProcess()
>>>
>>> in the extended tests.
>>>
>>>
>> What are the extended tests ? Are they not part of Apache UIMA ?
>> The commits you did for UIMA-1400 doesn't list the add of
>> the test case. I am just curious and would like to have a look at it.
>>
>> Jörn
>>
>>
>

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jaroslaw Cwiklik <ui...@gmail.com>.
Extended tests are in the Apache Uima project called uimaj-as-activemq. Look
for:
TestUimaASExtended.java

The jUnit testcase is called: testScaledSyncAggregateProcess()
it uses this descriptor:

  <deployment protocol="jms" provider="activemq">
    <casPool numberOfCASes="5"/>
    <service>
      <inputQueue endpoint="TopLevelTaeQueue"
                  brokerURL="tcp://localhost:8118" prefetch="1"/>
      <topDescriptor>
        <import location="../descriptors/analysis_engine/SimpleTestAggregate.xml"/>
      </topDescriptor>
      <analysisEngine async="false">
          <scaleout numberOfInstances="5"/>
      </analysisEngine>

    </service>
  </deployment>

-jerry

On Wed, Jun 24, 2009 at 4:44 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> Jaroslaw Cwiklik wrote:
>
>> Yes, I've added a new test case called:
>> testScaledSyncAggregateProcess()
>>
>> in the extended tests.
>>
>>
> What are the extended tests ? Are they not part of Apache UIMA ?
> The commits you did for UIMA-1400 doesn't list the add of
> the test case. I am just curious and would like to have a look at it.
>
> Jörn
>
>

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jörn Kottmann <ko...@gmail.com>.
Jaroslaw Cwiklik wrote:
> Yes, I've added a new test case called:
> testScaledSyncAggregateProcess()
>
> in the extended tests.
>   
What are the extended tests? Are they not part of Apache UIMA?
The commits you made for UIMA-1400 do not list the addition of
the test case. I am just curious and would like to have a look at it.

Jörn


Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jaroslaw Cwiklik <ui...@gmail.com>.
Yes, I've added a new test case called:
testScaledSyncAggregateProcess()

in the extended tests.

jerry

On Wed, Jun 24, 2009 at 3:06 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> Jaroslaw Cwiklik wrote:
>
>> Jörn, the fix for your problem was committed under JIRA UIMA-1400.
>> Jerry Cwiklik
>>
>>
> Thanks, I will test it later today. Did you added a test case ?
>
> Maybe we could add the code snippet I posted to demonstrate
> the problem to our unit tests.
>
> Jörn
>

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jörn Kottmann <ko...@gmail.com>.
Jaroslaw Cwiklik wrote:
> Jörn, the fix for your problem was committed under JIRA UIMA-1400.
> Jerry Cwiklik
>   
Thanks, I will test it later today. Did you add a test case?

Maybe we could add the code snippet I posted to demonstrate
the problem to our unit tests.

Jörn

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jaroslaw Cwiklik <ui...@gmail.com>.
Jörn, the fix for your problem was committed under JIRA UIMA-1400.
Jerry Cwiklik


On Tue, Jun 23, 2009 at 10:53 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> Jaroslaw Cwiklik wrote:
>
>> Jörn, I was able to replicate the problem and will address this soon.
>> Indeed, there is one CasManager instance that is shared by multiple AAE
>> instances causing the exception. Code in the CasManager should be changed
>> to
>> prevent the exception. I will let you know when this is fixed.
>>
>>
> Thanks, please note that the pool handling in CasManager_impl
> is not thread safe and can run into concurrency issues when
> accessed from more than one thread.
>
> If I remember correctly I also had the issue with CPE, but thought
> that CPE just cannot handle Cas Multipliers.
>
> Jörn
>
>

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jörn Kottmann <ko...@gmail.com>.
Jaroslaw Cwiklik wrote:
> Jörn, I was able to replicate the problem and will address this soon.
> Indeed, there is one CasManager instance that is shared by multiple AAE
> instances causing the exception. Code in the CasManager should be changed to
> prevent the exception. I will let you know when this is fixed.
>   
Thanks, please note that the pool handling in CasManager_impl
is not thread safe and can run into concurrency issues when
accessed from more than one thread.
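
To illustrate the kind of define-once handling that stays safe under
concurrent access, here is a minimal plain-Java sketch. PoolTable and
getOrCreate are hypothetical names, the pool itself is just a placeholder
Object, and this is not the code committed to UIMA; it only shows one common
way to avoid a check-then-create race.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentPoolDemo {

    /** Hypothetical pool holder; counts how many pools were really created. */
    static class PoolTable {
        private final ConcurrentMap<String, Object> pools = new ConcurrentHashMap<>();
        final AtomicInteger created = new AtomicInteger();

        /** Creates the pool for this key at most once, even under contention. */
        Object getOrCreate(String key) {
            return pools.computeIfAbsent(key, k -> {
                created.incrementAndGet();
                return new Object(); // placeholder for a real pool
            });
        }
    }

    /** Requests one key from several threads and reports how many pools were created. */
    static int createdUnderContention(int threads) {
        PoolTable table = new PoolTable();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            executor.submit(() -> {
                try {
                    start.await(); // release all workers at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                table.getOrCreate("/HBaseCasMultiplier/");
            });
        }
        start.countDown();
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return table.created.get();
    }

    public static void main(String[] args) {
        System.out.println("pools created: " + createdUnderContention(8));
    }
}
```

The design choice here is to let the map perform the existence check and the
creation as one atomic step, instead of guarding a plain HashMap by hand.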

If I remember correctly I also had the issue with CPE, but thought
that CPE just cannot handle Cas Multipliers.

Jörn


Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jaroslaw Cwiklik <ui...@gmail.com>.
Jörn, I was able to replicate the problem and will address this soon.
Indeed, there is one CasManager instance that is shared by multiple AAE
instances causing the exception. Code in the CasManager should be changed to
prevent the exception. I will let you know when this is fixed.

Jerry

On Mon, Jun 22, 2009 at 8:25 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> Hi,
>
> I looked a bit through the code and maybe found
> an issue not sure if it is related.
>
> In PrimitiveAnalysisEngineController_impl the AE instances
> are created in initializeAnalysisEngine with this call:
>
> UIMAFramework.produceAnalysisEngine(rSpecifier, paramsMap);
>
> If I got it right it should contain the PARAM_NUM_SIMULTANEOUS_REQUESTS
> parameter, but it doesn't. Is this a potential problem ?
> Though setting the parameter to 2 does not fix my problem.
>
> One more thing, is it possible that my two AAE instances share
> the same Cas Manager instance ?
>
> Jörn
>

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jörn Kottmann <ko...@gmail.com>.
Hi,

I looked a bit through the code and may have found
an issue, though I am not sure if it is related.

In PrimitiveAnalysisEngineController_impl the AE instances
are created in initializeAnalysisEngine with this call:

UIMAFramework.produceAnalysisEngine(rSpecifier, paramsMap);

If I got it right, the map should contain the PARAM_NUM_SIMULTANEOUS_REQUESTS
parameter, but it doesn't. Is this a potential problem?
Setting the parameter to 2 does not fix my problem, though.

One more thing: is it possible that my two AAE instances share
the same CAS manager instance?

Jörn

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jörn Kottmann <ko...@gmail.com>.
Eddie Epstein wrote:
>> I am note sure if I should run async or not. Right now
>> the analysis is running on one quad core server.
>> Now I would like to setup UIMA AS in a way that
>> it uses all the CPU time of all cores for fetching/writing
>> documents to and from HBase and for analysis.
>> The interaction with HBase makes the thread idling
>> for short period of time, thats why I need maybe like
>> 10 threads for fetching and 10 threads for writing
>> to pump enough documents through the machine
>> to keep it busy.
>>
>> Having the AAE async would have the advantage for me
>> that I only need 10 instances of the fetching CM and 10
>> instance of the writing delegate AE and not 20 instances
>> of the whole AAE. The same is true for  analysis there
>> I can just scale the AEs which are slow.
>> Though for scaling the CM I have to use the suggested
>> workaround.
>>
>> So all in all I think having it async would be an advantage,
>> but for now it would just be fine to not have it async because
>> that seems easier.
>>     
>>> Assuming that your
>>> AE runs correctly as a single threaded aggregate, creating multiple
>>> instances of this seems fine. The correction to your previous deployment
>>> descriptor would just be:
>>>
>>>          <analysisEngine key="TextAnalysis" async="false">
>>>              <scaleout numberOfInstances="8" />
>>>          </analysisEngine>
>>>
>>> From UIMA AS point of view, this component is not a CasMultiplier
>>> because [I assume] it comsumes new CASes internally and does not
>>> return them.
>>>
>>> Let emphasize that before AS scaleout the aggregate should be tested
>>> as a simple UIMA aggregate with the normal tools like CVD, runAE,
>>> or a custom driver.
>>>
>>>       
>> I tested the correction but got the first exception again.
>> Here is now the full stack trace and not only the cause:
>>
>>     
>
> Does this error happen right away, or randomly after some period of
> processing? Can you confirm that if you run this configuration with
> scaleout=1 there is no problem?
>   

Yes, with numberOfInstances=1 it works.

Here is the configuration again:
<analysisEngine async="false">
    <scaleout numberOfInstances="1" />
</analysisEngine>

I then changed numberOfInstances to 2.
The first CAS goes through without an error,
the second CAS throws the exception, the third goes through
without an error, and the fourth CAS throws the exception again; then I stopped
debugging. I used today's 2.3.0-SNAPSHOT for the test.

To me it looks as if only one of the two AAE instances works properly.

Jörn



Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Eddie Epstein <ea...@gmail.com>.
> I am note sure if I should run async or not. Right now
> the analysis is running on one quad core server.
> Now I would like to setup UIMA AS in a way that
> it uses all the CPU time of all cores for fetching/writing
> documents to and from HBase and for analysis.
> The interaction with HBase makes the thread idling
> for short period of time, thats why I need maybe like
> 10 threads for fetching and 10 threads for writing
> to pump enough documents through the machine
> to keep it busy.
>
> Having the AAE async would have the advantage for me
> that I only need 10 instances of the fetching CM and 10
> instance of the writing delegate AE and not 20 instances
> of the whole AAE. The same is true for  analysis there
> I can just scale the AEs which are slow.
> Though for scaling the CM I have to use the suggested
> workaround.
>
> So all in all I think having it async would be an advantage,
> but for now it would just be fine to not have it async because
> that seems easier.
>>
>> Assuming that your
>> AE runs correctly as a single threaded aggregate, creating multiple
>> instances of this seems fine. The correction to your previous deployment
>> descriptor would just be:
>>
>>          <analysisEngine key="TextAnalysis" async="false">
>>              <scaleout numberOfInstances="8" />
>>          </analysisEngine>
>>
>> From UIMA AS point of view, this component is not a CasMultiplier
>> because [I assume] it comsumes new CASes internally and does not
>> return them.
>>
>> Let emphasize that before AS scaleout the aggregate should be tested
>> as a simple UIMA aggregate with the normal tools like CVD, runAE,
>> or a custom driver.
>>
>
> I tested the correction but got the first exception again.
> Here is now the full stack trace and not only the cause:
>

Does this error happen right away, or randomly after some period of
processing? Can you confirm that if you run this configuration with
scaleout=1 there is no problem?

> How does CorpusReader get the id which is included in the input Cas ?

Have the CM put the id into the new Cas for the CorpusReader.
Just create an FS with the appropriate feature to hold the id, and add
that FS to the index. The getAllIndexedFS(type) method is convenient
for getting an indexed FS that does not have a custom covering index
defined.
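
The shape of this workaround can be modelled in plain Java. This is a sketch
under assumptions, not UIMA code: Doc stands in for the new CAS, the id field
stands in for the feature structure that carries the id, and an in-memory map
stands in for the document store. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CorpusReaderSketch {

    /** Hypothetical stand-in for a child CAS: carries the id, text is filled later. */
    static class Doc {
        final String id;
        volatile String text; // set by the reader stage
        Doc(String id) { this.id = id; }
    }

    /** Single-instance "multiplier": cheap, only copies the id into a new Doc. */
    static Doc multiply(String inputId) {
        return new Doc(inputId);
    }

    /** Scaled-out "CorpusReader": several workers fetch the text for the carried id. */
    static List<Doc> fillAll(List<String> ids, Map<String, String> store, int readers) {
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        try {
            List<Future<Doc>> pending = new ArrayList<>();
            for (String id : ids) {
                Doc doc = multiply(id); // this step is not scaled
                pending.add(pool.submit(() -> {
                    doc.text = store.get(doc.id); // the slow fetch runs in parallel
                    return doc;
                }));
            }
            List<Doc> done = new ArrayList<>();
            for (Future<Doc> f : pending) {
                done.add(f.get()); // collected in submission order
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("a", "Line one");
        store.put("b", "Line two");
        for (Doc d : fillAll(Arrays.asList("a", "b"), store, 2)) {
            System.out.println(d.id + " -> " + d.text);
        }
    }
}
```

The point of the split is that the step creating new documents stays single
and cheap, while the slow fetch sits in a stage that can be scaled freely.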

Eddie

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jörn Kottmann <ko...@gmail.com>.
Eddie Epstein wrote:
>> I reduced my AAE to three delegate AEs:
>>
>> 1. HBaseCasMultiplier -> fetches the actual text from hbase
>> 2. Tokenizer -> adds tokens to my CAS
>> 3. HBaseWrite -> writes the tokens back into hbase
>>
>> These delegates are not thread safe, to scale these AEs
>> one instance per worker thread must be created.
>> Thats what I want UIMA AS to do for me, so I think thats
>> also the case which is described in the documentation in 1.4.1:
>>
>> "... The classes for annotators and flow controllers do not need to be
>> "thread-safe"
>> with respect to their instance data - meaning, they do not need to be
>> implemented
>> with synchronization locks for access to their instance data, because each
>> instance
>> will only be called using one thread at a time. Scale out for these classes
>> is done using
>> multiple instances of the class. ..."
>>
>>     
>
> That documentation is correct, but apparently not as clear as we'd like.
>
> Note that the following paragraph in the documentation goes on to say
>  "However, if you have class "static" fields shared by all instances,
>   or other kinds of external data shared by all instances (such as a
>   writable file), you must be aware of the possibility of multiple threads
>   accessing these fields or external resources, running on separate
>   instances of the class, and do any required synchronization for these."
>
> So, barring any static fields or resources that would cause problems with
> multiple instantiations, UIMA AS scaleout in the same JVM should work.
>   
Yes, sure, my AE instances do not share any data. Actually
I think that paragraph goes without saying, but it is still good
to be reminded.

> Hmm, not clear to me that you want async=true. 
I am not sure if I should run async or not. Right now
the analysis is running on one quad-core server.
Now I would like to set up UIMA AS in a way that
it uses all the CPU time of all cores for fetching/writing
documents to and from HBase and for analysis.
The interaction with HBase makes the threads idle
for short periods of time; that is why I need maybe
10 threads for fetching and 10 threads for writing
to pump enough documents through the machine
to keep it busy.

Having the AAE async would have the advantage for me
that I only need 10 instances of the fetching CM and 10
instances of the writing delegate AE, and not 20 instances
of the whole AAE. The same is true for analysis: there
I can just scale the AEs which are slow.
Though for scaling the CM I have to use the suggested
workaround.

So all in all I think having it async would be an advantage,
but for now it would be fine to not have it async because
that seems easier.
> Assuming that your
> AE runs correctly as a single threaded aggregate, creating multiple
> instances of this seems fine. The correction to your previous deployment
> descriptor would just be:
>
>           <analysisEngine key="TextAnalysis" async="false">
>               <scaleout numberOfInstances="8" />
>           </analysisEngine>
>
> From UIMA AS point of view, this component is not a CasMultiplier
> because [I assume] it comsumes new CASes internally and does not
> return them.
>
> Let emphasize that before AS scaleout the aggregate should be tested
> as a simple UIMA aggregate with the normal tools like CVD, runAE,
> or a custom driver.
>   
I tested the correction but got the first exception again.
Here is now the full stack trace and not only the cause:

org.apache.uima.analysis_engine.AnalysisEngineProcessException: The method CasManager.defineCasPool() was called twice by the same Analysis Engine (/HBaseCasMultiplier/).
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:699)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:407)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:340)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
    at org.apache.uima.aae.controller.PrimitiveAnalysisEngineController_impl.process(PrimitiveAnalysisEngineController_impl.java:376)
    at org.apache.uima.aae.handler.HandlerBase.invokeProcess(HandlerBase.java:130)
    at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestFromRemoteDelegate(ProcessRequestHandler_impl.java:453)
    at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handle(ProcessRequestHandler_impl.java:896)
    at org.apache.uima.aae.handler.input.MetadataRequestHandler_impl.handle(MetadataRequestHandler_impl.java:84)
    at org.apache.uima.adapter.jms.activemq.JmsInputChannel.onMessage(JmsInputChannel.java:665)
    at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:485)
    at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:442)
    at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:414)
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:309)
    at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:254)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:871)
    at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:818)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at org.apache.uima.aae.UimaAsThreadFactory$1.run(UimaAsThreadFactory.java:69)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.uima.UIMARuntimeException: The method CasManager.defineCasPool() was called twice by the same Analysis Engine (/HBaseCasMultiplier/).
    at org.apache.uima.resource.impl.CasManager_impl.defineCasPool(CasManager_impl.java:181)
    at org.apache.uima.resource.impl.CasManager_impl.defineCasPool(CasManager_impl.java:161)
    at org.apache.uima.aae.EECasManager_impl.defineCasPool(EECasManager_impl.java:75)
    at org.apache.uima.impl.UimaContext_ImplBase.getEmptyCas(UimaContext_ImplBase.java:565)
    at org.apache.uima.analysis_component.CasMultiplier_ImplBase.getEmptyCAS(CasMultiplier_ImplBase.java:109)
    at dk.infopaq.nlp.repository.connector.HBaseReadCasMultiplier.hasNext(HBaseReadCasMultiplier.java:107)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl$AnalysisComponentCasIterator.hasNext(PrimitiveAnalysisEngine_impl.java:563)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:566)
    ... 20 more

>> Ok, I changed it to fit to case described above:
>>           <analysisEngine>
>>               <delegates>
>>                   <analysisEngine key="HBaseCasMultiplier">
>>                       <casMultiplier poolSize="4"/>
>>                       <scaleout numberOfInstances="2" />
>>                   </analysisEngine>
>>                   <analysisEngine key="Tokenizer">
>>                       <scaleout numberOfInstances="4" />
>>                   </analysisEngine>
>>                   <analysisEngine key="HBaseWriter">
>>                       <scaleout numberOfInstances="4" />
>>                   </analysisEngine>
>>               </delegates>
>>           </analysisEngine>
>>
>> I would like to scale the HBaseCasMultiplier to more threads
>> then two, because there is a short delay when reading from hbase.
>> First I am not sure which value I should choose for the
>> Cas Multiplier pool size. If the numberOfInstances get larger
>> then two I get a few exceptions (stack trace below) when UIMA AS
>> starts to process the first documents. So I think I am doing something
>> wrong here. And what is the minimal possible casPoolSize, since
>> I need CAS instances for my 4 Tokenizers, 4 HBaseWriters
>> and 4 (?) for the CAS Multiplier, which would result in a minimum
>> size of 12, right ?
>>
>> The HBaseCasMultiplier gets one CAS which contains the id and
>> then outputs one CAS which contains an actual text.
>>     
>
> Supporting the complexities raised by Cas multipliers has been quite
> challenging. I'm pretty sure that a co-located CM cannot be scaled; we
> need to check this and clarify the situation. (This is different from having
> more than one CM in the same aggregate, which is supported with
> the latest code.)
>
> Here is a possible workaround to run this aggregate asynchronously.
> If I understand your scenario, each input Cas is tiny, the CM creates
> a new Cas with the document to be processed and consumed by
> HBaseWriters, and finally the aggregate returns just the tiny input Cas.
> The workaround is to have the CM create a new Cas, but not fetch
> the document. Add a new delegate immediately following the CM,
> say CorpusReader, which fills the new CASes with documents and can
> be scaled out as desired.
>   
How does CorpusReader get the id which is included in the input Cas ?

Jörn

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Eddie Epstein <ea...@gmail.com>.
Hi Jörn,

Please see comments below..

On Fri, Jun 19, 2009 at 8:44 PM, Jörn Kottmann<ko...@gmail.com> wrote:
> Thanks for your reply Jaroslaw, it seems that I misunderstood
> the way UIMA AS works.
>
>> 1)
>> "... Because the AAE is not thread safe uima as must scale it through
>> creating multiple instances of it..."
>>
>> Since the AAE is not thread safe you should not try to scale it out in the
>> same JVM. If AAE is not thread safe, you should only have one instance of
>> it per JVM. You can scale it by starting multiple JVMs.
>>
>
> I reduced my AAE to three delegate AEs:
>
> 1. HBaseCasMultiplier -> fetches the actual text from hbase
> 2. Tokenizer -> adds tokens to my CAS
> 3. HBaseWrite -> writes the tokens back into hbase
>
> These delegates are not thread safe, so to scale them
> one instance per worker thread must be created.
> That's what I want UIMA AS to do for me, and I think that's
> also the case described in 1.4.1 of the documentation:
>
> "... The classes for annotators and flow controllers do not need to be
> "thread-safe" with respect to their instance data - meaning, they do not
> need to be implemented with synchronization locks for access to their
> instance data, because each instance will only be called using one thread
> at a time. Scale out for these classes is done using multiple instances
> of the class. ..."
>

That documentation is correct, but apparently not as clear as we'd like.

Note that the following paragraph in the documentation goes on to say
 "However, if you have class "static" fields shared by all instances,
  or other kinds of external data shared by all instances (such as a
  writable file), you must be aware of the possibility of multiple threads
  accessing these fields or external resources, running on separate
  instances of the class, and do any required synchronization for these."

So, barring any static fields or resources that would cause problems with
multiple instantiations, UIMA AS scaleout in the same JVM should work.
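
To make the distinction concrete, here is a minimal sketch in plain Java
(deliberately not using the UIMA API). CountingAnnotator and ScaleoutDemo
are hypothetical names invented for illustration. The sketch assumes one
worker thread per instance, as in same-JVM scaleout: the instance field
needs no lock, while the static field shared by all instances does need
thread-safe access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an annotator class; not a real UIMA type.
class CountingAnnotator {
    // Static field shared by ALL instances: several threads may touch it
    // at once, so it must be thread-safe (here via AtomicLong).
    private static final AtomicLong totalDocuments = new AtomicLong();

    // Instance data: no synchronization needed, because each instance is
    // only ever called by one thread at a time.
    private long seenByThisInstance = 0;

    void process(String documentText) {
        seenByThisInstance++;
        totalDocuments.incrementAndGet();
    }

    long seenByThisInstance() { return seenByThisInstance; }

    static long total() { return totalDocuments.get(); }

    static void resetTotal() { totalDocuments.set(0); }
}

class ScaleoutDemo {
    // Creates one annotator instance per worker thread and returns the
    // shared total after all workers have finished.
    static long run(int instances, int docsPerInstance) {
        CountingAnnotator.resetTotal();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            CountingAnnotator annotator = new CountingAnnotator();
            Thread worker = new Thread(() -> {
                for (int d = 0; d < docsPerInstance; d++) {
                    annotator.process("doc " + d);
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return CountingAnnotator.total();
    }

    public static void main(String[] args) {
        System.out.println(run(8, 1000)); // prints 8000
    }
}
```

If the static counter were a plain long incremented without any
synchronization, updates from different instances could be lost.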

>> 2)
>> "...I must admit the documentation confused me a bit about the meaning of
>> the async attribute..."
>>
>> The async attribute is only used for aggregates, and specifies whether the
>> aggregate will be run asynchronously (with input queues in front of all of
>> its delegates) or not. If you choose async="false", you are asking to
>> deploy the aggregate synchronously, meaning it will be single-threaded. To
>> UIMA AS, a synchronous aggregate is the same as a UIMA primitive AE.
>>
>
> Thanks, understood the difference, so I want async="true"
>
>> 3)            ...
>>            <analysisEngine key="TextAnalysis" async="false">
>>                <scaleout numberOfInstances="8" />
>>
>>                <delegates>
>>                    <analysisEngine key="HBaseCasMultiplier">
>>                        <casMultiplier poolSize="8"/>
>>                    </analysisEngine>
>>                </delegates>
>>            </analysisEngine>
>>            ...
>>
>> The above is an inconsistent configuration.  You are specifying that
>> "TextAnalytics" should be deployed synchronously but then adding delegate
>> configuration, which forces the aggregate to be deployed asynchronously.
>> Synchronous aggregate delegate's are not "visible" to the uima-as, and
>> cannot be configured in the deployment descriptor.
>>
>

Hmm, not clear to me that you want async=true. Assuming that your
AE runs correctly as a single threaded aggregate, creating multiple
instances of this seems fine. The correction to your previous deployment
descriptor would just be:

          <analysisEngine key="TextAnalysis" async="false">
              <scaleout numberOfInstances="8" />
          </analysisEngine>

From the UIMA AS point of view, this component is not a CasMultiplier
because [I assume] it consumes new CASes internally and does not
return them.

Let me emphasize that before AS scaleout the aggregate should be tested
as a simple UIMA aggregate with the normal tools like CVD, runAE,
or a custom driver.

> Ok, I changed it to fit to case described above:
>           <analysisEngine>
>               <delegates>
>                   <analysisEngine key="HBaseCasMultiplier">
>                       <casMultiplier poolSize="4"/>
>                       <scaleout numberOfInstances="2" />
>                   </analysisEngine>
>                   <analysisEngine key="Tokenizer">
>                       <scaleout numberOfInstances="4" />
>                   </analysisEngine>
>                   <analysisEngine key="HBaseWriter">
>                       <scaleout numberOfInstances="4" />
>                   </analysisEngine>
>               </delegates>
>           </analysisEngine>
>
> I would like to scale the HBaseCasMultiplier to more threads
> then two, because there is a short delay when reading from hbase.
> First I am not sure which value I should choose for the
> Cas Multiplier pool size. If the numberOfInstances get larger
> then two I get a few exceptions (stack trace below) when UIMA AS
> starts to process the first documents. So I think I am doing something
> wrong here. And what is the minimal possible casPoolSize, since
> I need CAS instances for my 4 Tokenizers, 4 HBaseWriters
> and 4 (?) for the CAS Multiplier, which would result in a minimum
> size of 12, right ?
>
> The HBaseCasMultiplier gets one CAS which contains the id and
> then outputs one CAS which contains an actual text.
>

Supporting the complexities raised by Cas multipliers has been quite
challenging. I'm pretty sure that a co-located CM cannot be scaled; we
need to check this and clarify the situation. (This is different from having
more than one CM in the same aggregate, which is supported with
the latest code.)

Here is a possible workaround to run this aggregate asynchronously.
If I understand your scenario, each input Cas is tiny, the CM creates
a new Cas with the document to be processed and consumed by
HBaseWriters, and finally the aggregate returns just the tiny input Cas.
The workaround is to have the CM create a new Cas, but not fetch
the document. Add a new delegate immediately following the CM,
say CorpusReader, which fills the new CASes with documents and can
be scaled out as desired.
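
A rough sketch of that split, in plain Java rather than against the UIMA
API. DocCas, multiply, fill and run are made-up names, and the in-memory
map stands in for hbase. Copying the document id into the new CAS is an
assumption made for illustration, so that the scaled-out reader step knows
which document to fetch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a CAS; not the UIMA CAS type.
class DocCas {
    final String documentId;
    String text; // stays null until the reader step fills it

    DocCas(String documentId) {
        this.documentId = documentId;
    }
}

class WorkaroundSketch {
    // Step 1, the CM (a single instance): create a new CAS for the input
    // id, but do not fetch the document.
    static DocCas multiply(String inputId) {
        return new DocCas(inputId);
    }

    // Step 2, the CorpusReader-like delegate (scaled out): do the slow
    // fetch and fill the new CAS with the document text.
    static DocCas fill(DocCas cas, Map<String, String> store) {
        cas.text = store.get(cas.documentId);
        return cas;
    }

    // Runs the CM step on the calling thread and the fill step on a pool
    // of `readers` threads; returns the texts in input order.
    static List<String> run(List<String> ids, Map<String, String> store, int readers) {
        ExecutorService pool = Executors.newFixedThreadPool(readers);
        try {
            List<Future<DocCas>> pending = new ArrayList<>();
            for (String id : ids) {
                DocCas cas = multiply(id);
                pending.add(pool.submit(() -> fill(cas, store)));
            }
            List<String> texts = new ArrayList<>();
            for (Future<DocCas> filled : pending) {
                texts.add(filled.get().text);
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the design is that the only slow operation, the fetch, sits
in the step that can have many instances, while the CM itself stays a
single cheap instance.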

Regards,
Eddie

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jörn Kottmann <ko...@gmail.com>.
Thanks for your reply, Jaroslaw. It seems that I misunderstood
the way UIMA AS works.

> 1)
> "... Because the AAE is not thread safe uima as must scale it through
> creating multiple instances of it..."
>
> Since the AAE is not thread safe you should not try to scale it out in the
> same JVM. If AAE
> is not thread safe, you should only have one instance of it per JVM. You can
> scale it by
> starting multiple JVMs.
>   
I reduced my AAE to three delegate AEs:

1. HBaseCasMultiplier -> fetches the actual text from hbase
2. Tokenizer -> adds tokens to my CAS
3. HBaseWriter -> writes the tokens back into hbase

These delegates are not thread safe, so to scale them
one instance per worker thread must be created.
That's what I want UIMA AS to do for me, and I think that's
also the case described in 1.4.1 of the documentation:

"... The classes for annotators and flow controllers do not need to be
"thread-safe" with respect to their instance data - meaning, they do not
need to be implemented with synchronization locks for access to their
instance data, because each instance will only be called using one thread
at a time. Scale out for these classes is done using multiple instances
of the class. ..."

> 2)
> "...I must admit the documentation confused me a bit about the meaning of
> the async attribute..."
>
> The async attribute is only used for aggregates, and specifies that this
> aggregate will be run asynchronously (with input queues in front of all of
> its delegates) or not. If you choose async="false" it means that you want to
> deploy the aggregate synchronously. Meaning it will be single-threaded. To
> UIMA AS a synchronous aggregate is the same as a
> UIMA primitive AE.
>   
Thanks, I understand the difference now, so I want async="true".

> 3)            ...
>             <analysisEngine key="TextAnalysis" async="false">
>                 <scaleout numberOfInstances="8" />
>
>                 <delegates>
>                     <analysisEngine key="HBaseCasMultiplier">
>                         <casMultiplier poolSize="8"/>
>                     </analysisEngine>
>                 </delegates>
>             </analysisEngine>
>             ...
>
> The above is an inconsistent configuration.  You are specifying that
> "TextAnalytics" should be deployed synchronously but then adding delegate
> configuration, which forces the aggregate to be deployed asynchronously.
> Synchronous aggregate delegate's are not "visible" to the uima-as, and
> cannot be configured in the deployment descriptor.
>   
Ok, I changed it to fit the case described above:
            <analysisEngine>
                <delegates>
                    <analysisEngine key="HBaseCasMultiplier">
                        <casMultiplier poolSize="4"/>
                        <scaleout numberOfInstances="2" />
                    </analysisEngine>
                    <analysisEngine key="Tokenizer">
                        <scaleout numberOfInstances="4" />
                    </analysisEngine>
                    <analysisEngine key="HBaseWriter">
                        <scaleout numberOfInstances="4" />
                    </analysisEngine>
                </delegates>
            </analysisEngine>

I would like to scale the HBaseCasMultiplier to more than two
threads, because there is a short delay when reading from hbase.
First, I am not sure which value I should choose for the
CAS multiplier pool size. If numberOfInstances gets larger
than two, I get a few exceptions (stack trace below) when UIMA AS
starts to process the first documents, so I think I am doing something
wrong here. And what is the minimum possible casPoolSize? I need
CAS instances for my 4 Tokenizers, 4 HBaseWriters
and 4 (?) for the CAS multiplier, which would result in a minimum
size of 12, right?

The HBaseCasMultiplier gets one CAS which contains the id and
then outputs one CAS which contains the actual text.

Here is the full stack trace for the exception I get now:
org.apache.uima.UIMARuntimeException: AnalysisComponent "/HBaseCasMultiplier/" requested more CASes (2) than defined in its getCasInstancesRequired() method (1).  It is possible that the AnalysisComponent is not properly releasing CASes when it encounters an error.
    at org.apache.uima.impl.UimaContext_ImplBase.getEmptyCas(UimaContext_ImplBase.java:575)
    at org.apache.uima.analysis_component.CasMultiplier_ImplBase.getEmptyCAS(CasMultiplier_ImplBase.java:109)
    at dk.infopaq.nlp.repository.connector.HBaseReadCasMultiplier.hasNext(HBaseReadCasMultiplier.java:107)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl$AnalysisComponentCasIterator.hasNext(PrimitiveAnalysisEngine_impl.java:563)
    at org.apache.uima.aae.controller.PrimitiveAnalysisEngineController_impl.process(PrimitiveAnalysisEngineController_impl.java:388)
    at org.apache.uima.aae.handler.HandlerBase.invokeProcess(HandlerBase.java:130)
    at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handleProcessRequestWithCASReference(ProcessRequestHandler_impl.java:655)
    at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handle(ProcessRequestHandler_impl.java:887)
    at org.apache.uima.aae.spi.transport.vm.UimaVmMessageListener.onMessage(UimaVmMessageListener.java:99)
    at org.apache.uima.aae.spi.transport.vm.UimaVmMessageDispatcher$1.run(UimaVmMessageDispatcher.java:66)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at org.apache.uima.aae.UimaAsThreadFactory$1.run(UimaAsThreadFactory.java:69)
    at java.lang.Thread.run(Thread.java:619)
CASAdminException: Can't flush CAS, flushing is disabled.
    at org.apache.uima.cas.impl.CASImpl.reset(CASImpl.java:850)
    at org.apache.uima.util.CasPool.releaseCas(CasPool.java:228)
    at org.apache.uima.resource.impl.CasManager_impl.releaseCas(CasManager_impl.java:141)
    at org.apache.uima.cas.AbstractCas_ImplBase.release(AbstractCas_ImplBase.java:35)
    at org.apache.uima.cas.impl.CASImpl.release(CASImpl.java:3561)
    at org.apache.uima.cas.impl.CASImpl.release(CASImpl.java:3559)
    at org.apache.uima.aae.controller.BaseAnalysisEngineController.dropCAS(BaseAnalysisEngineController.java:1044)
    at org.apache.uima.aae.controller.BaseAnalysisEngineController.dropCAS(BaseAnalysisEngineController.java:1269)
    at org.apache.uima.aae.controller.AggregateAnalysisEngineController_impl.dropCAS(AggregateAnalysisEngineController_impl.java:318)
    at org.apache.uima.aae.controller.BaseAnalysisEngineController.handleAction(BaseAnalysisEngineController.java:1212)
    at org.apache.uima.aae.controller.AggregateAnalysisEngineController_impl.takeAction(AggregateAnalysisEngineController_impl.java:533)
    at org.apache.uima.aae.error.handler.ProcessCasErrorHandler.handleError(ProcessCasErrorHandler.java:566)
    at org.apache.uima.aae.error.ErrorHandlerChain.handle(ErrorHandlerChain.java:64)
    at org.apache.uima.aae.handler.input.ProcessResponseHandler.handleProcessResponseWithException(ProcessResponseHandler.java:544)
    at org.apache.uima.aae.handler.input.ProcessResponseHandler.handle(ProcessResponseHandler.java:644)
    at org.apache.uima.aae.handler.HandlerBase.delegate(HandlerBase.java:158)
    at org.apache.uima.aae.handler.input.ProcessRequestHandler_impl.handle(ProcessRequestHandler_impl.java:927)
    at org.apache.uima.aae.spi.transport.vm.UimaVmMessageListener.onMessage(UimaVmMessageListener.java:99)
    at org.apache.uima.aae.spi.transport.vm.UimaVmMessageDispatcher$1.run(UimaVmMessageDispatcher.java:66)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

Thanks for your help,
Jörn

Re: UIMA-AS and CasManager.defineCasPool() was called twice by the same Analysis Engine

Posted by Jaroslaw Cwiklik <ui...@gmail.com>.
Jorn, there are a couple of problems here:

1)
"... Because the AAE is not thread safe uima as must scale it through
creating multiple instances of it..."

Since the AAE is not thread safe you should not try to scale it out in the
same JVM. If AAE is not thread safe, you should only have one instance of
it per JVM. You can scale it by starting multiple JVMs.

2)
"...I must admit the documentation confused me a bit about the meaning of
the async attribute..."

The async attribute is only used for aggregates, and specifies whether the
aggregate will be run asynchronously (with input queues in front of all of
its delegates) or not. If you choose async="false", you are asking to
deploy the aggregate synchronously, meaning it will be single-threaded. To
UIMA AS, a synchronous aggregate is the same as a UIMA primitive AE.

3)            ...
            <analysisEngine key="TextAnalysis" async="false">
                <scaleout numberOfInstances="8" />

                <delegates>
                    <analysisEngine key="HBaseCasMultiplier">
                        <casMultiplier poolSize="8"/>
                    </analysisEngine>
                </delegates>
            </analysisEngine>
            ...

The above is an inconsistent configuration. You are specifying that
"TextAnalysis" should be deployed synchronously but then adding delegate
configuration, which forces the aggregate to be deployed asynchronously.
The delegates of a synchronous aggregate are not "visible" to UIMA AS and
cannot be configured in the deployment descriptor.

The stack trace you've submitted seems too incomplete to determine what
really happened.

Regards, Jerry C
On Fri, Jun 19, 2009 at 9:56 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> Hello everyone,
>
> I have been using uima as already for tagging text with a custom AAE,
> though I did not scaled the AAE because I run in a few issues back then and
> had no time to solve them.
>
> Now I tried again to scale the AAE and failed again. The AAE gets a
> document id
> which is sent to it via uimaj-as-camel component. A cas multiplier then
> fetches the
> actual document out of a database and thats also the component which causes
> trouble.
>
> Because the AAE is not thread safe uima as must scale it through creating
> multiple
> instances of it.
> After reading through the uima as documentation I came up with this
> deployment descriptor:
>           ...
>           <analysisEngine key="TextAnalysis" async="false">
>               <scaleout numberOfInstances="8" />
>
>               <delegates>
>                   <analysisEngine key="HBaseCasMultiplier">
>                       <casMultiplier poolSize="8"/>
>                   </analysisEngine>
>               </delegates>
>           </analysisEngine>
>           ...
>
> I must admit the documentation confused me a bit about the meaning of the
> async attribute.
> Is it correct that async=false means that uima as creates multiple
> instances which are each called
> from one worker thread ? And async=true would then mean that one AE is
> called by multiple threads.
>
> If the numberOfInstacnes is larger then 1 I always get this exception:
> Caused by: org.apache.uima.UIMARuntimeException: The method
> CasManager.defineCasPool() was called twice by the same Analysis Engine
> (/HBaseCasMultiplier/).
>   at
> org.apache.uima.resource.impl.CasManager_impl.defineCasPool(CasManager_impl.java:181)
>   at
> org.apache.uima.resource.impl.CasManager_impl.defineCasPool(CasManager_impl.java:161)
>   at
> org.apache.uima.aae.EECasManager_impl.defineCasPool(EECasManager_impl.java:75)
>   at
> org.apache.uima.impl.UimaContext_ImplBase.getEmptyCas(UimaContext_ImplBase.java:565)
>   at
> org.apache.uima.analysis_component.CasMultiplier_ImplBase.getEmptyCAS(CasMultiplier_ImplBase.java:109)
>   at
> dk.infopaq.nlp.repository.connector.HBaseReadCasMultiplier.hasNext(HBaseReadCasMultiplier.java:107)
>   at
> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl$AnalysisComponentCasIterator.hasNext(PrimitiveAnalysisEngine_impl.java:563)
>   at
> org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:566)
>   ... 20 more
>
>
> A while back I had a problem which resulted in the same exception message,
> but I was solved by updating UIMA to the current 2.3.0-SNAPSHOT:
> http://www.mail-archive.com/uima-user@incubator.apache.org/msg02054.html
>
> The version I am using is 2.3.0-SNAPSHOT from mid of may.
>
> Thanks,
> Jörn
>