Posted to user@uima.apache.org by Eric VACHON <er...@temis.com> on 2007/03/26 15:04:00 UTC

EntityProcessStatus.isEntitySkipped() always returns false

Hi,

As Olivier already described in the alphaWorks forum, we have a simple CPE composed of a collection reader, an analysis engine (named ERROR) remotely deployed as a Vinci service, and an integrated
consumer. Our error handling actions are set to "continue".

The descriptor looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<cpeDescription>
    <collectionReader>
        <collectionIterator>
            <descriptor>
                <include href="SimpleCollectionReader.xml"/>
            </descriptor>
            <configurationParameterSettings />
        </collectionIterator>
    </collectionReader>
    <casProcessors casPoolSize="2" processingUnitThreadCount="1">
        <casProcessor deployment="remote" name="ERROR">
            <descriptor>
                <include href="ERROR.xml"/>
            </descriptor>
            <filter/>
            <errorHandling>
                <errorRateThreshold action="continue" value="2/5"/>
                <maxConsecutiveRestarts action="continue" value="2"/>
                <timeout max="300000"/>
            </errorHandling>
            <checkpoint batch="1"/>
            <deploymentParameters>
                <parameter name="vnsHost" type="string" value="localhost"/>
                <parameter name="vnsPort" type="string" value="9000"/>
                <parameter name="service-access" type="string" value="exclusive"/>
            </deploymentParameters>
        </casProcessor>
        <casProcessor deployment="integrated" name="SimpleCasConsumer">
            <descriptor>
                <include href="SimpleCasConsumer.xml"/>
            </descriptor>
            <deploymentParameters/>
            <filter/>
            <errorHandling>
                <errorRateThreshold action="continue" value="2/5"/>
                <maxConsecutiveRestarts action="continue" value="2"/>
                <timeout max="300000"/>
            </errorHandling>
            <checkpoint batch="1"/>
        </casProcessor>
    </casProcessors>
    <cpeConfig>
        <numToProcess>-1</numToProcess>
        <deployAs>immediate</deployAs>
        <checkpoint file="" time="300000"/>
        <timerImpl/>
    </cpeConfig>
</cpeDescription>


The ERROR CAS processor throws an AnnotatorProcessException when the processing of certain documents (CASes) fails.

We have a StatusCallbackListener whose entityProcessComplete method is implemented like this:

public void entityProcessComplete(CAS aCas, EntityProcessStatus aStatus) {
    // Report a CAS that at least one component skipped.
    if (aStatus.isEntitySkipped()) {
        System.err.println("The CAS has been skipped by at least one component");
    }
    // Print every exception attached to this CAS and do not count it.
    if (aStatus.isException()) {
        List exceptions = aStatus.getExceptions();
        for (int i = 0; i < exceptions.size(); i++) {
            ((Throwable) exceptions.get(i)).printStackTrace(System.err);
        }
        return;
    }
    // entityCount and size are fields of the enclosing listener class (not shown).
    entityCount++;
    size += aCas.getTCAS().getDocumentText().length();
}


The exceptions thrown by our ERROR CAS processor are correctly reported through the aStatus.getExceptions() call. But once the number of 'retries' is reached for the processor (this number is set to 2, as
you can see in the descriptor), entityProcessComplete is called with aStatus.isException() returning false, which is the expected behaviour, while aStatus.isEntitySkipped() also returns false.
That is not the expected value, because the CAS has actually been skipped by the ERROR processor.
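
To make the difference between the expected and the observed behaviour concrete, here is a minimal, self-contained sketch that does not depend on UIMA. The Status interface, FakeStatus, CountingListener and simulate are hypothetical names introduced only for this illustration; Status mirrors just the three EntityProcessStatus methods used above. The exact callback sequence (two callbacks carrying an exception, then a final one without) is an assumption based on the description above, not something confirmed against the CPE source.

```java
import java.util.Collections;
import java.util.List;

public class SkipTrackingDemo {

    /** Hypothetical stand-in exposing only the three status methods used in the post. */
    interface Status {
        boolean isEntitySkipped();
        boolean isException();
        List<Throwable> getExceptions();
    }

    /** Canned status object used to simulate a CPE callback. */
    static final class FakeStatus implements Status {
        private final boolean skipped;
        private final List<Throwable> exceptions;

        FakeStatus(boolean skipped, List<Throwable> exceptions) {
            this.skipped = skipped;
            this.exceptions = exceptions;
        }

        public boolean isEntitySkipped() { return skipped; }
        public boolean isException() { return !exceptions.isEmpty(); }
        public List<Throwable> getExceptions() { return exceptions; }
    }

    /** Same branching as the listener in the post, with counters instead of printing. */
    static final class CountingListener {
        int skipped;
        int failed;
        int processed;

        void entityProcessComplete(Status status) {
            if (status.isEntitySkipped()) {
                skipped++;
            }
            if (status.isException()) {
                failed++;
                return;
            }
            processed++;
        }
    }

    /**
     * Replays the assumed callbacks for one failing CAS: two callbacks carrying
     * an exception, then a final callback without one. The flag selects what the
     * final callback reports for isEntitySkipped().
     */
    static CountingListener simulate(boolean finalCallbackReportsSkip) {
        CountingListener listener = new CountingListener();
        List<Throwable> oneError =
                Collections.<Throwable>singletonList(new RuntimeException("simulated failure"));
        listener.entityProcessComplete(new FakeStatus(false, oneError));
        listener.entityProcessComplete(new FakeStatus(false, oneError));
        listener.entityProcessComplete(
                new FakeStatus(finalCallbackReportsSkip, Collections.<Throwable>emptyList()));
        return listener;
    }

    public static void main(String[] args) {
        CountingListener expected = simulate(true);   // what we expect from the CPE
        CountingListener observed = simulate(false);  // what we actually see in 1.4.3
        System.out.println("expected skipped count: " + expected.skipped);
        System.out.println("observed skipped count: " + observed.skipped);
    }
}
```

In this sketch the only difference between the two runs is the skipped counter: with the observed status, the listener has no way to tell that the CAS was skipped.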

We see this bug in UIMA 1.4.3, and we have to stick with that version for some more time. But we will upgrade to the Apache version soon, and we would like to know whether this issue has been fixed in 2.1.

Can you confirm whether this is a known bug and whether it has already been fixed?

Thanks a lot

-- 
Eric Vachon
Applicative Development Leader
TEMIS
193-197 rue de Bercy
75012 PARIS

Re: EntityProcessStatus.isEntitySkipped() always returns false

Posted by Marshall Schor <ms...@schor.com>.
Hi Eric -

This appears to be a new bug, unknown before. We think it is *not*
fixed in the 2.1 release. We'll work on getting it fixed in the next
release.

Thanks for reporting it.  -Marshall
