Posted to dev@uima.apache.org by David Thibault <da...@gmail.com> on 2011/02/03 18:06:40 UTC

StringIndexOutOfBoundsException using Solrcas

Hello all,

First off, I apologize for sending this to both the user and dev lists, but
I'm not sure which list should get it.  This is my first email to either
list.

I am working with UIMA and Solrcas and I'm getting this error:
org.apache.uima.analysis_engine.AnalysisEngineProcessException
    at org.apache.uima.solrcas.SolrCASConsumer.process(SolrCASConsumer.java:138)
    at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:897)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:577)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1931)
    at org.apache.uima.jcas.tcas.Annotation.getCoveredText(Annotation.java:119)
    at org.apache.uima.solrcas.SolrCASConsumer.process(SolrCASConsumer.java:126)
    ... 6 more

I edited SolrCASConsumer with the following lines right before line 126:
       Annotation fsTemp = (Annotation) fs;
       System.out.println("Processing Annotation: " + fsTemp.toString());

Now, right before it calls fs.getCoveredText(), it prints this:
Processing Annotation: Phrase
   sofa: _InitialView
   begin: -1
   end: 60
   candidates: FSArray
   mappings: FSArray

So it's obvious why it says the string index is out of bounds: begin is -1.
However, I'm not sure why it's getting those values from my analysis
engine.  I'm using MetaMapApiAE from the NIH's MetaMap project.
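
For reference, the failure can be reproduced without UIMA. This is only a minimal sketch: it assumes, as the stack trace suggests, that getCoveredText() boils down to a String.substring(begin, end) call on the document text. The class and method names here are hypothetical, and the 191-character text is a stand-in for the real document.

```java
public class CoveredTextDemo {

    /** Stand-in for a covered-text lookup: a plain substring over the document text. */
    public static String coveredText(String documentText, int begin, int end) {
        return documentText.substring(begin, end);
    }

    /** Returns the exception message if the lookup fails, or null if the span is usable. */
    public static String tryCoveredText(String documentText, int begin, int end) {
        try {
            coveredText(documentText, begin, end);
            return null;
        } catch (StringIndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical 191-character document, the length DocumentAnalyzer reported.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 191; i++) {
            sb.append('x');
        }
        String text = sb.toString();

        // The span DocumentAnalyzer showed for the first phrase.
        System.out.println("span 0..7   -> " + tryCoveredText(text, 0, 7));
        // The span the CPE run printed for the first phrase.
        System.out.println("span -1..60 -> " + tryCoveredText(text, -1, 60));
    }
}
```

The exact wording of the exception message depends on the JDK version, so the sketch only distinguishes "failed" from "did not fail".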

This is the first phrase it is processing on this document and the first
time it prints that subsection of debug text.  If I use the same AE in
DocumentAnalyzer, it correctly shows the first Document as starting at
position 0 and ending at position 191, with the first phrase spanning
positions 0 to 7.

I'm trying to run this in the CPE GUI with the following CPEDescriptor.xml:
<?xml version="1.0" encoding="UTF-8"?>
<cpeDescription xmlns="http://uima.apache.org/resourceSpecifier">
    <collectionReader>
        <collectionIterator>
            <descriptor>
                <import location="../../../../../../../usr/local/apache-uima/examples/descriptors/collection_reader/FileSystemCollectionReader.xml"/>
            </descriptor>
            <configurationParameterSettings>
                <nameValuePair>
                    <name>InputDirectory</name>
                    <value>
                        <string>/Users/davidt/Documents/workspace/BioSearch/resources/test_input</string>
                    </value>
                </nameValuePair>
            </configurationParameterSettings>
        </collectionIterator>
    </collectionReader>
    <casProcessors casPoolSize="3" processingUnitThreadCount="1">
        <casProcessor deployment="integrated" name="MetaMapApiAE">
            <descriptor>
                <import location="../../../MetaMap UIMA Annotator/descriptors/MetaMapApiAE.xml"/>
            </descriptor>
            <deploymentParameters/>
            <errorHandling>
                <errorRateThreshold action="terminate" value="0/1000"/>
                <maxConsecutiveRestarts action="terminate" value="30"/>
                <timeout max="100000" default="-1"/>
            </errorHandling>
            <checkpoint batch="10000" time="1000ms"/>
            <configurationParameterSettings>
                <nameValuePair>
                    <name>tempdir_path</name>
                    <value>
                        <string>/Users/davidt/tmp</string>
                    </value>
                </nameValuePair>
            </configurationParameterSettings>
        </casProcessor>
        <casProcessor deployment="integrated" name="SolrcasAE.xml">
            <descriptor>
                <import location="../../../Apache_UIMA_Sandbox/Solrcas/desc/SolrcasAE.xml"/>
            </descriptor>
            <deploymentParameters/>
            <errorHandling>
                <errorRateThreshold action="terminate" value="0/1000"/>
                <maxConsecutiveRestarts action="terminate" value="30"/>
                <timeout max="100000" default="-1"/>
            </errorHandling>
            <checkpoint batch="10000" time="1000ms"/>
        </casProcessor>
    </casProcessors>
    <cpeConfig>
        <numToProcess>-1</numToProcess>
        <deployAs>immediate</deployAs>
        <checkpoint batch="0" time="300000ms"/>
        <timerImpl/>
    </cpeConfig>
</cpeDescription>

I'm at a loss as to where that -1 is coming from or how to debug it
further.  Any ideas would be greatly appreciated.

Best,
Dave

Re: StringIndexOutOfBoundsException using Solrcas

Posted by David Thibault <da...@gmail.com>.
Actually, I just tried it with the Annotation Printer instead of Solrcas and
it got the same exception.  I will back up and troubleshoot this by
inspecting the output of the MetaMapApiAE.

Dave



Re: StringIndexOutOfBoundsException using Solrcas

Posted by da...@gmail.com.
I added a couple of print statements for debugging purposes.  The same Exception comes up when using the AnnotationPrinter as well, and if I use a different AE besides the MetaMap one with either of these consumers it doesn't have the problem, so the problem is in the MetaMap UIMA annotator, not Solrcas.  Sorry about the false alarm.  I have sent a question to the authors of MetaMap.

Best,
Dave

-----Original Message-----
From: Jörn Kottmann <ko...@gmail.com>
Date: Fri, 04 Feb 2011 10:06:36 
To: <de...@uima.apache.org>
Reply-To: dev@uima.apache.org
Subject: Re: StringIndexOutOfBoundsException using Solrcas

On 2/3/11 6:06 PM, David Thibault wrote:
> Hello all,
>
> First off, I apologize for sending this to both the user and dev lists, but
> I'm not sure which list should get it.  This is my first email to either
> list.
>
> I am working with UIMA and Solrcas and I'm getting this error:
> org.apache.uima.analysis_engine.AnalysisEngineProcessException
>      at
> org.apache.uima.solrcas.SolrCASConsumer.process(SolrCASConsumer.java:138)
>      at

I looked at line 138 in SolrCASConsumer, but that is at the end of the
file and not inside the process method as the stack trace says. That
means that the code you are running and the trunk code are different.
Did you change the code? Or are you using an older version?

Jörn

Re: StringIndexOutOfBoundsException using Solrcas

Posted by Jörn Kottmann <ko...@gmail.com>.
On 2/3/11 6:06 PM, David Thibault wrote:
> Therefore, it's obvious why it's saying the string index is out of bounds.
> However, I'm not sure why it's getting those values from my analysis
> engine.  I'm using MetaMapAEApi from the NIH's MetaMap project.

I guess you have an annotation with an invalid span; by invalid I mean
that the begin index is after the end index.

Jörn
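
One way to narrow this down is to check every span before asking for its covered text. The sketch below is an assumption-laden illustration, not code from Solrcas or MetaMap: the class and method names are hypothetical, and the spans are passed in as plain int pairs. In a real component the values would presumably come from getBegin() and getEnd() on each annotation and from getDocumentText().length(). It flags a negative begin (the case in the debug output above), an end past the text, and a begin after the end.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanValidator {

    /** Describes why a span cannot be passed to substring, or returns null if it is usable. */
    public static String describeProblem(int begin, int end, int textLength) {
        if (begin < 0) {
            return "begin is negative";
        }
        if (end > textLength) {
            return "end is past the end of the text";
        }
        if (begin > end) {
            return "begin is after end";
        }
        return null;
    }

    /** Collects one report line per invalid span; each span is a {begin, end} pair. */
    public static List<String> findProblems(int[][] spans, int textLength) {
        List<String> problems = new ArrayList<>();
        for (int[] span : spans) {
            String problem = describeProblem(span[0], span[1], textLength);
            if (problem != null) {
                problems.add("begin=" + span[0] + " end=" + span[1] + ": " + problem);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical spans: one valid, one as in the debug output, one with begin after end.
        int[][] spans = { {0, 7}, {-1, 60}, {40, 12} };
        for (String line : findProblems(spans, 191)) {
            System.out.println(line);
        }
    }
}
```

Logging the offending annotation's type together with the report line should show which annotator produced the bad offsets.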