Posted to user@uima.apache.org by Eddie Epstein <ea...@gmail.com> on 2009/12/10 22:47:22 UTC

Re: Problems with "deserializeCasFromXmi" after using C++ AS Annotator

Hi Christoph,

I could not reproduce the problem, with 2.2.2 or the latest 2.3.0
release candidate code.

My scenario was:
1. add the following feature to type David:
  	<featureDescription>
  	  <name>documentURL</name>
  	  <description></description>
  	  <rangeTypeName>uima.cas.String</rangeTypeName>
  	</featureDescription>

2. add this code to DaveDetector.cpp:
      AnnotationFS fsNewExp =
        tcas.createAnnotation(david, uiExprBeginPos, uiExprEndPos);
+      Feature documentURL  = david.getFeatureByBaseName("documentURL");
+      fsNewExp.setStringValue(documentURL,
"http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&_psmand=1");
      indexRep.addFS(fsNewExp);

3. start DaveDetector as a service:
    deployCppService descriptors\DaveDetector.xml DaveDetector

4. create a DaveDetector JMS service descriptor pointing at the service:
<customResourceSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
   <resourceClassName>org.apache.uima.aae.jms_adapter.JmsAnalysisEngineServiceAdapter</resourceClassName>
   <parameters>
     <parameter name="brokerURL" value="tcp://localhost:61616"/>
     <parameter name="endpoint" value="DaveDetector"/>
   </parameters>
</customResourceSpecifier>

5. use CVD to connect to the service, put a Dave and David in the
text, call it, and inspect the results.

Please do provide the modifications to DaveDetector and a detailed
description of the scenario.

Regards,
Eddie

On Thu, Dec 10, 2009 at 10:43 AM, Christoph Büscher
<ch...@neofonie.de> wrote:
> Hi again,
>
> I was able to reproduce the problem also with the DaveDetector now. I wrote
> a short unit test that I can provide upon request to demonstrate the
> problem.
>
> Christoph
>
> Christoph Büscher wrote:
>>
>> Hi,
>>
>> I recently encountered a problem with the XMI deserialization of a
>> feature structure after calling a remote C++ AS annotator from a CPE. The
>> scenario is the following:
>>
>> 1. I add a custom feature structure "DocumentData" containing a String
>> feature (the document URL) to the CAS in my CPE. The exact URL causing the
>> problem is:
>>
>>
>> documentURL="http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&_psmand=1"
>>
>> 2. The CAS gets serialized to XMI before it is sent to a remote C++ TAE.
>> I added a breakpoint to UimaSerializer.serializeCasToXmi() and got the
>> following part in the XMI string:
>>
>>
>> documentURL="http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&amp;_psmand=1"
>>
>> So here the "&" character seems to be escaped correctly.
>>
>> 3. When the document comes back, the same feature in the XMI string
>> received in UimaSerializer.deserializeCasFromXmi() reads:
>>
>>
>> documentURL="http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&_psmand=1"
>>
>> and now the SAXParser throws the following exception:
>>
>> org.xml.sax.SAXParseException: The reference to entity "_psmand" must end
>> with the ';' delimiter.
>>    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>>    at
>> org.apache.uima.aae.UimaSerializer.deserializeCasFromXmi(UimaSerializer.java:170)
>>    at ...
>>
>> because the "&" comes back unescaped. I'm sure the C++ annotator in
>> question doesn't change the feature in question and it also correctly adds
>> its own annotations. I suspect there's something wrong in
>> deserializing/serializing the CAS from XMI and back on the C++ side of
>> things.
>> Do you have any idea what might cause this problem or any suggestion where
>> I can start to further narrow down the problem?
>>
>> The remote C++ AE is running with version "uimacpp-2.2.2-incubating".
>>
>>
>>
>
>
> --
> --------------------------------
> Christoph Büscher
> Softwareentwicklung
>
> neofonie
> Technologieentwicklung und
> Informationsmanagement GmbH
> Robert-Koch-Platz 4
> 10115 Berlin
> fon: +49.30 24627 522
> fax: +49.30 24627 120
> http://www.neofonie.de
>
> Handelsregister
> Berlin-Charlottenburg: HRB 67460
>
> Geschäftsführung
> Helmut Hoffer von Ankershoffen
> (Sprecher der Geschaeftsfuehrung)
> Nurhan Yildirim
>

Re: Problems with "deserializeCasFromXmi" after using C++ AS Annotator

Posted by Eddie Epstein <ea...@gmail.com>.
Christoph,

If you are building uimacpp, a fix for this problem has been
committed and the new code is in

http://svn.apache.org/repos/asf/incubator/uima/uimacpp/branches/uimacpp-2.3.0/src/cas/xmiwriter.cpp
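
For readers who do not want to dig through the C++ source: the kind of escaping an XMI writer has to apply to a string feature value before emitting it as an attribute can be sketched as below. This is an illustrative Java sketch only, not the code committed to xmiwriter.cpp; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of UIMA: escapes a string so that it can be
// written literally inside a double-quoted XML attribute value.
final class XmlAttributeEscaper {

    private XmlAttributeEscaper() {
    }

    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");   // must be escaped everywhere in XML text
                    break;
                case '<':
                    out.append("&lt;");    // not allowed in attribute values
                    break;
                case '>':
                    out.append("&gt;");    // optional, escaped for symmetry
                    break;
                case '"':
                    out.append("&quot;");  // would otherwise end the attribute
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String url = "http://www.gesundheitsnachrichten.net/live/navigation/"
                + "live.php?navigation_id=11&_psmand=1";
        // Prints the attribute in the form the Java side expects to receive.
        System.out.println("documentURL=\"" + escape(url) + "\"");
    }
}
```

The sketch leaves out whitespace characters such as tabs and newlines, which a real serializer would also need to encode if they must survive attribute-value normalization.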

Thanks again for the help,
Eddie

On Mon, Dec 14, 2009 at 10:27 AM, Christoph Büscher
<ch...@neofonie.de> wrote:
> Hi Eddie,
>
> I just checked the same test using the Java RoomNumberAnnotator from the
> UIMA AS examples. This specific problem doesn't seem to happen there. Hope
> this helps.
> Christoph
>
> Eddie Epstein wrote:
>>
>> On Mon, Dec 14, 2009 at 9:29 AM, Christoph Büscher
>> Wonder if this is a problem for remote Java annotators too :)
>>
>> Thanks for debugging!
>> Eddie
>

Re: Problems with "deserializeCasFromXmi" after using C++ AS Annotator

Posted by Christoph Büscher <ch...@neofonie.de>.
Hi Eddie,

I just checked the same test using the Java RoomNumberAnnotator from the UIMA AS 
examples. This specific problem doesn't seem to happen there. Hope this helps.
Christoph

Eddie Epstein wrote:
> On Mon, Dec 14, 2009 at 9:29 AM, Christoph Büscher
> Wonder if this is a problem for remote Java annotators too :)
> 
> Thanks for debugging!
> Eddie

Re: Problems with "deserializeCasFromXmi" after using C++ AS Annotator

Posted by Eddie Epstein <ea...@gmail.com>.
On Mon, Dec 14, 2009 at 9:29 AM, Christoph Büscher
<ch...@neofonie.de> wrote:
> Hi,
>
> I did some further testing and the problem seems to happen when the FS is
> not declared in the remote C++ TAE but declared and set in an AE in the
> (local) main application (in our case a CPE running various Java AEs).

Ah, must have to do with C++ service handling of "out-of-type-system" data.

> From my understanding of UIMA so far, a remote AS service shouldn't have to
> declare or import all types and type systems used by potential AE clients
> connecting to it, or am I wrong in this regard?
> I attached the descriptors used in the JUnit test above. Hope this helps. I
> will continue to try to reproduce the problem using CVD.

Your understanding is correct. We should have enough to go on for now.
Wonder if this is a problem for remote Java annotators too :)

Thanks for debugging!
Eddie

Re: Problems with "deserializeCasFromXmi" after using C++ AS Annotator

Posted by Christoph Büscher <ch...@neofonie.de>.
Hi,

I did some further testing and the problem seems to happen when the FS is not
declared in the remote C++ TAE but declared and set in an AE in the (local) main
application (in our case a CPE running various Java AEs).
In my unit test I create a "DummyAE" descriptor basically doing nothing but
declaring a "DocumentData" type with the URL Feature. I also created a
customResourceSpecifier pointing to the DAVEDETECTORQ on the remote system. My
test looks like this:


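import java.net.URL;
import java.util.Arrays;

import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.analysis_engine.metadata.AnalysisEngineMetaData;
import org.apache.uima.cas.CAS;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceSpecifier;
import org.apache.uima.util.CasCreationUtils;
import org.apache.uima.util.XMLInputSource;
import org.junit.Before;
import org.junit.Test;

// DocumentData is assumed to be the JCas cover class generated for the
// de.neofonie.DocumentData type; import it if it lives in another package.
public class RemoteCTaeTest {

      private AnalysisEngine daveAe;
      private AnalysisEngine myAe;

      // Produces the remote DaveDetector (via the custom resource specifier)
      // and the local DummyAE that only declares the DocumentData type.
      @Before
      public void setUp() throws Exception {
          ClassLoader loader = RemoteCTaeTest.class.getClassLoader();
          URL daveResource = loader.getResource("DaveResource.xml");
          URL myResource = loader.getResource("DummyAE.xml");
          ResourceSpecifier res = UIMAFramework.getXMLParser()
                  .parseResourceSpecifier(new XMLInputSource(daveResource));
          AnalysisEngineDescription aeDesc = UIMAFramework.getXMLParser()
                  .parseAnalysisEngineDescription(new XMLInputSource(myResource));
          this.daveAe = UIMAFramework.produceAnalysisEngine(res);
          this.myAe = UIMAFramework.produceAnalysisEngine(aeDesc);
      }

      // Sends a CAS containing a DocumentData FS with an "&" in its URL
      // to the remote service and prints the FS that comes back.
      @Test
      public void testSendCAS() throws Exception {
          // The CAS type system is the merge of both engines' metadata.
          CAS cas = CasCreationUtils.createCas(Arrays.asList(
                  new AnalysisEngineMetaData[] {
                          this.daveAe.getAnalysisEngineMetaData(),
                          this.myAe.getAnalysisEngineMetaData() }));
          JCas cas2 = cas.getJCas();
          DocumentData metadata = new DocumentData(cas2);
          metadata.setDocumentURL(
                  "http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&_psmand=1");
          String text = "This is a  Dave Test.";
          cas2.setDocumentText(text);
          cas2.addFsToIndexes(metadata);
          this.daveAe.process(cas2);
          System.out.print(cas2.getJFSIndexRepository()
                  .getAllIndexedFS(DocumentData.type).next());
      }
}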

The test fails when I use the DaveDetector descriptor delivered with
"2.2.2-incubating". When I add the DocumentData type to the DaveDetector descriptor,
the test succeeds and the URL is returned correctly by the remote AE:

(../examples/descriptors/DaveDetector.xml)
<typeSystemDescription>
    <types>
      <typeDescription>
        <name>org.apache.uima.examples.David</name>
        <description></description>
        <supertypeName>uima.tcas.Annotation</supertypeName>
        <features>
        </features>
      </typeDescription>
      <typeDescription>
        <name>de.neofonie.DocumentData</name>
        <description>Metadata for a document</description>
        <supertypeName>uima.cas.TOP</supertypeName>
        <features>
          <featureDescription>
            <name>documentURL</name>
            <description>The original URL of the document</description>
            <rangeTypeName>uima.cas.String</rangeTypeName>
          </featureDescription>
        </features>
      </typeDescription>
    </types>
</typeSystemDescription>

I will attach both descriptors used in the test. The implementation of the
DummyAE is completely empty, since it is not called in the test.
From my understanding of UIMA so far, a remote AS service shouldn't have to declare
or import all types and type systems used by potential AE clients connecting to it,
or am I wrong in this regard?
I attached the descriptors used in the JUnit test above. Hope this helps. I will
continue to try to reproduce the problem using CVD.
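
The parser failure itself does not depend on UIMA and can be reproduced with the JDK's SAX parser alone. The following is a minimal sketch under that assumption; the class name, the method name and the shortened element are hypothetical and only stand in for the XMI attribute from my first mail.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-alone check, not part of UIMA.
final class AmpersandParseCheck {

    private AmpersandParseCheck() {
    }

    // Returns true if the string parses as well-formed XML, false if the
    // SAX parser rejects it (for example because of a bare "&").
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // SAXParseException is a subclass of SAXException.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("Parser could not be used", e);
        }
    }

    public static void main(String[] args) {
        String base = "http://www.gesundheitsnachrichten.net/live/navigation/live.php";
        String escaped = "<doc documentURL=\"" + base + "?navigation_id=11&amp;_psmand=1\"/>";
        String unescaped = "<doc documentURL=\"" + base + "?navigation_id=11&_psmand=1\"/>";
        System.out.println("escaped parses:   " + isWellFormed(escaped));
        System.out.println("unescaped parses: " + isWellFormed(unescaped));
    }
}
```

In the unescaped case the parser reads "&_psmand" as the start of an entity reference and fails when no ";" follows, which matches the exception text I reported.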

Christoph



Re: Problems with "deserializeCasFromXmi" after using C++ AS Annotator

Posted by Eddie Epstein <ea...@gmail.com>.
Hi,

Well, I tried another, simpler scenario based on your description:

Just add the following to DaveDetector.xml:
    <typeDescription>
      <name>uima.tcas.Chris</name>
      <description></description>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
  	<featureDescription>
  	  <name>documentURL</name>
  	  <description></description>
  	  <rangeTypeName>uima.cas.String</rangeTypeName>
  	</featureDescription>
      </features>
    </typeDescription>

Create the following CasXmi file:
<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmlns:cas="http:///uima/cas.ecore"
xmlns:tcas="http:///uima/tcas.ecore"
xmlns:xmi="http://www.omg.org/XMI"  xmi:version="2.0">
<cas:NULL xmi:id="0"/>
 <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView"
mimeType="text" sofaString="This is a text document with Dave for
analysis"/>
 <tcas:DocumentAnnotation xmi:id="8" sofa="1" begin="0" end="46" language=""/>
 <tcas:Chris xmi:id="20" sofa="1" begin="0" end="1"
documentURL="http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&amp;_psmand=1"/>
<cas:View sofa="1" members="8 20"/>
</xmi:XMI>

Launch the unaltered DaveDetector as a service, have CVD connect to it
via JMS service descriptor, use File->Read Xmi Cas File to load the test Cas,
use Run->Run DaveDetector On CAS to call the remote service, and, finally
expand the annotation index to see the results. No problems.

This was using something close to uima-2.3.0.
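
Outside of CVD, one way to check that the value survives a round trip is to parse the returned XMI and read the attribute back. The sketch below assumes the XMI is available as a string; the class and method names are hypothetical, and it uses only the JDK's DOM parser rather than any UIMA API.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical stand-alone helper, not part of UIMA.
final class XmiAttributeReader {

    private XmiAttributeReader() {
    }

    // Returns the named attribute of the first element with the given
    // qualified name, or null if no such element exists.
    static String readAttribute(String xmi, String elementName, String attributeName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmi)));
            Element element = (Element) doc.getElementsByTagName(elementName).item(0);
            return element == null ? null : element.getAttribute(attributeName);
        } catch (Exception e) {
            throw new IllegalStateException("XMI could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the test CAS above, reduced to the one annotation.
        String xmi = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<xmi:XMI xmlns:tcas=\"http:///uima/tcas.ecore\""
                + " xmlns:xmi=\"http://www.omg.org/XMI\" xmi:version=\"2.0\">"
                + "<tcas:Chris xmi:id=\"20\" sofa=\"1\" begin=\"0\" end=\"1\""
                + " documentURL=\"http://www.gesundheitsnachrichten.net/live/navigation/"
                + "live.php?navigation_id=11&amp;_psmand=1\"/>"
                + "</xmi:XMI>";
        // The parser decodes "&amp;" back to "&" in the returned value.
        System.out.println(readAttribute(xmi, "tcas:Chris", "documentURL"));
    }
}
```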

Eddie



On Fri, Dec 11, 2009 at 9:04 AM, Christoph Buescher
<ch...@gmx.de> wrote:
> Hi Eddie,
>
> Unfortunately I'm out of the office today, so I can only send you the unit
> test reproducing the problem on Monday. But our scenario is more or less the
> following:
>
> - A CPE runs several Java AEs first, then sends the CAS to a remote AS
> service which is a C++ AE
> - the collection reader adds an FS with document metadata (including the
> String feature "URL") to the CAS. This FS directly extends the TOP type, not
> Annotation.
> - I used the unaltered DaveDetector to replace our own C++ AE to reproduce
> the problem
>
> For the unit test I wrote a "no-op" Java AE which uses a type system
> including only this "DocumentData" FS with only this one String feature. I
> used a custom resource specifier like in the AS documentation to reference
> the DaveDetector on a remote machine. I then create a CAS using
> CasCreationUtils, which in turn uses the "no-op" AE descriptor and the
> DaveDetector resource specifier. I then add the problematic feature in
> question and call "process" on the remote Dave AE. Then the exception I
> mentioned in my earlier mail happens.
>
> I will send you the test code on Monday and also try to use CVD to reproduce
> the problem.
>
> Thanks,
> Christoph

Re: Problems with "deserializeCasFromXmi" after using C++ AS Annotator

Posted by Christoph Buescher <ch...@gmx.de>.
Hi Eddie,

Unfortunately I'm out of the office today, so I can only send you the unit test
reproducing the problem on Monday. But our scenario is more or less the following:

- A CPE runs several Java AEs first, then sends the CAS to a remote AS service
which is a C++ AE
- the collection reader adds an FS with document metadata (including the String
feature "URL") to the CAS. This FS directly extends the TOP type, not Annotation.
- I used the unaltered DaveDetector to replace our own C++ AE to reproduce the
problem

For the unit test I wrote a "no-op" Java AE which uses a type system
including only this "DocumentData" FS with only this one String feature. I used a
custom resource specifier like in the AS documentation to reference the
DaveDetector on a remote machine. I then create a CAS using CasCreationUtils,
which in turn uses the "no-op" AE descriptor and the DaveDetector resource
specifier. I then add the problematic feature in question and call "process" on
the remote Dave AE. Then the exception I mentioned in my earlier mail happens.

I will send you the test code on Monday and also try to use CVD to reproduce the
problem.

Thanks,
Christoph

