Posted to user@uima.apache.org by Eddie Epstein <ea...@gmail.com> on 2009/12/10 22:47:22 UTC
Re: Problems with "deserializeCasFromXmi" after using C++ AS
Annotator
Hi Christoph,
I could not reproduce the problem, with 2.2.2 or the latest 2.3.0
release candidate code.
My scenario was:
1. add the following feature to type David:
   <featureDescription>
     <name>documentURL</name>
     <description></description>
     <rangeTypeName>uima.cas.String</rangeTypeName>
   </featureDescription>
2. add this code to DaveDetector.cpp:
     AnnotationFS fsNewExp =
       tcas.createAnnotation(david, uiExprBeginPos, uiExprEndPos);
+    Feature documentURL = david.getFeatureByBaseName("documentURL");
+    fsNewExp.setStringValue(documentURL,
+      "http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&_psmand=1");
     indexRep.addFS(fsNewExp);
3. start DaveDetector as a service:
deployCppService descriptors\DaveDetector.xml DaveDetector
4. create a DaveDetector JMS service descriptor pointing at the service:
   <customResourceSpecifier xmlns="http://uima.apache.org/resourceSpecifier">
     <resourceClassName>org.apache.uima.aae.jms_adapter.JmsAnalysisEngineServiceAdapter</resourceClassName>
     <parameters>
       <parameter name="brokerURL" value="tcp://localhost:61616"/>
       <parameter name="endpoint" value="DaveDetector"/>
     </parameters>
   </customResourceSpecifier>
5. use cvd to connect to the service, put a Dave and David in the
text, call it, and inspect the results.
Please do provide the modifications to DaveDetector and a detailed
description of the scenario.
Regards,
Eddie
On Thu, Dec 10, 2009 at 10:43 AM, Christoph Büscher
<ch...@neofonie.de> wrote:
> Hi again,
>
> I was able to reproduce the problem also with the DaveDetector now. I wrote
> a short unit test that I can provide upon request to demonstrate the
> problem.
>
> Christoph
>
> Christoph Büscher schrieb:
>>
>> Hi,
>>
>> I recently encountered a problem with the XMI deserialization of a
>> feature structure after calling a remote C++ AS annotator from a CPE. The
>> scenario is the following:
>>
>> 1. I add a custom feature structure "DocumentData" containing a String
>> feature (the document URL) to the CAS in my CPE. The exact URL causing the
>> problem is:
>>
>>
>> documentURL="http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&_psmand=1"
>>
>> 2. The CAS gets serialized to XMI before it is sent to a remote C++ TAE.
>> I added a breakpoint to UimaSerializer.serializeCasToXmi() and get the
>> following part in the XMI string:
>>
>>
>> documentURL="http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&amp;_psmand=1"
>>
>> So here the "&" character seems to be escaped correctly.
>>
>> 3. When the document comes back, the same feature in the XMI string
>> received in UimaSerializer.deserializeCasFromXmi() reads:
>>
>>
>> documentURL="http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&_psmand=1"
>>
>> and now the SAXParser throws the following exception:
>>
>> org.xml.sax.SAXParseException: The reference to entity "_psmand" must end
>> with the ';' delimiter.
>> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>> at
>> org.apache.uima.aae.UimaSerializer.deserializeCasFromXmi(UimaSerializer.java:170)
>> at ...
>>
>> because the "&" comes back unescaped. I'm sure the C++ annotator doesn't
>> change this feature, and it correctly adds its own annotations. I suspect
>> something is wrong in deserializing the CAS from XMI and serializing it
>> back on the C++ side of things.
>> Do you have any idea what might cause this problem, or any suggestion where
>> I can start to narrow it down further?
>>
>> The remote C++ AE is running with version "uimacpp-2.2.2-incubating".
>>
>>
>>
>
>
> --
> --------------------------------
> Christoph Büscher
> Softwareentwicklung
>
> neofonie
> Technologieentwicklung und
> Informationsmanagement GmbH
> Robert-Koch-Platz 4
> 10115 Berlin
> fon: +49.30 24627 522
> fax: +49.30 24627 120
> http://www.neofonie.de
>
> Handelsregister
> Berlin-Charlottenburg: HRB 67460
>
> Geschäftsführung
> Helmut Hoffer von Ankershoffen
> (Sprecher der Geschaeftsfuehrung)
> Nurhan Yildirim
>
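
[Editor's sketch] The SAXParseException quoted above can be reproduced without
UIMA. The following is a minimal sketch using only the JDK's bundled SAX parser;
the class name AmpersandParseCheck and the element name "doc" are made up for
illustration and are not part of UIMA or of the XMI format. It parses one
attribute with the "&" escaped and one with it left raw.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class AmpersandParseCheck {

    // Returns null if the XML is well-formed, otherwise the parser's error message.
    public static String parseError(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String url = "http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11";
        // "doc" is a hypothetical element name; only the attribute value matters here.
        String escaped = "<doc documentURL=\"" + url + "&amp;_psmand=1\"/>";
        String raw = "<doc documentURL=\"" + url + "&_psmand=1\"/>";
        System.out.println("escaped: " + parseError(escaped));
        System.out.println("raw:     " + parseError(raw));
    }
}
```

The escaped variant should parse cleanly, while the raw variant should be
rejected because "&_psmand" is read as the start of an entity reference.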
Re: Problems with "deserializeCasFromXmi" after using C++ AS
Annotator
Posted by Eddie Epstein <ea...@gmail.com>.
Christoph,
If you are building uimacpp, a fix for this problem has been
committed and the new code is in
http://svn.apache.org/repos/asf/incubator/uima/uimacpp/branches/uimacpp-2.3.0/src/cas/xmiwriter.cpp
Thanks again for the help,
Eddie
On Mon, Dec 14, 2009 at 10:27 AM, Christoph Büscher
<ch...@neofonie.de> wrote:
> Hi Eddie,
>
> I just checked the same test using the Java RoomNumberAnnotator from the
> UIMA AS examples. This specific problem doesn't seem to happen there. Hope
> this helps.
> Christoph
>
> Eddie Epstein schrieb:
>>
>> On Mon, Dec 14, 2009 at 9:29 AM, Christoph Büscher
>> Wonder if this is a problem for remote Java annotators too :)
>>
>> Thanks for debugging!
>> Eddie
>
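
[Editor's sketch] This is not the code committed to xmiwriter.cpp, which is not
shown in this thread. It only illustrates, on the Java side, one way to avoid
this class of bug: letting an XML writer escape attribute values instead of
concatenating strings. It uses the JDK's StAX XMLStreamWriter; the class name
EscapingWriterSketch and the element name "DocumentData" are illustrative
assumptions.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class EscapingWriterSketch {

    // Writes a single empty element carrying the URL as an attribute.
    // The writer, not the caller, is responsible for escaping "&".
    public static String writeElement(String documentUrl) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeEmptyElement("DocumentData");          // hypothetical element name
        w.writeAttribute("documentURL", documentUrl); // value is passed unescaped
        w.writeEndDocument();                         // closes the pending start tag
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeElement(
            "http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&_psmand=1"));
    }
}
```

With the JDK's default StAX implementation, the "&" in the URL should appear as
"&amp;" in the output.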
Re: Problems with "deserializeCasFromXmi" after using C++ AS Annotator
Posted by Christoph Büscher <ch...@neofonie.de>.
Hi Eddie,
I just checked the same test using the Java RoomNumberAnnotator from the UIMA AS
examples. This specific problem doesn't seem to happen there. Hope this helps.
Christoph
Eddie Epstein schrieb:
> On Mon, Dec 14, 2009 at 9:29 AM, Christoph Büscher
> Wonder if this is a problem for remote Java annotators too :)
>
> Thanks for debugging!
> Eddie
Re: Problems with "deserializeCasFromXmi" after using C++ AS
Annotator
Posted by Eddie Epstein <ea...@gmail.com>.
On Mon, Dec 14, 2009 at 9:29 AM, Christoph Büscher
<ch...@neofonie.de> wrote:
> Hi,
>
> I did some further testing and the problem seems to happen when the FS is
> not declared in the remote C++ TAE but declared and set in an AE in the
> (local) main application (in our case a CPE running various Java AEs).
Ah, this must have to do with the C++ service's handling of "out-of-type-system" data.
> From my UIMA understanding so far, a remote AS service shouldn't have to
> declare or import all types and type systems used by potential AE clients
> connecting to it, or am I wrong in this regard?
> I attached the descriptors used in the JUnit test above. Hope this helps. I
> will continue to try to reproduce the problem using CVD.
Your understanding is correct. We should have enough to go on for now.
Wonder if this is a problem for remote Java annotators too :)
Thanks for debugging!
Eddie
Re: Problems with "deserializeCasFromXmi" after using C++ AS Annotator
Posted by Christoph Büscher <ch...@neofonie.de>.
Hi,
I did some further testing and the problem seems to happen when the FS is not
declared in the remote C++ TAE but declared and set in an AE in the (local) main
application (in our case a CPE running various Java AEs).
In my unit test I create a "DummyAE" descriptor basically doing nothing but
declaring a "DocumentData" type with the URL Feature. I also created a
customResourceSpecifier pointing to the DAVEDETECTORQ on the remote system. My
test looks like this:
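import java.net.URL;
import java.util.Arrays;

import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.analysis_engine.metadata.AnalysisEngineMetaData;
import org.apache.uima.cas.CAS;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceSpecifier;
import org.apache.uima.util.CasCreationUtils;
import org.apache.uima.util.XMLInputSource;
import org.junit.Before;
import org.junit.Test;

// Assumed location of the JCas cover class for the de.neofonie.DocumentData type.
import de.neofonie.DocumentData;

public class RemoteCTaeTest {

    private AnalysisEngine daveAe;
    private AnalysisEngine myAe;

    /** Produces the remote DaveDetector proxy and the local no-op AE. */
    @Before
    public void setUp() throws Exception {
        URL daveResource =
            RemoteCTaeTest.class.getClassLoader().getResource("DaveResource.xml");
        URL myResource =
            RemoteCTaeTest.class.getClassLoader().getResource("DummyAE.xml");
        ResourceSpecifier res =
            UIMAFramework.getXMLParser().parseResourceSpecifier(
                new XMLInputSource(daveResource));
        AnalysisEngineDescription aeDesc =
            UIMAFramework.getXMLParser().parseAnalysisEngineDescription(
                new XMLInputSource(myResource));
        this.daveAe = UIMAFramework.produceAnalysisEngine(res);
        this.myAe = UIMAFramework.produceAnalysisEngine(aeDesc);
    }

    /** Sends a CAS holding a DocumentData FS to the remote service and prints the FS. */
    @Test
    public void testSendCAS() throws Exception {
        // The CAS type system is the merge of both AEs' type systems.
        CAS cas = CasCreationUtils.createCas(Arrays.asList(
            new AnalysisEngineMetaData[] {
                this.daveAe.getAnalysisEngineMetaData(),
                this.myAe.getAnalysisEngineMetaData() }));
        JCas cas2 = cas.getJCas();
        DocumentData metadata = new DocumentData(cas2);
        metadata.setDocumentURL("http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&_psmand=1");
        String text = "This is a Dave Test.";
        cas2.setDocumentText(text);
        cas2.addFsToIndexes(metadata);
        // The exception is thrown here, when the reply from the C++ service is deserialized.
        this.daveAe.process(cas2);
        System.out.print(cas2.getJFSIndexRepository().getAllIndexedFS(DocumentData.type).next());
    }
}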
The test fails when I use the DaveDetector descriptor delivered with
"2.2.2-incubating". When I add the DocumentData type to the DaveDetector descriptor,
the test succeeds and the URL is returned correctly by the remote AE:
(../examples/descriptors/DaveDetector.xml)
<typeSystemDescription>
  <types>
    <typeDescription>
      <name>org.apache.uima.examples.David</name>
      <description></description>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
      </features>
    </typeDescription>
    <typeDescription>
      <name>de.neofonie.DocumentData</name>
      <description>Metadata for a document</description>
      <supertypeName>uima.cas.TOP</supertypeName>
      <features>
        <featureDescription>
          <name>documentURL</name>
          <description>The original URL of the document</description>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>
I will attach both descriptors used in the test. The implementation of the
DummyAE is completely empty, since it is not called in the test.
From my UIMA understanding so far, a remote AS service shouldn't have to declare
or import all types and type systems used by potential AE clients connecting to it,
or am I wrong in this regard?
I attached the descriptors used in the JUnit test above. Hope this helps. I will
continue to try to reproduce the problem using CVD.
Christoph
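
[Editor's sketch] One plausible mechanism for the behaviour described above,
offered as an assumption and not confirmed anywhere in this thread: an XML
parser hands attribute values back already decoded, so a service that passes
through data it has no type for must re-escape those values when it writes
them out again. The sketch below shows that round trip with the JDK's DOM
parser. The class name PassThroughSketch, the helper names, and the element
name "DocumentData" are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class PassThroughSketch {

    // Parses a one-element document and returns the decoded attribute value.
    public static String readAttribute(String xml, String name) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        return root.getAttribute(name);
    }

    // Escapes the characters that are special inside a double-quoted attribute.
    // "&" must be replaced first so the other replacements are not double-escaped.
    public static String escapeAttribute(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;")
                    .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Writes the value back out; dropping escapeAttribute here would
    // reintroduce a raw "&" into the output.
    public static String writeElement(String attrValue) {
        return "<DocumentData documentURL=\"" + escapeAttribute(attrValue) + "\"/>";
    }

    public static void main(String[] args) throws Exception {
        String incoming = "<DocumentData documentURL=\"http://www.gesundheitsnachrichten.net"
            + "/live/navigation/live.php?navigation_id=11&amp;_psmand=1\"/>";
        String decoded = readAttribute(incoming, "documentURL");
        System.out.println("decoded:  " + decoded);
        System.out.println("outgoing: " + writeElement(decoded));
    }
}
```

The design point is that escaping belongs in the writer, applied to every
string value regardless of whether the type is known to the service.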
Re: Problems with "deserializeCasFromXmi" after using C++ AS
Annotator
Posted by Eddie Epstein <ea...@gmail.com>.
Hi,
Well, I tried another, simpler scenario based on your description:
Just add the following to DaveDetector.xml:
<typeDescription>
  <name>uima.tcas.Chris</name>
  <description></description>
  <supertypeName>uima.tcas.Annotation</supertypeName>
  <features>
    <featureDescription>
      <name>documentURL</name>
      <description></description>
      <rangeTypeName>uima.cas.String</rangeTypeName>
    </featureDescription>
  </features>
</typeDescription>
Create the following CasXmi file:
<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmlns:cas="http:///uima/cas.ecore"
    xmlns:tcas="http:///uima/tcas.ecore"
    xmlns:xmi="http://www.omg.org/XMI" xmi:version="2.0">
  <cas:NULL xmi:id="0"/>
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView"
      mimeType="text" sofaString="This is a text document with Dave for analysis"/>
  <tcas:DocumentAnnotation xmi:id="8" sofa="1" begin="0" end="46" language=""/>
  <tcas:Chris xmi:id="20" sofa="1" begin="0" end="1"
      documentURL="http://www.gesundheitsnachrichten.net/live/navigation/live.php?navigation_id=11&amp;_psmand=1"/>
  <cas:View sofa="1" members="8 20"/>
</xmi:XMI>
Launch the unaltered DaveDetector as a service, have CVD connect to it
via JMS service descriptor, use File->Read Xmi Cas File to load the test Cas,
use Run->Run DaveDetector On CAS to call the remote service, and, finally
expand the annotation index to see the results. No problems.
This was using something close to uima-2.3.0.
Eddie
On Fri, Dec 11, 2009 at 9:04 AM, Christoph Buescher
<ch...@gmx.de> wrote:
> Hi Eddie,
>
> Unfortunately I'm out of the office today, so I can only send you the unit
> test reproducing the problem on Monday. But our scenario is more or less the
> following:
>
> - A CPE running several Java AEs first, then send the CAS to a remote AS
> Service which is a C++ AE
> - the collection reader adds a FS with document metadata (including the
> String Feature "URL") to the CAS. This FS directly extends the Top-Type, not
> Annotation.
> - I used the unaltered DaveDetector to replace our own C++ AE to reproduce
> the problem
>
> For the unit test I wrote a "no-op" Java AE which uses a type system only
> including this "DocumentData" FS with only this one String feature. I used a
> custom resource specifier like in the AS documentation to reference the
> DaveDetector on a remote machine. I then create a CAS using
> CasCreationUtils, which in turn uses the "no-op" AE descriptor and the
> DaveDetector resource specifier. I then add the problematic feature in
> question and call "process" on the remote Dave AE. Then the exception I
> mentioned in my earlier mail happens.
>
> I will send you the test code on Monday and also try to use CVD to reproduce
> the problem.
>
> Thanks,
> Christoph
Re: Problems with "deserializeCasFromXmi" after using C++ AS Annotator
Posted by Christoph Buescher <ch...@gmx.de>.
Hi Eddie,
Unfortunately I'm out of the office today, so I can only send you the unit test
reproducing the problem on Monday. But our scenario is more or less the following:
- A CPE running several Java AEs first, then send the CAS to a remote AS Service
which is a C++ AE
- the collection reader adds a FS with document metadata (including the String
Feature "URL") to the CAS. This FS directly extends the Top-Type, not Annotation.
- I used the unaltered DaveDetector to replace our own C++ AE to reproduce the
problem
For the unit test I wrote a "no-op" Java AE which uses a type system only
including this "DocumentData" FS with only this one String feature. I used a
custom resource specifier like in the AS documentation to reference the
DaveDetector on a remote machine. I then create a CAS using CasCreationUtils,
which in turn uses the "no-op" AE descriptor and the DaveDetector resource
specifier. I then add the problematic feature in question and call "process" on
the remote Dave AE. Then the exception I mentioned in my earlier mail happens.
I will send you the test code on Monday and also try to use CVD to reproduce the
problem.
Thanks,
Christoph