Posted to dev@uima.apache.org by "Peter Klügl (JIRA)" <de...@uima.apache.org> on 2018/11/30 13:00:00 UTC

[jira] [Commented] (UIMA-5757) Unable to extract features when annotation ends with HTML tag

    [ https://issues.apache.org/jira/browse/UIMA-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704697#comment-16704697 ] 

Peter Klügl commented on UIMA-5757:
-----------------------------------

Does the comment resolve your problem?

> Unable to extract features when annotation ends with HTML tag
> -------------------------------------------------------------
>
>                 Key: UIMA-5757
>                 URL: https://issues.apache.org/jira/browse/UIMA-5757
>             Project: UIMA
>          Issue Type: Bug
>          Components: Ruta
>    Affects Versions: 2.6.1ruta
>         Environment: RUTA 2.6.1, Windows 10, Eclipse Mars, JDK 1.8.0_144
>            Reporter: Miguel Alvarez
>            Assignee: Peter Klügl
>            Priority: Minor
>
> If an annotation covers the whole sofa string and the sofa string ends with an HTML tag, RUTA seems unable to extract the features of that annotation. For instance, let's suppose this document (represented as XMI):
>  
> {code:xml}
> <?xml version="1.0" encoding="UTF-8"?>
> <xmi:XMI xmlns:xmi="http://www.omg.org/XMI" xmlns:cas="http:///uima/cas.ecore" xmlns:tcas="http:///uima/tcas.ecore" xmlns:types="http:///com/acme/uima/types.ecore" xmi:version="2.0">
> <cas:NULL xmi:id="0"/>
> <tcas:DocumentAnnotation xmi:id="8" sofa="1" begin="0" end="12" language="es"/>
> <types:MyDocument xmi:id="14" sofa="1" begin="0" end="12" documentId="test_docsize_39d5541c-5e7f-391c-95af-c82ce6306644"/>
> <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text" sofaString="ABCDEFGHIJ&lt;p&gt;"/>
> <cas:View sofa="1" members="8 14"/>
> </xmi:XMI>
> {code}
> And the following RUTA script:
>
> {code:java}
> // RUTA script
> STRING documentId = "Unknown";
> com.acme.uima.types.MyDocument{-> GETFEATURE("documentId", documentId)};
> LOG("Starting to process document: " + documentId);
> {code}
> The LOG action outputs "Unknown". As soon as the sofa string no longer ends with an HTML tag, it works fine.
>  
> Any ideas what could be going on?
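
A quick sanity check on the offsets in the quoted XMI may be useful when reproducing this. The sketch below is plain JDK code, not UIMA or Ruta API; the class name SofaOffsetCheck and the helper decodeEntities are hypothetical names made up for illustration. It only decodes the sofaString attribute and compares its length with the end offset (12) used by both annotations. Whether that offset has anything to do with the reported behaviour is an assumption that is not verified here.

```java
// Hypothetical helper class for illustration only; not part of UIMA or Ruta.
class SofaOffsetCheck {

    // Decodes the XML entities that can appear in the sofaString attribute.
    // "&amp;" is replaced last so that already-decoded text is not decoded twice.
    static String decodeEntities(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    public static void main(String[] args) {
        // Values copied from the XMI in the issue description.
        String sofaAttribute = "ABCDEFGHIJ&lt;p&gt;";
        int annotationBegin = 0;
        int annotationEnd = 12;

        String sofa = decodeEntities(sofaAttribute);

        System.out.println("decoded sofa : " + sofa);
        System.out.println("sofa length  : " + sofa.length());
        System.out.println("covered text : " + sofa.substring(annotationBegin, annotationEnd));
        System.out.println("covers all   : " + (annotationEnd == sofa.length()));
    }
}
```

With the values from the issue, the decoded sofa string has 13 characters, so an annotation with end="12" covers "ABCDEFGHIJ<p" and stops one character before the end of the tag.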
