Posted to dev@uima.apache.org by "Dale Lane (JIRA)" <de...@uima.apache.org> on 2016/10/19 12:20:58 UTC

[jira] [Updated] (UIMA-5147) RUTA leaves the contents of STYLE tags in plaintext

     [ https://issues.apache.org/jira/browse/UIMA-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dale Lane updated UIMA-5147:
----------------------------
    Priority: Minor  (was: Major)

> RUTA leaves the contents of STYLE tags in plaintext
> ---------------------------------------------------
>
>                 Key: UIMA-5147
>                 URL: https://issues.apache.org/jira/browse/UIMA-5147
>             Project: UIMA
>          Issue Type: Bug
>          Components: Ruta
>            Reporter: Dale Lane
>            Priority: Minor
>
> I'm using RUTA HtmlAnnotator and HtmlConverter to turn an HTML document into the plain text extracted from it, with annotations representing the markup that was in the original HTML.
> The contents of <STYLE> tags are showing up in the plaintext view, which isn't helpful. STYLE content isn't part of the document text, so I think it would be better not to add it to the plaintext view, or at least to have an option to exclude it.
> (Apologies if I've missed a way to do this using the existing options)
> As a simple recreate, a document like this can be used:
> {code:xml}
> <html><head>
>     <style>
>         /*  */
>         .test {
>             text-align: left;
>         }
>     </style>
> </head><body>Hello world</body></html>
> {code}
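
A minimal sketch of the behaviour being requested follows. It does not show how RUTA's HtmlAnnotator or HtmlConverter work internally. It uses Python's standard-library `html.parser`, and the names `PlainTextExtractor`, `html_to_plaintext` and the `skip_tags` option are hypothetical. The sketch collects text nodes and drops anything inside the listed tags. STYLE is skipped by default, and the tag set acts as the kind of opt-out option suggested above. For the recreate document, the extracted text is just "Hello world".

{code:python}
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collect document text, ignoring the contents of the given tags.

    Hypothetical illustration only; not part of UIMA Ruta.
    """

    def __init__(self, skip_tags=("style",)):
        super().__init__(convert_charrefs=True)
        # html.parser lowercases tag names, so compare in lower case.
        self._skip_tags = {t.lower() for t in skip_tags}
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._skip_tags:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._skip_tags and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # STYLE/SCRIPT bodies arrive here as raw data; drop them when skipped.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse whitespace so source indentation does not leak into the output.
        return " ".join("".join(self._chunks).split())


def html_to_plaintext(html, skip_tags=("style",)):
    """Return the visible text of `html`, excluding contents of `skip_tags`."""
    parser = PlainTextExtractor(skip_tags)
    parser.feed(html)
    parser.close()
    return parser.text()
{code}

Calling `html_to_plaintext` on the recreate document gives "Hello world". Passing `skip_tags=()` keeps the stylesheet text, which matches the current behaviour reported here.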



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)