Posted to dev@uima.apache.org by "Dale Lane (JIRA)" <de...@uima.apache.org> on 2016/10/19 12:20:58 UTC
[jira] [Updated] (UIMA-5147) RUTA leaves the contents of STYLE tags in plaintext
[ https://issues.apache.org/jira/browse/UIMA-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dale Lane updated UIMA-5147:
----------------------------
Priority: Minor (was: Major)
> RUTA leaves the contents of STYLE tags in plaintext
> ---------------------------------------------------
>
> Key: UIMA-5147
> URL: https://issues.apache.org/jira/browse/UIMA-5147
> Project: UIMA
> Issue Type: Bug
> Components: Ruta
> Reporter: Dale Lane
> Priority: Minor
>
> I'm using the RUTA HtmlAnnotator and HtmlConverter to turn an HTML document into the plain text extracted from it, with annotations representing the markup that was in the original HTML.
> The contents of <STYLE> tags are showing up in the plaintext view, which isn't helpful. As STYLE isn't part of the document contents, I think it'd be better for this not to be added to plaintext, or at least for there to be an option to allow this to be excluded.
> (Apologies if I've missed a way to do this using the existing options)
> As a simple way to reproduce this, a document like the following can be used:
> {code:xml}
> <html><head>
> <style>
> /* */
> .test {
> text-align: left;
> }
> </style>
> </head><body>Hello world</body></html>
> {code}
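
Until such an option exists, one possible workaround is to drop STYLE elements from the raw HTML before it is handed to the annotators. The following is only a minimal sketch using the Java standard library, not part of RUTA: the class name StyleStripper and the method stripStyleElements are hypothetical, and the regular expression is a rough approximation that does not handle every HTML edge case (for example, a literal "</style>" inside a CSS comment or string).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of UIMA Ruta: removes <style> elements
// from an HTML string before it is passed on for conversion.
public class StyleStripper {

    // Matches an opening <style ...> tag, its contents, and the closing tag.
    // CASE_INSENSITIVE covers <STYLE>; DOTALL lets '.' span line breaks.
    private static final Pattern STYLE_ELEMENT = Pattern.compile(
            "<style\\b[^>]*>.*?</style\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the input with every matched STYLE element removed.
    public static String stripStyleElements(String html) {
        return STYLE_ELEMENT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // The sample document from the issue description.
        String html = "<html><head>\n<style>\n/* */\n.test {\ntext-align: left;\n}\n"
                + "</style>\n</head><body>Hello world</body></html>";
        System.out.println(stripStyleElements(html));
    }
}
```

Note that removing text this way shifts character offsets relative to the original HTML, so it only fits cases where annotations do not need to map back to the unmodified source.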
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)