Posted to dev@uima.apache.org by "Peter Klügl (JIRA)" <de...@uima.apache.org> on 2013/12/20 13:52:14 UTC

[jira] [Assigned] (UIMA-3512) Add additional engine parameter for Ruta HtmlConverter to configure linebreak replacement.

     [ https://issues.apache.org/jira/browse/UIMA-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Klügl reassigned UIMA-3512:
---------------------------------

    Assignee: Peter Klügl

> Add additional engine parameter for Ruta HtmlConverter to configure linebreak replacement.
> ------------------------------------------------------------------------------------------
>
>                 Key: UIMA-3512
>                 URL: https://issues.apache.org/jira/browse/UIMA-3512
>             Project: UIMA
>          Issue Type: Improvement
>          Components: ruta
>    Affects Versions: 2.1.1ruta
>            Reporter: Philip-Daniel Beck
>            Assignee: Peter Klügl
>             Fix For: 2.1.1ruta
>
>         Attachments: linebreakReplacementEngineParameter.core_patch, linebreakReplacementEngineParameter.docbook_patch
>
>
> When converting an HTML file to plain text with the HtmlConverter engine in Ruta, the boolean engine parameter "replaceLinebreaks" decides whether linebreaks in the text are replaced. If set to false, all linebreaks are kept in the document. If set to true, all linebreaks are deleted, so the last word of a line and the first word of the next line are joined without whitespace in between. It would often be better to replace a linebreak with a whitespace. To configure this, an additional engine parameter that defines the String each linebreak is replaced with would be useful.
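
The requested behavior can be illustrated with a minimal, standalone Java sketch. It does not use the UIMA or Ruta API and is not the attached patch. The class name, the method name, and the parameter name "linebreakReplacement" (borrowed from the attachment file names) are assumptions made for illustration only. The sketch also assumes that replaceLinebreaks = true means linebreaks are replaced, as the parameter name suggests.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual HtmlConverter implementation.
public class LinebreakReplacementSketch {

    // Treats Windows (\r\n), old Mac (\r) and Unix (\n) line endings as one linebreak each.
    private static final Pattern LINEBREAK = Pattern.compile("\\r\\n|\\r|\\n");

    // replaceLinebreaks: existing boolean switch described in the issue.
    // linebreakReplacement: proposed String each linebreak is replaced with (assumed name).
    public static String convert(String text, boolean replaceLinebreaks, String linebreakReplacement) {
        if (!replaceLinebreaks) {
            return text; // linebreaks are kept unchanged
        }
        // quoteReplacement keeps '$' and '\' in the replacement literal
        return LINEBREAK.matcher(text).replaceAll(Matcher.quoteReplacement(linebreakReplacement));
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line";
        // Empty replacement reproduces the behavior criticized in the issue.
        System.out.println(convert(text, true, ""));  // prints "first linesecond line"
        // A single space keeps the words apart.
        System.out.println(convert(text, true, " ")); // prints "first line second line"
    }
}
```

With an empty string as the default replacement, the current behavior would stay unchanged for existing configurations, while a single space gives the result asked for above.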


