Posted to dev@nutch.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/09/11 10:59:00 UTC

[jira] [Commented] (NUTCH-2397) Parser to add paragraph line breaks

    [ https://issues.apache.org/jira/browse/NUTCH-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161061#comment-16161061 ] 

ASF GitHub Bot commented on NUTCH-2397:
---------------------------------------

sebastian-nagel closed pull request #198: NUTCH-2397: Parser to add paragraph line breaks
URL: https://github.com/apache/nutch/pull/198
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Parser to add paragraph line breaks
> -----------------------------------
>
>                 Key: NUTCH-2397
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2397
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 2.3.1, 1.13
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 2.4, 1.14
>
>
> (initially reported with patch/pull-request by Vipul Behl, see [#190|https://github.com/apache/nutch/pull/190])
> The parser (parse-tika and parse-html) could be improved to add line breaks between paragraphs, instead of writing the whole document into a single line.
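
The idea in the description can be illustrated with a minimal sketch. This is not the code from the Nutch pull request (Nutch itself is written in Java); it only uses Python's standard html.parser module to show the general technique. The set of block-level tags and the names ParagraphTextExtractor and extract_text are assumptions made for this illustration.

```python
from html.parser import HTMLParser

# Assumed set of block-level tags that end a paragraph; the set actually
# used by parse-html or parse-tika may differ.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class ParagraphTextExtractor(HTMLParser):
    """Collects text and starts a new line at every block-level boundary."""

    def __init__(self):
        super().__init__()
        self._buffer = []      # raw text fragments of the current paragraph
        self._paragraphs = []  # finished, whitespace-normalized paragraphs

    def _flush(self):
        # Join the fragments, collapse whitespace runs (including newlines
        # inside the paragraph) and keep the result if it is not empty.
        text = " ".join("".join(self._buffer).split())
        if text:
            self._paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        # Inline elements such as <b> do not flush, so their text stays
        # on the same line as the surrounding paragraph.
        self._buffer.append(data)

    def get_text(self):
        self._flush()
        return "\n".join(self._paragraphs)


def extract_text(html):
    parser = ParagraphTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.get_text()


print(extract_text("<p>First <b>paragraph</b>.</p><p>Second\n paragraph.</p>"))
# First paragraph.
# Second paragraph.
```

Without the flush at block boundaries, the same input would come out as one line, which is the behaviour the issue asks to change. The sketch does not skip script or style content; a real parser would need to handle that separately.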



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)