Posted to dev@tika.apache.org by "Matt Parker (JIRA)" <ji...@apache.org> on 2011/06/11 03:43:59 UTC

[jira] [Created] (TIKA-673) BoilerPipe Integration

BoilerPipe Integration
----------------------

                 Key: TIKA-673
                 URL: https://issues.apache.org/jira/browse/TIKA-673
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: Matt Parker


I found a library, BoilerPipe, that might be worth integrating into Tika. It provides one of the best open-source algorithms for extracting the main text from an HTML page.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Closed] (TIKA-673) BoilerPipe Integration

Posted by "Matt Parker (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Parker closed TIKA-673.
----------------------------

    Resolution: Duplicate

I see this has already been added.

[jira] [Commented] (TIKA-673) BoilerPipe Integration

Posted by "Matt Parker (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047817#comment-13047817 ] 

Matt Parker commented on TIKA-673:
----------------------------------

http://code.google.com/p/boilerpipe/
