Posted to dev@tika.apache.org by "Matt Parker (JIRA)" <ji...@apache.org> on 2011/06/11 03:43:59 UTC
[jira] [Created] (TIKA-673) BoilerPipe Integration
BoilerPipe Integration
----------------------
Key: TIKA-673
URL: https://issues.apache.org/jira/browse/TIKA-673
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: Matt Parker
I found a library that may be worth integrating into Tika. It provides one of the best open-source text extraction algorithms for finding the main text within an HTML page.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Closed] (TIKA-673) BoilerPipe Integration
Posted by "Matt Parker (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Parker closed TIKA-673.
----------------------------
Resolution: Duplicate
I see this is already added.
[jira] [Commented] (TIKA-673) BoilerPipe Integration
Posted by "Matt Parker (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047817#comment-13047817 ]
Matt Parker commented on TIKA-673:
----------------------------------
http://code.google.com/p/boilerpipe/