Posted to dev@jackrabbit.apache.org by "Martin Perez (JIRA)" <ji...@apache.org> on 2005/12/01 13:40:30 UTC
[jira] Updated: (JCR-281) textfilters module patch: Support for text extraction for HTML,XML and RTF files
[ http://issues.apache.org/jira/browse/JCR-281?page=all ]
Martin Perez updated JCR-281:
-----------------------------
Attachment: patch.diff
I have updated the HTMLTextFilter class.
It now uses the NekoHTML parser (http://www.apache.org/~andyc/neko/doc/html/) together with Xerces. Both are available on ibiblio, so there is no need to download any jar manually.
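For readers unfamiliar with parser-based text extraction, here is a minimal sketch of the general idea. It is not the code in patch.diff: it uses only the SAX parser bundled with the JDK rather than NekoHTML, so it handles well-formed XML only, and the class and method names (XmlTextSketch, extractText) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of the Jackrabbit textfilters module.
final class XmlTextSketch {

    // Returns the character data of a well-formed XML document with all
    // markup dropped and runs of whitespace collapsed to single spaces.
    static String extractText(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Ask the parser to apply its secure-processing limits.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        SAXParser parser = factory.newSAXParser();

        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node.
                text.append(ch, start, length);
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Treat every element boundary as a word break. This is an
                // assumption of the sketch: it also splits a word that
                // contains inline markup.
                text.append(' ');
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                text.append(' ');
            }
        };

        parser.parse(new InputSource(new StringReader(xml)), handler);
        return text.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><title>Hello</title><body>big <b>world</b></body></doc>";
        System.out.println(extractText(xml));
    }
}
```

An HTML filter would presumably follow the same collect-the-character-data pattern, with a lenient parser such as NekoHTML in place of the strict XML parser so that malformed markup does not abort the extraction.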
> textfilters module patch: Support for text extraction for HTML,XML and RTF files
> --------------------------------------------------------------------------------
>
> Key: JCR-281
> URL: http://issues.apache.org/jira/browse/JCR-281
> Project: Jackrabbit
> Type: Improvement
> Components: query
> Reporter: Martin Perez
> Attachments: patch.diff, patch.diff
>
> This patch adds text extraction support for XML, RTF and HTML files.
> The only dependency is the htmlparser library, which handles HTML text extraction.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira