Posted to dev@jackrabbit.apache.org by Jukka Zitting <ju...@gmail.com> on 2008/04/13 18:20:18 UTC

Tika text extractor (Was: [jira] Commented: (JCR-1530) MsPowerPointTextExtractor does not extract from PPTs with € sign)

Hi,

On Fri, Apr 11, 2008 at 12:20 PM, Marcel Reutegger (JIRA)
<ji...@apache.org> wrote:
> We might want to provide an adapter, which implements the Jackrabbit
> TextExtractor interface and uses Tika to extract the text. Users then can
> decide if they want to use it and therefore need to use Java 1.5.

I created a sandbox component called jackrabbit-tika that uses the
latest Tika 0.2 snapshot to implement the TextExtractor interface.

BR,

Jukka Zitting