Posted to users@jackrabbit.apache.org by Jukka Zitting <ju...@gmail.com> on 2009/02/14 13:22:12 UTC

Re: Is jackrabbit textFilterClasses able to handle office 2007 documents.

Hi,

On Sat, Feb 14, 2009 at 12:59 PM, Akil Ali <Ak...@cognizant.com> wrote:
> I can see that there are a number of filters available in the latest
> version. But will it be able to extract the contents of Office 2007
> documents? Has anyone tested indexing the contents of Office 2007 documents?

See JCR-1887 [1] for a patch that adds support for indexing Office
2007 documents.

Alternatively, the latest trunk of Apache Tika [2] also supports
Office 2007, and the jackrabbit-tika sandbox component [3] allows
you to set up Tika as a text extractor in Jackrabbit.
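
As a rough sketch of how that is wired up: text extractors are listed
in the textFilterClasses parameter of the SearchIndex element in
workspace.xml. The second class name below is a made-up placeholder
for illustration, not the real class from the sandbox component, so
check [3] for the actual name before using this.

```xml
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Comma-separated list of text extractor classes. -->
  <!-- org.example.PlaceholderTikaTextExtractor is a HYPOTHETICAL name. -->
  <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.PlainTextExtractor,org.example.PlaceholderTikaTextExtractor"/>
</SearchIndex>
```

The extractor jar and its dependencies would also need to be on the
repository classpath, and existing content is typically only picked
up by a new extractor after it is re-indexed.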

We will most likely have Office 2007 support built in when Jackrabbit
1.6 is released.

[1] https://issues.apache.org/jira/browse/JCR-1887
[2] http://lucene.apache.org/tika/
[3] http://svn.apache.org/repos/asf/jackrabbit/sandbox/jackrabbit-tika/

BR,

Jukka Zitting