Posted to users@jackrabbit.apache.org by Akil Ali <Ak...@cognizant.com> on 2009/02/14 12:59:22 UTC

Is jackrabbit textFilterClasses able to handle office 2007 documents.

I can see that a number of filters are available in the latest version.
But will they be able to extract the contents of Office 2007 documents? Has
anyone tested indexing the contents of Office 2007 documents?

Please help me to resolve this as soon as possible.

Thanks in advance.

Akil

Re: Is jackrabbit textFilterClasses able to handle office 2007 documents.

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Sat, Feb 14, 2009 at 12:59 PM, Akil Ali <Ak...@cognizant.com> wrote:
> I can see that a number of filters are available in the latest version.
> But will they be able to extract the contents of Office 2007 documents? Has
> anyone tested indexing the contents of Office 2007 documents?

See JCR-1887 [1] for a patch that adds support for indexing Office
2007 documents.

Alternatively, the latest trunk of Apache Tika [2] also supports
Office 2007, and the jackrabbit-tika sandbox component [3] allows
you to set up Tika as a text extractor in Jackrabbit.
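
For background on what such an extractor has to do: an Office 2007
.docx file is a ZIP package, and the main document text lives in the
word/document.xml entry as WordprocessingML text runs. The sketch below
shows that idea using only the Java standard library. It is not the code
from the JCR-1887 patch or from Tika; the class and method names
(DocxTextSketch, extractText) are made up for illustration, and it
handles Word documents only, ignoring headers, footnotes and the like.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class, not part of Jackrabbit or Tika.
public class DocxTextSketch {

    // Namespace of WordprocessingML elements such as <w:p> and <w:t>.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of word/document.xml, one line per paragraph. */
    public static String extractText(InputStream docx)
            throws IOException, XMLStreamException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return readBody(zip);
                }
            }
        }
        return ""; // no main document part found in the package
    }

    private static String readBody(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The document part needs no DTD, so DTD processing is switched off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        StringBuilder text = new StringBuilder();
        boolean inTextRun = false;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                // Only character data directly inside <w:t> is document text.
                inTextRun = isW(reader, "t");
            } else if (event == XMLStreamConstants.CHARACTERS && inTextRun) {
                text.append(reader.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                inTextRun = false;
                if (isW(reader, "p")) {
                    text.append('\n'); // end of paragraph
                }
            }
        }
        reader.close();
        return text.toString();
    }

    private static boolean isW(XMLStreamReader reader, String localName) {
        return W_NS.equals(reader.getNamespaceURI())
                && localName.equals(reader.getLocalName());
    }
}
```

Calling DocxTextSketch.extractText(new FileInputStream("report.docx"))
(the file name is only an example) would return the paragraph text that
an indexer could then feed to the search index. In practice the
JCR-1887 patch or Tika is the better route, since both cover more than
this sketch does.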

We will most likely have Office 2007 support built in when Jackrabbit
1.6 is released.

[1] https://issues.apache.org/jira/browse/JCR-1887
[2] http://lucene.apache.org/tika/
[3] http://svn.apache.org/repos/asf/jackrabbit/sandbox/jackrabbit-tika/

BR,

Jukka Zitting