Posted to users@jackrabbit.apache.org by Nabil Shuhaiber <na...@shuhaiber.com> on 2013/01/20 13:54:17 UTC

Office 2007 documents not being indexed in Jackrabbit 2.4.3

I am unable to full text search the contents of any .docx, .pptx, .xlsx
files using the standalone version of 2.4.3.

PDFs and .doc (pre-2007) files work fine. Inspecting the Lucene index with
Luke shows the content as empty for Office 2007 files.

No errors in debug logs at all.

Any ideas? Can I turn on more detailed logging for Tika extraction?

Thanks
Nabil
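
One way to narrow down a symptom like this is to rule out the files themselves before looking at the extractor. OOXML files (.docx, .pptx, .xlsx) are ZIP packages, so a minimal sketch using only the JDK can confirm that the main part is present and non-empty. The class name OoxmlPartSize and its partSize helper are made up for this illustration; word/document.xml is the usual main part of a .docx, and other formats use different part names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: checks that an OOXML package contains a given part. */
final class OoxmlPartSize {

    /** Returns the uncompressed byte count of the named part, or -1 if it is absent. */
    static long partSize(Path file, String partName) throws IOException {
        try (ZipFile zip = new ZipFile(file.toFile())) {
            ZipEntry entry = zip.getEntry(partName);
            if (entry == null) {
                return -1;
            }
            long total = 0;
            byte[] buf = new byte[8192];
            // Read the whole entry so the count does not depend on ZIP metadata.
            try (InputStream in = zip.getInputStream(entry)) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    total += n;
                }
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: OoxmlPartSize <file> [part-name]");
            return;
        }
        // Default to the usual main part of a .docx package.
        String part = args.length > 1 ? args[1] : "word/document.xml";
        long size = partSize(Paths.get(args[0]), part);
        System.out.println(size < 0 ? part + ": not found" : part + ": " + size + " bytes");
    }
}
```

If the part is present and non-empty, the file is probably fine and the empty index content more likely points at the extraction step.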

Re: AW: Office 2007 documents not being indexed in Jackrabbit 2.4.3

Posted by Neo <n....@gmail.com>.
Thanks Robert.

Along the same lines, I have observed that commons-compress-1.5.jar is
required by the Tika parser for OOXML documents (i.e. Office 2007
documents).

I am now able to index and search most document types (Office 2007: docx,
pptx, xlsx; Office 2003: doc, ppt, xls; PDF) using the 2 steps below:

(1) Updated repository.xml & added 
      <SearchIndex ...>
      ...
             
      ...
      </SearchIndex>
     Further details can be found at
https://issues.apache.org/jira/browse/JCR-3287
(2) Added commons-compress-1.5.jar to the classpath when running
jackrabbit-standalone-2.6.2.jar
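
A quick way to check step (2) is to test whether representative classes from each library can be loaded. The sketch below is an assumption-laden illustration, not part of Jackrabbit: the class OoxmlClasspathCheck is hypothetical, and the three class names are taken from the public packages of Tika, Apache POI and commons-compress. It only reflects the classpath of the JVM it runs in, which may differ from how the standalone server loads its classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reports whether classes needed for OOXML extraction can be loaded. */
final class OoxmlClasspathCheck {

    // Representative classes only; finding them does not prove the versions are compatible.
    static final String[] REQUIRED = {
        "org.apache.tika.parser.microsoft.ooxml.OOXMLParser", // Tika OOXML parser
        "org.apache.poi.xwpf.usermodel.XWPFDocument",         // Apache POI (OOXML support)
        "org.apache.commons.compress.archivers.zip.ZipFile"   // commons-compress
    };

    /** Returns true if the class can be located, without running its static initializers. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, OoxmlClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Maps each class name to whether it could be loaded, in the order given. */
    static Map<String, Boolean> check(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isPresent(name));
        }
        return result;
    }

    public static void main(String[] args) {
        check(REQUIRED).forEach((name, ok) ->
            System.out.println((ok ? "found   " : "MISSING ") + name));
    }
}
```

A MISSING line for the commons-compress or POI class would be consistent with the silent extraction failure described in this thread, although it does not prove that this is the cause.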



--
View this message in context: http://jackrabbit.510166.n4.nabble.com/Office-2007-documents-not-being-indexed-in-Jackrabbit-2-4-3-tp4657380p4658815.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.

AW: Office 2007 documents not being indexed in Jackrabbit 2.4.3

Posted by "Seidel. Robert" <Ro...@aeb.de>.
Hi,

Tika uses Apache POI to parse such documents. You may need to add or update that dependency/library.

Regards, Robert

-----Original Message-----
From: Neo [mailto:n.the.xtreme@gmail.com]
Sent: Friday, 31 May 2013 15:08
To: users@jackrabbit.apache.org
Subject: Re: Office 2007 documents not being indexed in Jackrabbit 2.4.3

Did you find a solution?

I am facing a similar problem and am unable to index/search docx (OOXML
formats) with Jackrabbit 2.6.2 (running as a standalone server).



--
View this message in context: http://jackrabbit.510166.n4.nabble.com/Office-2007-documents-not-being-indexed-in-Jackrabbit-2-4-3-tp4657380p4658800.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.

Re: Office 2007 documents not being indexed in Jackrabbit 2.4.3

Posted by Neo <n....@gmail.com>.
Did you find a solution?

I am facing a similar problem and am unable to index/search docx (OOXML
formats) with Jackrabbit 2.6.2 (running as a standalone server).



--
View this message in context: http://jackrabbit.510166.n4.nabble.com/Office-2007-documents-not-being-indexed-in-Jackrabbit-2-4-3-tp4657380p4658800.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.