Posted to solr-user@lucene.apache.org by "Nguyen, Vincent (CDC/OD/OADS) (CTR)" <vn...@cdc.gov> on 2012/08/22 21:57:35 UTC

Full Text Indexing for DOCX files

Has anyone been able to index DOCX files? I get this error message when indexing Office 2007 documents:

(Location of error unknown)org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. POI only supports OLE2 Office documents

We're currently using Solr 1.3.

Vincent Vu Nguyen



RE: Full Text Indexing for DOCX files

Posted by "Nguyen, Vincent (CDC/OD/OADS) (CTR)" <vn...@cdc.gov>.
Thanks Jack, I'll give that version of Solr a try.

Vincent Vu Nguyen
Web Applications Developer
Division of Science Quality and Translation
Office of the Associate Director for Science
Centers for Disease Control and Prevention (CDC)
404-498-0384 vng0@cdc.gov
Century Bldg 2400
Atlanta, GA 30329 


-----Original Message-----
From: Jack Krupansky [mailto:jack@basetechnology.com] 
Sent: Wednesday, August 22, 2012 4:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Full Text Indexing for DOCX files

I've indexed Office 2007 .docx using Solr 3.6.

It sounds as if Solr 1.3 had an old release of Tika/POI. No big surprise there.

-- Jack Krupansky



Re: Full Text Indexing for DOCX files

Posted by Jack Krupansky <ja...@basetechnology.com>.
I've indexed Office 2007 .docx using Solr 3.6.

It sounds as if Solr 1.3 had an old release of Tika/POI. No big surprise there.

-- Jack Krupansky
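
For anyone hitting the same exception, it can help to confirm which container format a file really uses before blaming the indexer. The error text distinguishes OLE2 documents (the pre-2007 binary format) from Office 2007+ XML documents, which are ZIP packages. Below is a minimal sketch, not part of Solr, Tika, or POI, that sniffs the leading bytes of a file. The function name sniff_office_container and the returned labels are made up for illustration; the only facts relied on are the well-known OLE2 and ZIP signatures and the [Content_Types].xml entry that Office Open XML packages carry.

```python
import io
import zipfile

# Well-known file signatures (magic bytes).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc/.xls/.ppt container
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP local file header

def sniff_office_container(data: bytes) -> str:
    """Hypothetical helper: classify file content by its leading bytes.

    Returns one of "ole2", "ooxml", "zip", or "unknown".
    """
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                # Office Open XML packages (.docx, .xlsx, .pptx) include this entry.
                if "[Content_Types].xml" in zf.namelist():
                    return "ooxml"
        except zipfile.BadZipFile:
            pass
        return "zip"
    return "unknown"
```

A file that reports "ooxml" here is the kind an OLE2-only parser would reject with the exception quoted in this thread, so the fix is a newer parser rather than a repaired file. To check a file on disk, read it in binary mode and pass the bytes, for example sniff_office_container(open(path, "rb").read()), where path is whatever document you are testing.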
