Posted to dev@tika.apache.org by "Bracken, Patrick" <Pa...@finra.org> on 2010/07/30 16:43:12 UTC

Word95 and earlier versions

Hi, I am attempting to use Tika to extract content from .doc files for
search indexing purposes. I have run into exceptions thrown when
parsing documents made in an old version of Word. Is there any plan to
add support for this, or a way to get around it?

 

Thanks,

Patrick Bracken

FINRA





Re: Word95 and earlier versions

Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 30 Jul 2010, Bracken, Patrick wrote:
> Hi, I am attempting to use Tika to extract content from .doc files for
> search indexing purposes. I have run into exceptions thrown when
> parsing documents made in an old version of Word. Is there any plan to
> add support for this, or a way to get around it?

See TIKA-408 <https://issues.apache.org/jira/browse/TIKA-408> for details.
Support is now in POI, and Tika will handle Word 7 and Word 95 documents
once the next POI beta release is out. That'll hopefully be within the
next fortnight.

Nick