Posted to dev@tika.apache.org by "David Pilato (JIRA)" <ji...@apache.org> on 2016/12/14 08:29:58 UTC
[jira] [Created] (TIKA-2208) Catch missing libraries
David Pilato created TIKA-2208:
----------------------------------
Summary: Catch missing libraries
Key: TIKA-2208
URL: https://issues.apache.org/jira/browse/TIKA-2208
Project: Tika
Issue Type: Improvement
Components: parser
Reporter: David Pilato
Hi there
We have decided to drop support for some formats when using Tika to extract text and metadata.
To do that, we defined our own list of parsers:
{code:java}
private static final Parser[] PARSERS = new Parser[] {
    // documents
    new org.apache.tika.parser.html.HtmlParser(),
    new org.apache.tika.parser.rtf.RTFParser(),
    new org.apache.tika.parser.pdf.PDFParser(),
    new org.apache.tika.parser.txt.TXTParser(),
    new org.apache.tika.parser.microsoft.OfficeParser(),
    new org.apache.tika.parser.microsoft.OldExcelParser(),
    new org.apache.tika.parser.microsoft.ooxml.OOXMLParser(),
    new org.apache.tika.parser.odf.OpenDocumentParser(),
    new org.apache.tika.parser.iwork.IWorkPackageParser(),
    new org.apache.tika.parser.xml.DcXMLParser(),
    new org.apache.tika.parser.epub.EpubParser(),
};

private static final AutoDetectParser PARSER_INSTANCE = new AutoDetectParser(PARSERS);
private static final Tika TIKA_INSTANCE = new Tika(PARSER_INSTANCE.getDetector(), PARSER_INSTANCE);
{code}
But when an MS Office Word document embeds another document in an unsupported format (like a Visio schema), a {{NoClassDefFoundError}} is raised, because the libraries needed for that format are not on the classpath.
Would it be possible to catch this case and throw a {{TikaException}} instead, so that callers can handle it as an {{Exception}} rather than having to catch {{Throwable}}?
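To illustrate the requested behaviour, here is a minimal sketch of the wrapping done on the caller side. It is not Tika code: {{MissingLibraryGuard}}, {{guard}} and {{ExtractionException}} are hypothetical names, and {{ExtractionException}} is only a stand-in for {{TikaException}} so that the sketch compiles without Tika on the classpath.

```java
import java.util.concurrent.Callable;

public class MissingLibraryGuard {

    /** Hypothetical stand-in for org.apache.tika.exception.TikaException. */
    public static class ExtractionException extends Exception {
        public ExtractionException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Runs the extraction and converts a NoClassDefFoundError (an Error,
     * so not caught by "catch (Exception e)") into a checked exception.
     */
    public static <T> T guard(Callable<T> extraction) throws ExtractionException {
        try {
            return extraction.call();
        } catch (NoClassDefFoundError e) {
            // A class needed by a parser is missing from the classpath.
            throw new ExtractionException(
                    "A parser dependency is missing: " + e.getMessage(), e);
        } catch (ExtractionException e) {
            // Already the expected type, pass it through unchanged.
            throw e;
        } catch (Exception e) {
            // Any other failure is wrapped so callers see a single type.
            throw new ExtractionException("Extraction failed", e);
        }
    }
}
```

With something like this, a call such as {{MissingLibraryGuard.guard(() -> TIKA_INSTANCE.parseToString(stream))}} would fail with a regular exception when the embedded Visio document is reached. Having the equivalent done inside Tika would avoid every caller writing this wrapper.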