Posted to dev@tika.apache.org by "Andrzej Bialecki (JIRA)" <ji...@apache.org> on 2011/06/13 19:00:52 UTC
[jira] [Created] (TIKA-674) CompositeParser should indicate which parser was actually selected for parsing
CompositeParser should indicate which parser was actually selected for parsing
------------------------------------------------------------------------------
Key: TIKA-674
URL: https://issues.apache.org/jira/browse/TIKA-674
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.0
Reporter: Andrzej Bialecki
If several parsers support the same MIME type and AutoDetectParser (or another CompositeParser) is used, the parse output does not indicate which of the alternative parsers was actually used. I think the name of the selected parser (perhaps its fully qualified class name) should be added to the metadata.
Something like this trivial patch:
{code}
Index: tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java
===================================================================
--- tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java (revision 1135167)
+++ tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java (working copy)
@@ -238,6 +238,7 @@
         try {
             TikaInputStream taggedStream = TikaInputStream.get(stream, tmp);
             TaggedContentHandler taggedHandler = new TaggedContentHandler(handler);
+            metadata.add("X-Parsed-By", parser.getClass().getName());
             try {
                 parser.parse(taggedStream, taggedHandler, metadata, context);
             } catch (RuntimeException e) {
{code}
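To illustrate the intended effect without depending on Tika itself, here is a minimal, self-contained sketch. Everything in it is hypothetical: SketchParser, FastHtmlParser, StrictHtmlParser and SketchCompositeParser are stand-ins invented for this example, and the metadata is modelled as a plain multi-valued map rather than Tika's Metadata class. Only the "X-Parsed-By" key and the use of the class name come from the patch above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parser; this is NOT Tika's Parser interface.
interface SketchParser {
    boolean supports(String mimeType);
    String parse(String input);
}

// Two alternative parsers that both claim the same MIME type.
class FastHtmlParser implements SketchParser {
    public boolean supports(String mimeType) { return "text/html".equals(mimeType); }
    public String parse(String input) { return input.trim(); }
}

class StrictHtmlParser implements SketchParser {
    public boolean supports(String mimeType) { return "text/html".equals(mimeType); }
    public String parse(String input) { return input.trim().toLowerCase(); }
}

// Hypothetical composite: delegates to the first registered parser that
// supports the MIME type (an assumed selection rule, chosen for brevity).
class SketchCompositeParser {
    static final String PARSED_BY = "X-Parsed-By";
    private final List<SketchParser> parsers;

    SketchCompositeParser(List<SketchParser> parsers) {
        this.parsers = new ArrayList<>(parsers);
    }

    String parse(String input, String mimeType, Map<String, List<String>> metadata) {
        for (SketchParser parser : parsers) {
            if (parser.supports(mimeType)) {
                // Record the delegate before parsing, as in the patch, so the
                // entry is present even if the delegate throws.
                metadata.computeIfAbsent(PARSED_BY, k -> new ArrayList<>())
                        .add(parser.getClass().getName());
                return parser.parse(input);
            }
        }
        throw new IllegalArgumentException("No parser for " + mimeType);
    }
}

class ParsedByDemo {
    public static void main(String[] args) {
        SketchCompositeParser composite = new SketchCompositeParser(
                Arrays.asList(new FastHtmlParser(), new StrictHtmlParser()));
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        composite.parse("  <p>Hello</p>  ", "text/html", metadata);
        // The caller can now see which of the two candidates was used.
        System.out.println(metadata.get(SketchCompositeParser.PARSED_BY));
    }
}
```

In this sketch the value is appended rather than overwritten, so a composite nested inside another composite would leave one entry per level. Whether that is desirable for the real CompositeParser is open for discussion.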