Posted to dev@tika.apache.org by "VENU (JIRA)" <ji...@apache.org> on 2017/05/02 12:38:04 UTC

[jira] [Created] (TIKA-2351) Getting error while parsing documents

VENU created TIKA-2351:
--------------------------

             Summary: Getting error while parsing documents
                 Key: TIKA-2351
                 URL: https://issues.apache.org/jira/browse/TIKA-2351
             Project: Tika
          Issue Type: Bug
          Components: general
    Affects Versions: 1.14
         Environment: Red Hat Enterprise Linux Server release 7.3
ElasticSearch 5.2.1
ingest-attachment 5.2.1
            Reporter: VENU


Hi Everyone,

I am using the ingest-attachment plugin to index documents. Plain text documents (.txt files) parse without problems, but when I try to parse .doc or .pdf files I get the following error.

FILE = /elastic/files/englishAnalyzer.doc
ID = 6

"error" : {
"root_cause" : [
{
"type" : "exception",
"reason" : "java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1];",
"header" : {
"processor_type" : "attachment"
}
}
],
"type" : "exception",
"reason" : "java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1];",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1];",
"caused_by" : {
"type" : "parse_exception",
"reason" : "Error parsing document in field [data]",
"caused_by" : {
"type" : "tika_exception",
"reason" : "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079",
"caused_by" : {
"type" : "array_index_out_of_bounds_exception",
"reason" : "-1"
}
}
}
},
"header" : {
"processor_type" : "attachment"
}
},
"status" : 500
}
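
The nested "caused_by" chain in the response above can be walked programmatically to reach the innermost cause. The following is only a minimal Python sketch: it assumes the response body is wrapped in the usual outer braces (they are missing from the paste above), it uses a trimmed copy of the response, and the helper name innermost_cause is hypothetical.

```python
import json


def innermost_cause(error):
    """Follow nested "caused_by" objects and return the deepest one."""
    node = error
    while isinstance(node.get("caused_by"), dict):
        node = node["caused_by"]
    return node


# Trimmed copy of the response shown above, wrapped in outer braces
# (assumption: the original paste omitted them).
response_text = """
{
  "error": {
    "type": "exception",
    "caused_by": {
      "type": "illegal_argument_exception",
      "caused_by": {
        "type": "parse_exception",
        "reason": "Error parsing document in field [data]",
        "caused_by": {
          "type": "tika_exception",
          "reason": "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079",
          "caused_by": {
            "type": "array_index_out_of_bounds_exception",
            "reason": "-1"
          }
        }
      }
    }
  },
  "status": 500
}
"""

cause = innermost_cause(json.loads(response_text)["error"])
print(cause["type"], cause["reason"])
```

For this response the deepest entry is the array_index_out_of_bounds_exception with reason "-1", raised inside the Tika OfficeParser.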

Please help me resolve this issue.


