Posted to dev@tika.apache.org by "VENU (JIRA)" <ji...@apache.org> on 2017/05/02 12:51:04 UTC

[jira] [Updated] (TIKA-2351) Getting error while parsing documents

     [ https://issues.apache.org/jira/browse/TIKA-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

VENU updated TIKA-2351:
-----------------------
    Attachment: 03 - Json_creat_code.txt
                02 - Pipeline.txt
                01 - Templete.txt
                englishAnalyzer.doc

Required files attached

> Getting error while parsing documents
> -------------------------------------
>
>                 Key: TIKA-2351
>                 URL: https://issues.apache.org/jira/browse/TIKA-2351
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.14
>         Environment: Red Hat Enterprise Linux Server release 7.3
> ElasticSearch 5.2.1
> ingest-attachment 5.2.1
>            Reporter: VENU
>              Labels: starter
>         Attachments: 01 - Templete.txt, 02 - Pipeline.txt, 03 - Json_creat_code.txt, englishAnalyzer.doc
>
>
> Hi Everyone,
> I am using ingest-attachment to index documents. I can parse plain-text documents (.txt files), but when I try to parse .doc or .pdf files I get this error:
> FILE = /elastic/files/englishAnalyzer.doc
> ID = 6
> "error" : {
> "root_cause" : [
> {
> "type" : "exception",
> "reason" : "java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1];",
> "header" : {
> "processor_type" : "attachment"
> }
> }
> ],
> "type" : "exception",
> "reason" : "java.lang.IllegalArgumentException: ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1];",
> "caused_by" : {
> "type" : "illegal_argument_exception",
> "reason" : "ElasticsearchParseException[Error parsing document in field [data]]; nested: TikaException[Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079]; nested: ArrayIndexOutOfBoundsException[-1];",
> "caused_by" : {
> "type" : "parse_exception",
> "reason" : "Error parsing document in field [data]",
> "caused_by" : {
> "type" : "tika_exception",
> "reason" : "Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@28992079",
> "caused_by" : {
> "type" : "array_index_out_of_bounds_exception",
> "reason" : "-1"
> }
> }
> }
> },
> "header" : {
> "processor_type" : "attachment"
> }
> },
> "status" : 500
> }
> Please help me resolve this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)