Posted to user@tika.apache.org by Yatin Baraiya <ya...@highqsolutions.net> on 2011/09/29 09:34:42 UTC
Re: error parsing .XLS file
Hi Roland,
I get the same issue when I parse a Microsoft Office document.
I have the POI 3.6 jar and the Tika 0.6 jar in my project.
We get the following exception:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 221433
at org.apache.poi.util.LittleEndian.getShort(LittleEndian.java:45)
at org.apache.poi.hwpf.model.ListLevel.<init>(ListLevel.java:120)
at org.apache.poi.hwpf.model.ListFormatOverrideLevel.<init>(ListFormatOverrideLevel.java:48)
at org.apache.poi.hwpf.model.ListTables.<init>(ListTables.java:88)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:267)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:157)
at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:62)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:87)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
I tried opening tika-parsers-0.6.jar in WinRAR, found the pom.xml inside the jar,
and edited it as per your suggestion.
The relevant snippet of the edited pom.xml is below:
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi</artifactId>
  <version>3.6</version>
</dependency>
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-scratchpad</artifactId>
  <version>3.6</version>
</dependency>
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>3.6</version>
  <exclusions>
    <exclusion>
      <groupId>stax</groupId>
      <artifactId>stax-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
Can you tell me exactly how you arrived at the solution?
Can you help me solve this issue?
How do I modify the POM in order to use POI 3.7 with Tika?
Thanks
Yatin Baraiya
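For reference, a minimal sketch of what such a POM change might look like. It assumes the project itself is built with Maven and declares tika-parsers in its own pom.xml. The pom.xml packed inside a jar is only metadata, so editing it does not change which POI classes end up on the classpath. Declaring the POI artifacts directly in the project's POM lets Maven's nearest-wins dependency mediation pick those versions over the ones tika-parsers brings in transitively. Whether Tika 0.6 actually works against POI 3.7 is not confirmed anywhere in this thread, and a later message reports the same exception with Tika 0.9 and POI 3.7, so treat this as an illustration of the mechanism rather than a fix.

```xml
<!-- Sketch only: goes in the project's own pom.xml, not the one inside the jar. -->
<dependencies>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>0.6</version>
  </dependency>

  <!-- Directly declared versions take precedence over the POI version
       that tika-parsers pulls in transitively. -->
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.7</version>
  </dependency>
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.7</version>
  </dependency>
  <dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.7</version>
    <!-- Same exclusion as in the snippet quoted in this message -->
    <exclusions>
      <exclusion>
        <groupId>stax</groupId>
        <artifactId>stax-api</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
</dependencies>
```

If the jars are copied into the project by hand instead of resolved by Maven, the equivalent step would be to replace the POI 3.6 jars on the classpath with the matching 3.7 jars.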
Re: error parsing .XLS file
Posted by Yatin Baraiya <ya...@highqsolutions.net>.
Nick Burch <ni...@...> writes:
>
> On Thu, 29 Sep 2011, Yatin Baraiya wrote:
> > I have tried the latest Tika 0.9 with the POI 3.7 jar, but I get the
> > same exception.
>
> What about with the release candidate of tika 0.10 (which uses POI 3.8
> beta 4)?
>
> Nick
>
>
Could you provide me with the download link for the Apache Tika 0.10 jar?
Yatin
Re: error parsing .XLS file
Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 29 Sep 2011, Yatin Baraiya wrote:
> I have tried the latest Tika 0.9 with the POI 3.7 jar, but I get the
> same exception.
What about with the release candidate of tika 0.10 (which uses POI 3.8
beta 4)?
Nick
Re: error parsing .XLS file
Posted by Yatin Baraiya <ya...@highqsolutions.net>.
I have tried the latest Tika 0.9 with the POI 3.7 jar,
but I get the same exception.
Regards
Yatin
RE: error parsing .XLS file
Posted by Uwe Schindler <uw...@thetaphi.de>.
How about using a newer TIKA version?
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de