Posted to user@tika.apache.org by Mark Kerzner <ma...@shmsoft.com> on 2012/03/08 07:11:33 UTC
OutOfMemoryError in Tika
Hi,
I am getting an OutOfMemoryError exception with Tika on some MS file. I
know which file triggers it (it is attached), but I can't figure out how
best to search for a solution.
Thank you. Sincerely,
Mark
Here is the log:
2012-03-07 23:50:11,265 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
at org.apache.poi.hwpf.usermodel.Picture.fillRawImageContent(Picture.java:362)
at org.apache.poi.hwpf.usermodel.Picture.getRawContent(Picture.java:203)
at org.apache.poi.hwpf.usermodel.Picture.fillImageContent(Picture.java:372)
at org.apache.poi.hwpf.usermodel.Picture.getContent(Picture.java:191)
at org.apache.poi.hwpf.usermodel.Picture.suggestPictureType(Picture.java:330)
at org.apache.poi.hwpf.usermodel.Picture.suggestFileExtension(Picture.java:315)
at org.apache.poi.hwpf.usermodel.Picture.suggestFullFileName(Picture.java:150)
at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:538)
at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:522)
at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:91)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:204)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:177)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.Tika.parseToString(Tika.java:380)
at org.freeeed.main.DocumentParser.parse(DocumentParser.java:31)
at org.freeeed.main.FileProcessor.extractMetadata(FileProcessor.java:336)
at org.freeeed.main.FileProcessor.processFileEntry(FileProcessor.java:110)
at org.freeeed.main.ZipFileProcessor.processArchivesRecursively(ZipFileProcessor.java:133)
at org.freeeed.main.ZipFileProcessor.processArchivesRecursively(ZipFileProcessor.java:119)
at org.freeeed.main.ZipFileProcessor.processArchivesRecursively(ZipFileProcessor.java:119)
at org.freeeed.main.ZipFileProcessor.processArchivesRecursively(ZipFileProcessor.java:119)
at org.freeeed.main.ZipFileProcessor.processArchivesRecursively(ZipFileProcessor.java:119)
at org.freeeed.main.ZipFileProcessor.processWithTrueZip(ZipFileProcessor.java:100)
at org.freeeed.main.ZipFileProcessor.process(ZipFileProcessor.java:55)
at org.freeeed.main.Map.map(Map.java:65)
at org.freeeed.main.Map.map(Map.java:21)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
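A note on reading the trace above: the topmost frame is in org.apache.poi.hwpf.usermodel.Picture, so the allocation that failed happened inside POI rather than in Tika's own classes. The sketch below shows one way to check this programmatically using only the JDK. The class name TraceOrigin and its two helper methods are hypothetical, introduced for illustration; the synthetic stack trace reuses two frames copied from the log. Keep in mind that for an OutOfMemoryError the top frame only tells you where the heap ran out, which is a hint rather than proof of the cause.

```java
public class TraceOrigin {

    /** Returns the class name of the topmost stack frame, or "" if the trace is empty. */
    public static String topFrameClass(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        return frames.length == 0 ? "" : frames[0].getClassName();
    }

    /** True if the topmost frame belongs to a class under the given package prefix. */
    public static boolean raisedIn(Throwable t, String packagePrefix) {
        return topFrameClass(t).startsWith(packagePrefix);
    }

    public static void main(String[] args) {
        // Synthetic error carrying two frames copied from the log (illustration only).
        Throwable t = new OutOfMemoryError("Java heap space");
        t.setStackTrace(new StackTraceElement[] {
            new StackTraceElement("org.apache.poi.hwpf.usermodel.Picture",
                    "fillRawImageContent", "Picture.java", 362),
            new StackTraceElement("org.apache.tika.parser.microsoft.WordExtractor",
                    "parse", "WordExtractor.java", 91)
        });

        System.out.println("Top frame class: " + topFrameClass(t));
        System.out.println("Raised in POI:   " + raisedIn(t, "org.apache.poi."));
    }
}
```

In a real job the Throwable would come from a catch block around the parse call instead of being built by hand.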
Re: OutOfMemoryError in Tika
Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 9 Mar 2012, Mark Kerzner wrote:
> How do I get 1.1 in Maven?
You'll need to wait; it's only an RC at the moment. If the vote passes,
it'll be published. (Depending on your Maven skills, you may or may not be
able to pull in the RC artifacts to test with.)
Nick
Re: OutOfMemoryError in Tika
Posted by Mark Kerzner <ma...@shmsoft.com>.
How do I get 1.1 in Maven?
On Fri, Mar 9, 2012 at 7:21 AM, Nick Burch <ni...@alfresco.com> wrote:
> On Fri, 9 Mar 2012, Mark Kerzner wrote:
>
>> Standard 1.0 of Tika, with whatever POI is included in it by default
>>
>
> It's probably worth re-testing with the Tika 1.1 release candidate, and
> seeing if that fixes it (it has a newer POI version in it)
>
> Nick
>
Re: OutOfMemoryError in Tika
Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 9 Mar 2012, Mark Kerzner wrote:
> Standard 1.0 of Tika, with whatever POI is included in it by default
It's probably worth re-testing with the Tika 1.1 release candidate, and
seeing if that fixes it (it has a newer POI version in it)
Nick
Re: OutOfMemoryError in Tika
Posted by Mark Kerzner <ma...@shmsoft.com>.
Standard 1.0 of Tika, with whatever POI is included in it by default
On Fri, Mar 9, 2012 at 6:36 AM, Nick Burch <ni...@alfresco.com> wrote:
> On Thu, 8 Mar 2012, Mark Kerzner wrote:
>
>> I am getting an OutOfMemoryError exception with Tika on some MS file.
>>
>
> What version of Tika? (And what version of POI does that include, if it's
> not a final Tika release - the error is coming from POI)
>
> Nick
>
Re: OutOfMemoryError in Tika
Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 8 Mar 2012, Mark Kerzner wrote:
> I am getting an OutOfMemoryError exception with Tika on some MS file.
What version of Tika? (And what version of POI does that include, if it's
not a final Tika release - the error is coming from POI)
Nick
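For anyone unsure how to answer the version question, here is a minimal sketch of one way to look it up at runtime using only the JDK. It assumes the Tika and POI jars carry an Implementation-Version entry in their manifests, which is not guaranteed; the class LibraryVersion and its method are hypothetical names for illustration. The two class names passed in are taken from the stack trace in the original post.

```java
public class LibraryVersion {

    /**
     * Looks up the Implementation-Version recorded in the jar manifest of the
     * given class. Returns a descriptive fallback if the class is missing or
     * the manifest has no version entry.
     */
    public static String implementationVersion(String className) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> c = Class.forName(className, false,
                    LibraryVersion.class.getClassLoader());
            Package p = c.getPackage();
            String v = (p == null) ? null : p.getImplementationVersion();
            return (v == null) ? "unknown (no Implementation-Version in manifest)" : v;
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Class names as they appear in the stack trace of the original post.
        System.out.println("Tika: " + implementationVersion("org.apache.tika.Tika"));
        System.out.println("POI:  "
                + implementationVersion("org.apache.poi.hwpf.usermodel.Picture"));
    }
}
```

If the manifest lookup comes back unknown, the version in the jar file name or in the project's resolved Maven dependencies is the more reliable source.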