Posted to user@tika.apache.org by Mark Kerzner <ma...@shmsoft.com> on 2012/03/08 07:11:33 UTC

OutOfMemoryError in Tika

Hi,

I am getting an OutOfMemoryError exception with Tika on some MS file. I
know which file triggers it (it is attached), but I can't figure out how
best to search for a solution.

Thank you. Sincerely,
Mark

Here is the log:

2012-03-07 23:50:11,265 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.OutOfMemoryError: Java heap space
	at org.apache.poi.hwpf.usermodel.Picture.fillRawImageContent(Picture.java:362)
	at org.apache.poi.hwpf.usermodel.Picture.getRawContent(Picture.java:203)
	at org.apache.poi.hwpf.usermodel.Picture.fillImageContent(Picture.java:372)
	at org.apache.poi.hwpf.usermodel.Picture.getContent(Picture.java:191)
	at org.apache.poi.hwpf.usermodel.Picture.suggestPictureType(Picture.java:330)
	at org.apache.poi.hwpf.usermodel.Picture.suggestFileExtension(Picture.java:315)
	at org.apache.poi.hwpf.usermodel.Picture.suggestFullFileName(Picture.java:150)
	at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:538)
	at org.apache.tika.parser.microsoft.WordExtractor$PicturesSource.<init>(WordExtractor.java:522)
	at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:91)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:204)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:177)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.Tika.parseToString(Tika.java:380)
	at org.freeeed.main.DocumentParser.parse(DocumentParser.java:31)
	at org.freeeed.main.FileProcessor.extractMetadata(FileProcessor.java:336)
	at org.freeeed.main.FileProcessor.processFileEntry(FileProcessor.java:110)
	at org.freeeed.main.ZipFileProcessor.processArchivesRecursively(ZipFileProcessor.java:133)
	at org.freeeed.main.ZipFileProcessor.processArchivesRecursively(ZipFileProcessor.java:119)
	at org.freeeed.main.ZipFileProcessor.processArchivesRecursively(ZipFileProcessor.java:119)
	at org.freeeed.main.ZipFileProcessor.processArchivesRecursively(ZipFileProcessor.java:119)
	at org.freeeed.main.ZipFileProcessor.processArchivesRecursively(ZipFileProcessor.java:119)
	at org.freeeed.main.ZipFileProcessor.processWithTrueZip(ZipFileProcessor.java:100)
	at org.freeeed.main.ZipFileProcessor.process(ZipFileProcessor.java:55)
	at org.freeeed.main.Map.map(Map.java:65)
	at org.freeeed.main.Map.map(Map.java:21)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
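
The trace above shows the error escaping from the parse call all the way up to the Hadoop child, which ends the task. A minimal sketch of one way to keep a single oversized document from doing that is shown here. It is an assumption, not something taken from FreeEed or Tika: the class GuardedExtraction and the method extractOrNull are hypothetical names, and the Callable is only a stand-in for a call such as Tika.parseToString. Catching OutOfMemoryError is a best-effort measure; the JVM may not always be in a recoverable state afterwards, and this does not address the underlying memory use in POI.

```java
import java.util.concurrent.Callable;

public class GuardedExtraction {

    /**
     * Runs one extraction step and returns its text, or null if it failed.
     * The Callable is a placeholder for the real parse call (hypothetical
     * wiring, not Tika API).
     */
    public static String extractOrNull(String name, Callable<String> step) {
        try {
            return step.call();
        } catch (OutOfMemoryError e) {
            // Record which document exhausted the heap, then carry on.
            System.err.println("Out of memory while parsing " + name);
            return null;
        } catch (Exception e) {
            // Any ordinary parse failure is logged and skipped as well.
            System.err.println("Failed to parse " + name + ": " + e);
            return null;
        }
    }

    public static void main(String[] args) {
        // A step that succeeds.
        System.out.println(extractOrNull("ok.doc", () -> "some text"));
        // A step that simulates the heap being exhausted.
        System.out.println(extractOrNull("huge.doc", () -> {
            throw new OutOfMemoryError("simulated");
        }));
    }
}
```

In a real job the lambda would wrap the actual parse call, and a null result would mark the document as unparsed so that the remaining entries in the archive are still processed.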

Re: OutOfMemoryError in Tika

Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 9 Mar 2012, Mark Kerzner wrote:
> How do I get 1.1 in Maven?

You'll need to wait; it's only an RC at the moment. If the vote passes,
it'll be published. (Depending on your Maven skills, you may or may not be
able to pull in the RC artifacts to test with.)

Nick

Re: OutOfMemoryError in Tika

Posted by Mark Kerzner <ma...@shmsoft.com>.
How do I get 1.1 in Maven?

On Fri, Mar 9, 2012 at 7:21 AM, Nick Burch <ni...@alfresco.com> wrote:

> On Fri, 9 Mar 2012, Mark Kerzner wrote:
>
>> Standard 1.0 of Tika, with whatever POI is included in it by default
>>
>
> It's probably worth re-testing with the Tika 1.1 release candidate, and
> seeing if that fixes it (it has a newer POI version in it)
>
> Nick
>

Re: OutOfMemoryError in Tika

Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 9 Mar 2012, Mark Kerzner wrote:
> Standard 1.0 of Tika, with whatever POI is included in it by default

It's probably worth re-testing with the Tika 1.1 release candidate, and 
seeing if that fixes it (it has a newer POI version in it)

Nick

Re: OutOfMemoryError in Tika

Posted by Mark Kerzner <ma...@shmsoft.com>.
Standard 1.0 of Tika, with whatever POI is included in it by default

On Fri, Mar 9, 2012 at 6:36 AM, Nick Burch <ni...@alfresco.com> wrote:

> On Thu, 8 Mar 2012, Mark Kerzner wrote:
>
>> I am getting an OutOfMemoryError exception with Tika on some MS file.
>>
>
> What version of Tika? (And what version of POI does that include, if it's
> not a final Tika release - the error is coming from POI)
>
> Nick
>

Re: OutOfMemoryError in Tika

Posted by Nick Burch <ni...@alfresco.com>.
On Thu, 8 Mar 2012, Mark Kerzner wrote:
> I am getting an OutOfMemoryError exception with Tika on some MS file.

What version of Tika? (And what version of POI does that include, if it's 
not a final Tika release - the error is coming from POI)

Nick
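
One way to answer the version question from inside the running job is to read the Implementation-Version recorded in each jar's manifest. The sketch below is a hedged illustration only: VersionCheck and versionOf are hypothetical names, the two class names are taken from the stack trace earlier in the thread, and it assumes the jars on the classpath actually carry that manifest entry (otherwise it reports "unknown").

```java
public class VersionCheck {

    /**
     * Returns the Implementation-Version from the manifest of the jar that
     * provides the named class, or a short note if it cannot be determined.
     */
    public static String versionOf(String className) {
        try {
            // Load without initializing, so no static code in the class runs.
            Class<?> c = Class.forName(className, false,
                    VersionCheck.class.getClassLoader());
            Package p = c.getPackage();
            String v = (p == null) ? null : p.getImplementationVersion();
            return (v == null) ? "unknown (no version in manifest)" : v;
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Class names as they appear in the stack trace.
        System.out.println("Tika: " + versionOf("org.apache.tika.Tika"));
        System.out.println("POI:  "
                + versionOf("org.apache.poi.hwpf.usermodel.Picture"));
    }
}
```

Running this with the same classpath as the map task shows which Tika and POI jars are really being loaded, which may differ from what the build file declares.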