Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2013/10/23 11:48:42 UTC

[jira] [Commented] (TIKA-1187) java.lang.OutOfMemoryError: Java heap space

    [ https://issues.apache.org/jira/browse/TIKA-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802750#comment-13802750 ] 

Nick Burch commented on TIKA-1187:
----------------------------------

This looks a lot like TIKA-1182, but since you're on an old version of Tika (1.3), it may already be fixed in a newer PDFBox.

Can you try upgrading to the latest PDFBox? If it still fails there, please report the bug upstream to the PDFBox project, as with TIKA-1182.
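
When retesting after an upgrade, it can help to confirm which FontBox jar the JVM actually loads, since an older copy elsewhere on the classpath could shadow the new one. The following is only a rough sketch using standard Java reflection: the class name LoadedJarCheck and the helper describe() are made up for illustration, and the two class names checked are taken from the stack trace in the report. A jar whose manifest carries no Implementation-Version is reported as "unknown".

```java
import java.security.CodeSource;

// Hypothetical helper (not part of Tika or PDFBox): reports where a class
// was loaded from and which version its jar manifest declares.
class LoadedJarCheck {

    static String describe(String className) {
        try {
            // Look the class up without running its static initializers.
            Class<?> cls = Class.forName(className, false,
                    LoadedJarCheck.class.getClassLoader());

            // Implementation-Version comes from the jar manifest and may be absent.
            Package pkg = cls.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();

            // The code source is null for classes loaded by the bootstrap loader.
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            String location = (src == null || src.getLocation() == null)
                    ? "unknown location"
                    : src.getLocation().toString();

            return className + ": version "
                    + (version == null ? "unknown" : version)
                    + ", loaded from " + location;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + ": could not be loaded";
        }
    }

    public static void main(String[] args) {
        // Class names as they appear in the reported stack trace.
        System.out.println(describe("org.apache.fontbox.ttf.TTFParser"));
        System.out.println(describe("org.apache.tika.parser.AutoDetectParser"));
    }
}
```

Running this with the same classpath as the failing application should show whether the upgraded jar is the one in use before the file is parsed again.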

> java.lang.OutOfMemoryError: Java heap space
> -------------------------------------------
>
>                 Key: TIKA-1187
>                 URL: https://issues.apache.org/jira/browse/TIKA-1187
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.3
>         Environment: Ubuntu 
>            Reporter: GURFAN
>            Priority: Critical
>   Original Estimate: 612h
>  Remaining Estimate: 612h
>
> Hi,
> While parsing content, we get the exception below in the parse method.
> The file being parsed is 1 MB.
> Tika JAR: tika-core-1.3.jar
> Parser parser = new AutoDetectParser();
> parser.parse(is, handler, metaData, new ParseContext());
> java.lang.OutOfMemoryError: Java heap space
> 	at java.util.Arrays.copyOf(Arrays.java:2734)
> 	at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
> 	at java.util.ArrayList.add(ArrayList.java:351)
> 	at org.apache.fontbox.ttf.GlyfCompositeDescript.<init>(GlyfCompositeDescript.java:60)
> 	at org.apache.fontbox.ttf.GlyphData.initData(GlyphData.java:63)
> 	at org.apache.fontbox.ttf.GlyphTable.initData(GlyphTable.java:71)
> 	at org.apache.fontbox.ttf.AbstractTTFParser.parseTables(AbstractTTFParser.java:163)
> 	at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:61)
> 	at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:90)
> 	at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:26)
> 	at org.apache.fontbox.ttf.AbstractTTFParser.parseTTF(AbstractTTFParser.java:66)
> 	at org.apache.fontbox.ttf.TTFParser.parseTTF(TTFParser.java:26)
> 	at org.apache.tika.parser.font.TrueTypeParser.parse(TrueTypeParser.java:65)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 	at com.impetus.vajra.parser.tika.TikaParser.processContent(TikaParser.java:96)
> 	at com.impetus.vajra.storm.helper.TextAnalyserBoltHelper.execute(TextAnalyserBoltHelper.java:283)
> 	at com.impetus.vajra.storm.TextAnalyserBolt.execute(TextAnalyserBolt.java:182)
> 	at backtype.storm.daemon.executor$fn__4050$tuple_action_fn__4052.invoke(executor.clj:566)
> 	at backtype.storm.daemon.executor$mk_task_receiver$fn__3976.invoke(executor.clj:345)
> 	at backtype.storm.disruptor$clojure_handler$reify__1606.onEvent(disruptor.clj:43)
> 	at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:84)
> 	at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:58)
> 	at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62)
> 	at backtype.storm.daemon.executor$fn__4050$fn__4059$fn__4106.invoke(executor.clj:658)
> 	at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
> 	at clojure.lang.AFn.run(AFn.java:24)
> 	at java.lang.Thread.run(Thread.java:662)



--
This message was sent by Atlassian JIRA
(v6.1#6144)