Posted to dev@tika.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/08/24 14:45:01 UTC

[jira] [Commented] (TIKA-2447) PSDParser creates unnecessary large byte array and discards it

    [ https://issues.apache.org/jira/browse/TIKA-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140107#comment-16140107 ] 

ASF GitHub Bot commented on TIKA-2447:
--------------------------------------

bjrke opened a new pull request #200: TIKA-2447 reduce memory consumption of PSDParser
URL: https://github.com/apache/tika/pull/200
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> PSDParser creates unnecessary large byte array and discards it
> --------------------------------------------------------------
>
>                 Key: TIKA-2447
>                 URL: https://issues.apache.org/jira/browse/TIKA-2447
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.15, 1.16
>         Environment: openjdk version "1.8.0_131"
> limited memory (currently running with -Xmx256M)
>            Reporter: Jan Burkhardt
>            Priority: Critical
>
> PSD files (Adobe Photoshop) are split into resource blocks (ResourceBlock) that contain different kinds of data, but only caption blocks are currently extracted into the description.
> When parsing a file with very large blocks, e.g. for image data, a byte array the size of the whole block is allocated:
> https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L191
> even though it is discarded right after that:
> https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java#L116 and following lines
> This caused huge memory consumption and eventually killed the application with an OutOfMemoryError:
> {noformat}
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:191) ~[tika-parsers-1.15.jar!/:1.15]
>         at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:141) ~[tika-parsers-1.15.jar!/:1.15]
>         at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:116) ~[tika-parsers-1.15.jar!/:1.15]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280) ~[tika-core-1.15.jar!/:1.15]
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135) ~[tika-core-1.15.jar!/:1.15]
> {noformat}
> I cannot provide a file to reproduce this, because the file that triggered the issue belongs to one of our customers.
> I will prepare a pull request to fix this.
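
To illustrate the idea described in the issue, here is a minimal sketch of reading a block's payload only when it is needed and skipping it otherwise. It is not the actual PSDParser code and not necessarily what the pull request does: the class ResourceBlockSketch, the methods readOrSkip and skipFully, and the use of 0x03F0 as the caption block id are assumptions made for this example.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch; names and structure are assumptions, not Tika's code. */
final class ResourceBlockSketch {

    /** Assumed id of the only block type whose payload is kept (caption). */
    static final int ID_CAPTION = 0x03F0;

    /**
     * Returns the payload for caption blocks. For every other block the
     * payload is skipped without being buffered, and null is returned.
     */
    static byte[] readOrSkip(InputStream in, int id, int length) throws IOException {
        if (id == ID_CAPTION) {
            byte[] data = new byte[length];
            // readFully throws EOFException if the stream ends early
            new DataInputStream(in).readFully(data);
            return data;
        }
        skipFully(in, length);
        return null;
    }

    /** InputStream.skip may skip fewer bytes than requested, so loop until done. */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped > 0) {
                n -= skipped;
            } else if (in.read() >= 0) {
                n--; // skip made no progress; consume a single byte instead
            } else {
                throw new EOFException("stream ended with " + n + " bytes left to skip");
            }
        }
    }
}
```

With this shape, a large image-data block costs no heap beyond what the underlying stream itself buffers, while small caption blocks are still read into memory as before.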



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)