Posted to dev@tika.apache.org by "Kenneth William Krugler (Jira)" <ji...@apache.org> on 2020/11/30 14:38:00 UTC

[jira] [Commented] (TIKA-3239) TikaException: data length must be < 1000000

    [ https://issues.apache.org/jira/browse/TIKA-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240795#comment-17240795 ] 

Kenneth William Krugler commented on TIKA-3239:
-----------------------------------------------

Hi [~harirehm] - this is the expected behavior. The parser has no way to report back that data was dropped because the limit was hit, so it throws an exception instead.
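
For what it's worth, here is a minimal, self-contained sketch of that pattern and of how a caller can cope with it. It is not Tika's code: LimitSketch, LimitExceededException, readBlock and parseOrFallback are hypothetical names, and the message format only mirrors the one in the stack trace. With Tika itself, the equivalent would be catching TikaException around Tika.parseToString.

```java
import java.nio.charset.StandardCharsets;

class LimitSketch {
    // Hypothetical stand-in for a parser exception; this is NOT Tika's TikaException.
    static class LimitExceededException extends Exception {
        LimitExceededException(String message) {
            super(message);
        }
    }

    // Assumed limit, taken from the number in the error message.
    static final int MAX_DATA_LENGTH = 1_000_000;

    // Decodes a block, or throws when its length reaches the limit.
    // Throwing makes the failure visible; silently truncating would not.
    static String readBlock(byte[] data, int limit) throws LimitExceededException {
        if (data.length >= limit) {
            throw new LimitExceededException(
                    "data length must be < " + limit + ": " + data.length);
        }
        return new String(data, StandardCharsets.US_ASCII);
    }

    // Caller-side handling: catch the exception and decide what to do,
    // here by logging the reason and returning a fallback value.
    static String parseOrFallback(byte[] data, int limit, String fallback) {
        try {
            return readBlock(data, limit);
        } catch (LimitExceededException e) {
            System.err.println("Skipping block: " + e.getMessage());
            return fallback;
        }
    }

    public static void main(String[] args) {
        byte[] small = "8BIM".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseOrFallback(small, MAX_DATA_LENGTH, ""));

        byte[] big = new byte[MAX_DATA_LENGTH];
        System.out.println(parseOrFallback(big, MAX_DATA_LENGTH, "(skipped: block too large)"));
    }
}
```

The point of the sketch is that the decision about what to do with an oversized block stays with the caller, who knows whether partial output is acceptable.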

As a side note, please ask questions like this on the mailing list: it's a lighter-weight way of handling them, and others can benefit from the exchange. See https://tika.apache.org/mail-lists.html#:~:text=The%20user%20mailing%20list%20at,in%20contributing%20to%20Tika%20development.

> TikaException: data length must be < 1000000
> --------------------------------------------
>
>                 Key: TIKA-3239
>                 URL: https://issues.apache.org/jira/browse/TIKA-3239
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.24.1
>            Reporter: HARI RAM
>            Priority: Major
>
> A TikaException is thrown when trying to parse PSD files using the latest Tika version (1.24.1).
>  
>  
> {code:java}
> org.apache.tika.exception.TikaException: data length must be < 1000000: 7108276
> 	at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:233)
> 	at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:167)
> 	at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:135)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> 	at org.apache.tika.Tika.parseToString(Tika.java:527)
> 	at org.apache.tika.Tika.parseToString(Tika.java:602)
> {code}
>  
> Is this limit configurable? Shouldn't the parser parse up to the limit and return the data parsed so far?
>  


