Posted to dev@tika.apache.org by "Kenneth William Krugler (Jira)" <ji...@apache.org> on 2020/11/30 14:38:00 UTC
[jira] [Commented] (TIKA-3239) TikaException: data length must be < 1000000
[ https://issues.apache.org/jira/browse/TIKA-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240795#comment-17240795 ]
Kenneth William Krugler commented on TIKA-3239:
-----------------------------------------------
Hi [~harirehm] - this is the expected behavior. The parser has no way to tell the caller that data was dropped because the limit was hit, so it throws an exception instead.
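To illustrate the fail-fast reasoning, here is a minimal, self-contained sketch. It is not the PSDParser source: the class LengthLimitSketch, the exception DataLimitException, and the helpers readBlock, block, and tryRead are hypothetical names made up for this example, and the block layout (a 4-byte length followed by the payload) is an assumption. Only the limit value and the message format are taken from the reported stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

class LengthLimitSketch {
    // Limit value taken from the reported error message.
    static final int MAX_DATA_LENGTH = 1_000_000;

    /** Hypothetical stand-in for the exception type a parser would throw. */
    static class DataLimitException extends Exception {
        DataLimitException(String message) {
            super(message);
        }
    }

    /** Reads one length-prefixed block, rejecting it before allocating if it is too large. */
    static byte[] readBlock(DataInputStream in) throws IOException, DataLimitException {
        int declared = in.readInt();
        if (declared < 0) {
            throw new IOException("negative data length: " + declared);
        }
        if (declared >= MAX_DATA_LENGTH) {
            // Returning a truncated array here would look like a complete result
            // to the caller, so the sketch fails loudly instead.
            throw new DataLimitException(
                    "data length must be < " + MAX_DATA_LENGTH + ": " + declared);
        }
        byte[] data = new byte[declared];
        in.readFully(data);
        return data;
    }

    /** Builds a test block: big-endian length header followed by the payload. */
    static byte[] block(int declaredLength, byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(declaredLength)
                .put(payload)
                .array();
    }

    /** Caller-side handling: the exception is the only signal that data was skipped. */
    static String tryRead(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return "read " + readBlock(in).length + " bytes";
        } catch (DataLimitException e) {
            return "rejected: " + e.getMessage();
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRead(block(3, new byte[] {1, 2, 3})));
        System.out.println(tryRead(block(7108276, new byte[0])));
    }
}
```

In this sketch the caller decides what to do with an oversized block (skip the file, log it, or retry with another tool), which a silently truncated return value would not allow.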
As a side note, please ask questions like this on the mailing list; that's a lighter-weight way of handling them, and others can benefit from the exchange. See https://tika.apache.org/mail-lists.html#:~:text=The%20user%20mailing%20list%20at,in%20contributing%20to%20Tika%20development.
> TikaException: data length must be < 1000000
> --------------------------------------------
>
> Key: TIKA-3239
> URL: https://issues.apache.org/jira/browse/TIKA-3239
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.24.1
> Reporter: HARI RAM
> Priority: Major
>
> A TikaException is thrown when trying to parse PSD files using the latest Tika version (1.24.1).
>
>
> {code:java}
> org.apache.tika.exception.TikaException: data length must be < 1000000: 7108276
> at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:233)
> at org.apache.tika.parser.image.PSDParser$ResourceBlock.<init>(PSDParser.java:167)
> at org.apache.tika.parser.image.PSDParser.parse(PSDParser.java:135)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at org.apache.tika.Tika.parseToString(Tika.java:527)
> at org.apache.tika.Tika.parseToString(Tika.java:602)
> {code}
>
> Is this limit configurable? Shouldn't the parser parse up to the limit and return the data parsed so far?
>