Posted to dev@tika.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2022/04/12 16:24:00 UTC

[jira] [Comment Edited] (TIKA-3718) Special PDF document causes Tika parser to hang

    [ https://issues.apache.org/jira/browse/TIKA-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521236#comment-17521236 ] 

Tilman Hausherr edited comment on TIKA-3718 at 4/12/22 4:23 PM:
----------------------------------------------------------------

It's on page 10. Probably some loop or recursion: the XObject resource looks suspicious and goes deep. PDFBox has protection against this, and Java should throw a StackOverflowError, but nothing happens.


was (Author: tilman):
It's on page 10. Probably some loop or recursion: the XObject resource looks suspicious and goes deep. But PDFBox has protection against this, and Java should throw a StackOverflowError, yet nothing happens.
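The comment notes that PDFBox has protection against deeply nested or looping XObject resources. The Java below is a minimal sketch of how such a guard can work in general. It is a hypothetical model, not PDFBox's actual code or API. It walks a graph of form XObjects and stops at cycles by tracking the forms on the current path. It also caps nesting depth and total visits. The FormNode class and the MAX_DEPTH and MAX_VISITS limits are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class XObjectGuard {

    /** Hypothetical stand-in for a form XObject whose resources reference other forms. */
    static final class FormNode {
        final String name;
        final List<FormNode> children = new ArrayList<>();

        FormNode(String name) {
            this.name = name;
        }
    }

    // Illustrative limits; PDFBox's real limits and mechanism may differ.
    static final int MAX_DEPTH = 50;
    static final int MAX_VISITS = 10_000;

    /** Returns how many forms were processed before the guards stopped the walk. */
    static int process(FormNode root) {
        Set<FormNode> onPath =
                Collections.newSetFromMap(new IdentityHashMap<FormNode, Boolean>());
        int[] budget = {MAX_VISITS};
        return visit(root, onPath, 0, budget);
    }

    private static int visit(FormNode node, Set<FormNode> onPath, int depth, int[] budget) {
        // Skip the form if nesting is too deep, the work budget is spent,
        // or the form already appears on the current path (a cycle).
        if (depth > MAX_DEPTH || budget[0] <= 0 || !onPath.add(node)) {
            return 0;
        }
        budget[0]--;
        int count = 1;
        for (FormNode child : node.children) {
            count += visit(child, onPath, depth + 1, budget);
        }
        onPath.remove(node); // leaving this form: it may legitimately appear on another path
        return count;
    }

    public static void main(String[] args) {
        FormNode a = new FormNode("A");
        FormNode b = new FormNode("B");
        a.children.add(b);
        b.children.add(a); // cycle: A -> B -> A
        System.out.println(process(a)); // prints 2
    }
}
```

The visit budget matters because a path check and a depth cap alone do not bound the total work. A form can reference the same child several times at each level. That walk needs exponentially many visits while staying shallow, so it would neither finish in reasonable time nor overflow the stack.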

> Special PDF document causes Tika parser to hang
> -----------------------------------------------
>
>                 Key: TIKA-3718
>                 URL: https://issues.apache.org/jira/browse/TIKA-3718
>             Project: Tika
>          Issue Type: Bug
>          Components: app
>    Affects Versions: 1.28.1, 2.3.0
>         Environment: The problem can be reproduced on Windows with Java 8. However, it does not appear to be environment-specific.
>            Reporter: David Avant
>            Priority: Major
>         Attachments: map.pdf
>
>
> Attempting to parse the attached "map.pdf" causes the Tika parser to hang due to an infinite loop involving "PDFStreamParser" logic.
> This problem occurs in both tika-app 1.28.1 and 2.3.0.
> It is also worth noting that Acrobat itself becomes unresponsive when trying to open this document.
> To reproduce the problem, just run:
> java -jar tika-app-1.28.1.jar map.pdf
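Until the underlying loop is fixed, an application that embeds Tika may want to limit how long a single parse can run. The following is a generic sketch that assumes nothing about Tika's API. The hypothetical runWithTimeout helper runs a task on a worker thread and gives up after a deadline. The spinning Callable in main stands in for a parse that never returns, such as the one on map.pdf.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ParseWatchdog {

    /** Runs task; returns empty on timeout (or if the task itself returned null). */
    static <T> Optional<T> runWithTimeout(Callable<T> task, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Interrupts the worker; a CPU-bound loop that never checks the
            // interrupt flag keeps running, which is why the thread is a daemon.
            future.cancel(true);
            return Optional.empty();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a parse that never terminates.
        Callable<String> stuck = () -> {
            while (true) {
                // spin forever
            }
        };
        System.out.println(runWithTimeout(stuck, 200).isPresent()); // prints false
        System.out.println(runWithTimeout(() -> "text", 200).orElse("timeout")); // prints text
    }
}
```

Cancelling only sets the worker's interrupt flag. A loop that never checks that flag keeps consuming CPU after the timeout. For hard isolation, running the parse in a separate process that can be killed is the more robust choice.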


