
Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2021/04/10 11:40:00 UTC

[jira] [Comment Edited] (PDFBOX-5161) Content stream parse error that doesn't happen when content stream is parsed alone

    [ https://issues.apache.org/jira/browse/PDFBOX-5161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318477#comment-17318477 ] 

Tilman Hausherr edited comment on PDFBOX-5161 at 4/10/21, 11:39 AM:
--------------------------------------------------------------------

There is one difference: when parsing works, the input is a RandomAccessReadBuffer; when it fails, it is a SequenceRandomAccessRead.

In SequenceRandomAccessRead.read() there is this code
{code}
        int maxAvailBytes = Math.min(available(), length);
        if (maxAvailBytes == 0)
        {
            return -1;
        }
{code}
That branch is hit long before the real EOF, so read() returns -1 while data is still left.
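A minimal, self-contained sketch (not PDFBox code) of how an early "available() == 0 means EOF" check can end reading too soon. It assumes, purely for illustration, a reader that concatenates several byte segments and whose available() only counts the bytes left in the current segment. ToySequenceReader, skipExhaustedParts() and drain() are hypothetical names. It is not a claim about how SequenceRandomAccessRead computes available().

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical toy model (not PDFBox code) of a reader over several byte segments. */
public class SequenceReadSketch {

    static class ToySequenceReader {
        private final List<byte[]> parts;
        private int partIndex = 0;   // index of the current segment
        private int posInPart = 0;   // read position inside the current segment
        private final boolean fixed; // true = move past exhausted segments before checking

        ToySequenceReader(List<byte[]> parts, boolean fixed) {
            this.parts = parts;
            this.fixed = fixed;
        }

        /** Assumed flaw: counts only the bytes left in the CURRENT segment. */
        int available() {
            if (partIndex >= parts.size()) {
                return 0;
            }
            return parts.get(partIndex).length - posInPart;
        }

        /** Advance to the next segment that still has data, if any. */
        private void skipExhaustedParts() {
            while (partIndex < parts.size() && posInPart >= parts.get(partIndex).length) {
                partIndex++;
                posInPart = 0;
            }
        }

        int read(byte[] b, int off, int length) {
            if (fixed) {
                skipExhaustedParts();
            }
            // Same shape as the check quoted above
            int maxAvailBytes = Math.min(available(), length);
            if (maxAvailBytes == 0) {
                return -1; // reported as EOF, even if later segments still hold data
            }
            System.arraycopy(parts.get(partIndex), posInPart, b, off, maxAvailBytes);
            posInPart += maxAvailBytes;
            return maxAvailBytes;
        }
    }

    /** Reads until read() returns -1 and returns the number of bytes obtained. */
    static int drain(ToySequenceReader reader) {
        byte[] buf = new byte[4];
        int total = 0;
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        List<byte[]> parts = Arrays.asList(new byte[6], new byte[5]); // 11 bytes in total
        System.out.println("buggy: " + drain(new ToySequenceReader(parts, false))); // prints "buggy: 6"
        System.out.println("fixed: " + drain(new ToySequenceReader(parts, true)));  // prints "fixed: 11"
    }
}
```

In the toy, the flawed variant stops at the first segment boundary and loses 5 of the 11 bytes. The variant that skips exhausted segments first only returns -1 when every segment has been consumed.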



> Content stream parse error that doesn't happen when content stream is parsed alone
> ----------------------------------------------------------------------------------
>
>                 Key: PDFBOX-5161
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5161
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 3.0.0 PDFBox
>            Reporter: Tilman Hausherr
>            Priority: Major
>              Labels: regression
>             Fix For: 3.0.0 PDFBox
>
>         Attachments: 179212.pdf, cs.txt
>
>
> {noformat}
> java.io.IOException: Unknown dir object c=')' cInt=41 peek=')' peekInt=41 at offset 12287
>     org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:865)
>     org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:634)
>     org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:130)
> {noformat}
> This code doesn't reproduce the problem:
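> {code}
> import java.nio.file.Files;
> import java.nio.file.Paths;
> import org.apache.pdfbox.pdfparser.PDFStreamParser;
>
>         // parse the extracted content stream (cs.txt) on its own
>         byte[] bytes = Files.readAllBytes(Paths.get("cs.txt"));
>         PDFStreamParser parser = new PDFStreamParser(bytes);
>         parser.parse();
> {code}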



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
