Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2009/12/05 17:23:20 UTC

[jira] Closed: (PDFBOX-344) PushbackInputStream returns partial strings

     [ https://issues.apache.org/jira/browse/PDFBOX-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler closed PDFBOX-344.
-------------------------------------


> PushbackInputStream returns partial strings
> -------------------------------------------
>
>                 Key: PDFBOX-344
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-344
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 0.7.3
>         Environment: Mac OS X 10.5
>            Reporter: John F. Walsh
>             Fix For: 0.8.0-incubator
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> When org.pdfbox.pdfparser.BaseParser.parseDirObject() checks whether it is reading the string "false" from pdfSource, that check can fail if there's a pause in the underlying read of the PDF file. org.pdfbox.io.PushBackInputStream extends java.io.PushbackInputStream, and java.io.PushbackInputStream.read(byte[] b, int off, int len) may read fewer bytes than requested (for example, the 4 bytes "fals" instead of the 5 bytes "false") if there's a pause in the read of the PDF file being processed. The java.io.InputStream contract allows this: a bulk read makes an attempt to read len bytes, but a smaller number may be read. (The PDF file that caused this problem can't be shared because it contains customer data.)
> The solution is to repeat the read until either enough bytes have been read or EOF has been reached, in which case the bytes read so far should be returned. Adding the method override below to org.pdfbox.io.PushBackInputStream fixes the problem.
> I rated this bug Major because, though it's a showstopper when it happens, I suspect it's quite rare. But in a production system, it matters.
> -------------------------------------
>     /**
>      * Reads up to <code>len</code> bytes of data from this input stream into
>      * an array of bytes.  This method first reads any pushed-back bytes; after
>      * that, if fewer than <code>len</code> bytes have been read then it
>      * reads from the underlying input stream.  This method blocks until the
>      * requested number of bytes have been read, or until the end of the stream
>      * has been reached, in which case it returns the number of bytes actually
>      * read, or -1 if zero bytes were read.
>      * 
>      * This overridden method assures <tt>org.pdfbox.pdfparser.BaseParser</tt>
>      * that it has the entire string it's checking for (typically
>      * "true" or "false") instead of a partial string caused by a
>      * pause in the underlying stream read.
>      *
>      * @param      b     the buffer into which the data is read.
>      * @param      off   the start offset of the data.
>      * @param      len   the maximum number of bytes read.
>      * @return     the total number of bytes read into the buffer, or
>      *             <code>-1</code> if there is no more data because the end of
>      *             the stream has been reached.
>      * @exception  IOException  if an I/O error occurs.
>      * @see        java.io.PushbackInputStream#read(byte[], int, int)
>      */
>     public int read(byte[] b, int off, int len) throws IOException {
>         int bytesRead = super.read(b, off, len);
>         /* if we received the expected number of bytes, or an EOF, return what we got: */
>         if ((bytesRead == len) || (bytesRead == -1)){
>             return bytesRead;
>         }
>         
>         int byteRead = 0;
>         while (bytesRead < len){
>             /* if we're missing some bytes, read them one at a time
>                 until we have the required number or an EOF is read. */
>             byteRead = super.read();
>             if (byteRead == -1){
>                 /* If it's an EOF, return what we got and report the EOF
>                     on the next read: */
>                 return bytesRead;
>             }
>             /* Store the byte after those already read (relative to off) and loop. */
>             b[off + bytesRead] = (byte)byteRead;
>             bytesRead++;
>         }
>         /* Report the full read complete: */
>         return bytesRead;
>     }
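The short read and the effect of the proposed loop can be reproduced without the customer PDF. The sketch below assumes a hypothetical TricklingInputStream (not part of PDFBox or the JDK) that hands out at most 4 bytes per bulk read, standing in for a source that pauses mid-file. FullReadPushbackInputStream is likewise a hypothetical name for a java.io.PushbackInputStream subclass carrying the override from this report, written so that it stores bytes at b[off + bytesRead] and therefore also works when off is non-zero.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class ShortReadDemo {

    /** Hypothetical source that returns at most 4 bytes per bulk read, simulating a pause. */
    static class TricklingInputStream extends FilterInputStream {
        TricklingInputStream(InputStream in) { super(in); }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Legal per the InputStream contract: fewer bytes than requested may be returned.
            return super.read(b, off, Math.min(len, 4));
        }
    }

    /** Hypothetical subclass carrying the override proposed in this issue. */
    static class FullReadPushbackInputStream extends PushbackInputStream {
        FullReadPushbackInputStream(InputStream in) { super(in); }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            int bytesRead = super.read(b, off, len);
            // Full read or immediate EOF: nothing more to do.
            if (bytesRead == len || bytesRead == -1) {
                return bytesRead;
            }
            // Top up one byte at a time until len bytes are read or EOF is hit.
            while (bytesRead < len) {
                int byteRead = super.read();
                if (byteRead == -1) {
                    return bytesRead; // EOF is reported on the next call
                }
                b[off + bytesRead] = (byte) byteRead;
                bytesRead++;
            }
            return bytesRead;
        }
    }

    /** Performs a single 5-byte bulk read and decodes whatever arrived. */
    static String readFive(InputStream in) throws IOException {
        byte[] buf = new byte[5];
        int n = in.read(buf, 0, buf.length);
        return n <= 0 ? "" : new String(buf, 0, n, StandardCharsets.US_ASCII);
    }

    /** Wraps a string in the trickling source. */
    static InputStream source(String s) {
        return new TricklingInputStream(
                new ByteArrayInputStream(s.getBytes(StandardCharsets.US_ASCII)));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readFive(new PushbackInputStream(source("false"))));         // prints fals
        System.out.println(readFive(new FullReadPushbackInputStream(source("false")))); // prints false
    }
}
```

For callers that need exactly n bytes and treat a short stream as an error, the JDK's java.io.DataInputStream.readFully(byte[], int, int) uses the same read-until-full idea but throws EOFException instead of returning a partial count.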

-- 
This message is automatically generated by JIRA.