Posted to dev@pdfbox.apache.org by "Maruan Sahyoun (JIRA)" <ji...@apache.org> on 2013/05/03 17:38:16 UTC

[jira] [Commented] (PDFBOX-1582) Issues with available() and skip() on RandomAccessFileInputStream

    [ https://issues.apache.org/jira/browse/PDFBOX-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648501#comment-13648501 ] 

Maruan Sahyoun commented on PDFBOX-1582:
----------------------------------------

Hi,

Thanks for providing these tests; it was a very useful exercise. Running your tests and some others, I found that RandomAccessFileInputStream behaves inconsistently if, during instantiation, you pass an offset and a length that 'read' beyond the actual length of the file. For example, if you pass in = new RandomAccessFileInputStream(raFile, 98, 2) instead of in = new RandomAccessFileInputStream(raFile, 98, 10), available() always returns 0 at EOF and so does skip() (for skip() that differs from java.io.FileInputStream, by the way).
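
For the java.io.FileInputStream comparison, a small probe along the following lines can be used to see what the JDK stream does when skip() is asked to go past the end of a file. This is only a sketch: the class name FisEofProbe, the helper probe() and the 100-byte temporary file are made up for illustration and are not part of the attached test. The value returned by the second skip() is printed rather than assumed, since the JDK documentation allows FileInputStream.skip() to count bytes beyond EOF.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical probe class, not part of PDFBox or of the attached unit test.
final class FisEofProbe {

    /**
     * Writes a 100-byte temp file, skips 98 bytes, then asks to skip 10 more.
     * Returns {first skip, second skip, available(), read()} as observed.
     */
    static long[] probe() {
        try {
            File f = File.createTempFile("fis-eof", ".bin");
            f.deleteOnExit();
            Files.write(f.toPath(), new byte[100]);
            try (FileInputStream in = new FileInputStream(f)) {
                long first = in.skip(98);      // stays inside the file
                long second = in.skip(10);     // asks for more than the 2 bytes left
                int available = in.available();
                int read = in.read();          // -1 once the position is past EOF
                return new long[] {first, second, available, read};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = probe();
        System.out.println("skip(98)    -> " + r[0]);
        System.out.println("skip(10)    -> " + r[1]);
        System.out.println("available() -> " + r[2]);
        System.out.println("read()      -> " + r[3]);
    }
}
```

Running the same sequence against RandomAccessFileInputStream(raFile, 98, 10) should make the difference in skip() visible side by side.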

So from that I think you would expect available() to return 0 after EOF, and skip() too, but the handling of instantiation in cases where offset and length go beyond the actual data needs some review. In addition, skip() could perhaps be changed to be in line with java.io.FileInputStream.
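
One possible way to handle an offset and length that go beyond the actual data is to clamp the window to the file length at construction time. The sketch below illustrates that idea only; it is not the PDFBox implementation, and the class name ClampedRafInputStream and its demo() helper are hypothetical. It assumes offset and length are non-negative and that offset + length does not overflow. Note that skip() here reports only the bytes actually skipped inside the window, which is one choice among several and differs from what java.io.FileInputStream is allowed to do.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical sketch, not the PDFBox RandomAccessFileInputStream.
final class ClampedRafInputStream extends InputStream {
    private final RandomAccessFile file;
    private long position;   // next byte to read, as an absolute file position
    private final long end;  // exclusive end of the window, never past the file length

    ClampedRafInputStream(RandomAccessFile file, long offset, long length) throws IOException {
        this.file = file;
        long fileLength = file.length();
        // Clamp both ends so the window never extends beyond the actual data.
        this.position = Math.min(offset, fileLength);
        this.end = Math.min(offset + length, fileLength);
    }

    @Override
    public int available() {
        return (int) Math.min(end - position, Integer.MAX_VALUE);
    }

    @Override
    public int read() throws IOException {
        if (position >= end) {
            return -1; // does not touch the counters once the window is exhausted
        }
        file.seek(position);
        int b = file.read();
        if (b >= 0) {
            position++;
        }
        return b;
    }

    @Override
    public long skip(long n) {
        if (n <= 0) {
            return 0;
        }
        long skipped = Math.min(n, end - position);
        position += skipped;
        return skipped; // actual number of bytes skipped inside the window
    }

    /** Returns {available before, third read, available after, skip(5)} for a 100-byte file. */
    static long[] demo() {
        try {
            File f = File.createTempFile("clamped", ".bin");
            f.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                raf.write(new byte[100]);
                ClampedRafInputStream in = new ClampedRafInputStream(raf, 98, 10);
                long before = in.available();
                in.read();
                in.read();
                long third = in.read();
                long after = in.available();
                long skipped = in.skip(5);
                return new long[] {before, third, after, skipped};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the window clamped like this, a stream created with (raFile, 98, 10) on a 100-byte file behaves the same as one created with (raFile, 98, 2): available() starts at 2, drops to 0 after two reads and stays there, and skip() returns 0 at EOF.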

Having said all that, this may or may not be the root cause of the issues you are getting. Could you open another case explaining what you are trying to achieve, with sample code and sample PDF(s) to review?

BR
Maruan
 
                
> Issues with available() and skip() on RandomAccessFileInputStream
> -----------------------------------------------------------------
>
>                 Key: PDFBOX-1582
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1582
>             Project: PDFBox
>          Issue Type: Test
>          Components: Parsing
>    Affects Versions: 1.8.1
>            Reporter: Fredrik Kjellberg
>            Priority: Minor
>         Attachments: TestRandomAccessFileInputStream_diff.txt
>
>
> I'm trying to track down a strange bug when parsing PDF files on the IBM JDK that sometimes gives me stack traces from RandomAccessFile classes. I started by writing unit tests for the PDFBox classes to verify their behavior and found a few issues. Can someone more familiar with the PDFBox code base please check the unit test I wrote and give advice on how it is supposed to work? I've added a TODO for each line where I'm in doubt about what should be returned.
> This unit test is for RandomAccessFileInputStream, where I've found a few issues. The first is what available() is supposed to return if the input stream tries to go beyond the EOF of the underlying file. When reading single bytes it counts down while still returning -1, and when reading a buffer it returns what it thinks is left. The JDK documentation states that available() may not return the absolute truth, so perhaps returning what it thinks is left is okay, but it shouldn't count down on single reads beyond EOF. Maybe it should be set to zero once a read beyond the EOF is detected?
> Another issue is with skip(), where the JDK documentation states that it should return the actual number of bytes skipped. When skipping beyond the EOF of the file, it does not return the actual number of skipped bytes. Also, the underlying file is not updated with the new position. Is this correct behavior?
