Posted to dev@pdfbox.apache.org by "Adam Nichols (JIRA)" <ji...@apache.org> on 2011/01/19 23:52:48 UTC

[jira] Created: (PDFBOX-944) number of pages returns the incorrect number for some PDFs

number of pages returns the incorrect number for some PDFs
----------------------------------------------------------

Key: PDFBOX-944
URL: https://issues.apache.org/jira/browse/PDFBOX-944
Project: PDFBox
Issue Type: Bug
Reporter: Adam Nichols

This is a regression that appeared between 1.3.1 and 1.4.0: the former returns the correct page count while the latter does not. Unfortunately, the PDF that demonstrates the problem is confidential, so I cannot attach it here. Instead, I will describe as best I can the things that may be causing it.

The problem does not occur after running the file through the "uncompress" feature of pdftk, nor after using PdfDecompressor from PDFBox. The original file that was given to me is Linearized. In Adobe Acrobat Standard -> File -> Properties, the Application is "Adobe Photoshop CS4 Windows", the PDF Producer is "Adobe Photoshop for Windows -- Image Conversion Plug-in", and the PDF Version is 1.7 (Acrobat 8.x). Fast Web View is set to "No". I suspect the problem is related either to the file being Linearized or to its use of ObjStm. I don't have enough time to trace through this, so I'm going to either revert to PDFBox 1.3.1 or pre-process all the ObjStm objects, save the uncompressed file, and then process that. The latter is less efficient, but I think it will handle more cases. I just wanted to open an issue here on JIRA so we can eventually get a proper solution to this problem.
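
For anyone trying to find a non-confidential file that reproduces this, the two suspected features can be checked without PDFBox. The following is a minimal sketch, not part of PDFBox: the class name PdfFeatureProbe and its methods are hypothetical, and it only scans the raw bytes for the "/Linearized" and "/ObjStm" name tokens. That is a heuristic, so a match is only a hint that the file has a linearization dictionary or object streams, not proof.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper (not a PDFBox class): byte-level hints about a PDF file. */
public class PdfFeatureProbe {

    /** Returns true if the ASCII marker occurs anywhere in the raw bytes. */
    public static boolean containsMarker(byte[] data, String marker) {
        byte[] needle = marker.getBytes(StandardCharsets.US_ASCII);
        if (needle.length == 0) {
            return true;
        }
        outer:
        for (int i = 0; i + needle.length <= data.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) {
                    continue outer; // mismatch, try the next start offset
                }
            }
            return true;
        }
        return false;
    }

    /** Hint only: the name /Linearized appears somewhere in the file. */
    public static boolean looksLinearized(byte[] pdf) {
        return containsMarker(pdf, "/Linearized");
    }

    /** Hint only: the name /ObjStm appears somewhere in the file. */
    public static boolean usesObjectStreams(byte[] pdf) {
        return containsMarker(pdf, "/ObjStm");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfFeatureProbe <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("linearized hint:     " + looksLinearized(pdf));
        System.out.println("object streams hint: " + usesObjectStreams(pdf));
    }
}
```

Running it on a file before and after pdftk's "uncompress" step should show whether the object streams hint disappears, which would help separate the two suspected causes.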

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] [Commented] (PDFBOX-944) number of pages returns the incorrect number for some PDFs

Posted by "Sheila Morrissey (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/PDFBOX-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454145#comment-13454145 ] 

Sheila Morrissey commented on PDFBOX-944:
-----------------------------------------

I see a similar problem in 1.6 and 1.7, though the NonSequentialParser class seems to handle the same files correctly.
                


[jira] Updated: (PDFBOX-944) number of pages returns the incorrect number for some PDFs

Posted by "Adam Nichols (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/PDFBOX-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Nichols updated PDFBOX-944:
--------------------------------

    Affects Version/s: 1.4.0

