Posted to dev@pdfbox.apache.org by "Maruan Sahyoun (JIRA)" <ji...@apache.org> on 2013/05/06 22:32:15 UTC

[jira] [Closed] (PDFBOX-1510) PDF gets corrupted when extracting it from the embedded files

     [ https://issues.apache.org/jira/browse/PDFBOX-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maruan Sahyoun closed PDFBOX-1510.
----------------------------------

    Resolution: Won't Fix

PDDocument.load won't be fixed, as PDDocument.loadNonSeq was introduced to parse PDFs more closely in line with the spec by following the Xref information. PDFBOX-1581 tracks support for InputStream as a parameter for PDDocument.loadNonSeq.
                
> PDF gets corrupted when extracting it from the embedded files
> -------------------------------------------------------------
>
>                 Key: PDFBOX-1510
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1510
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.7.1
>            Reporter: Andriy
>            Priority: Critical
>         Attachments: doesnt_work.pdf, PDFEmbeddedFiles.java, works2.pdf
>
>
> When a PDF is attached to another PDF, it gets corrupted when retrieved through the PDEmbeddedFile.getByteArray() method call. For some reason the returned array has less data than the original file that was attached to the PDF.
> This affects some documents and not others (see attachments for working/non-working files); source code reproducing the issue is attached as well.
> Please note: the issue does not occur when using PDDocument.loadNonSeq, only when using PDDocument.load.
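
For anyone checking whether their own documents are affected, the symptom described above (the returned array being shorter than the attached file) can be detected with a plain length and digest comparison. The following is only a minimal sketch using the Java standard library. The class EmbeddedFileCheck, the Outcome enum and the compare method are hypothetical names made up for this illustration and are not part of PDFBox. The sketch assumes the caller has already read the original attachment into one byte array and obtained the other from PDEmbeddedFile.getByteArray().

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox: classifies extracted bytes
// against the bytes of the file that was originally attached.
final class EmbeddedFileCheck {

    enum Outcome { IDENTICAL, TRUNCATED, DIFFERENT }

    static Outcome compare(byte[] original, byte[] extracted) {
        // The symptom from the report: fewer bytes than the original file.
        if (extracted.length < original.length) {
            return Outcome.TRUNCATED;
        }
        // Same or greater length: compare content via SHA-256 digests.
        return MessageDigest.isEqual(sha256(original), sha256(extracted))
                ? Outcome.IDENTICAL
                : Outcome.DIFFERENT;
    }

    private static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in data; in practice 'extracted' would come from
        // PDEmbeddedFile.getByteArray() and 'original' from the attached file.
        byte[] original = "%PDF-1.4 stand-in content".getBytes(StandardCharsets.US_ASCII);
        byte[] extracted = Arrays.copyOf(original, original.length - 5);
        System.out.println(compare(original, extracted));
    }
}
```

Running the same check once on a document opened with PDDocument.load and once with PDDocument.loadNonSeq should show whether a given file hits the problem reported here.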

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira