Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2015/02/05 13:14:35 UTC
[jira] [Closed] (PDFBOX-362) ZipException occurring upon importing a page

     [ https://issues.apache.org/jira/browse/PDFBOX-362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr closed PDFBOX-362.
----------------------------------
       Resolution: Invalid
    Fix Version/s:     (was: 2.0.0)
         Assignee: Tilman Hausherr

The file is definitely corrupt. The "empty" file actually contains quite a lot, including a corrupt stream. But don't just take our word for it: I checked it with PDF-Tools
http://www.pdf-tools.com/pdf/validate-pdfa-online.aspx
and although I didn't expect it to validate as PDF/A, I got several errors stating that the file is corrupt, including one for the flate-compressed stream:
{quote}Validating file "real-empty-page.pdf" for conformance level pdfa-1b
The separator after an 'obj' must be an EOL. (2)
The property 'pdf:CreationDate--Text' is not defined in schema 'Adobe PDF Schema'.
The schema description for namespace 'pdfx:' (http://ns.adobe.com/pdfx/1.3/) is missing.
The required XMP property 'pdfaid:part' is missing.
The required XMP property 'pdfaid:conformance' is missing.
The required XMP property 'xap:CreateDate' for the document information entry 'CreationDate' is missing.
Error in Flate stream: data error.
The document does not conform to the requested standard.
The file format (header, trailer, objects, xref, streams) is corrupted.
The document's meta data is either missing or inconsistent or corrupt.
Done.{quote}
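To illustrate why a corrupt Flate stream shows up as a ZipException rather than a PDF-specific error, here is a minimal sketch using only the JDK. It assumes that the parser decodes FlateDecode (zlib) data through java.util.zip, where a data error is reported as a ZipException by InflaterInputStream. The class name CorruptFlateDemo and the sample content stream are hypothetical, and damaging the header byte only stands in for whatever corruption the attached file has.

{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.InflaterInputStream;
import java.util.zip.ZipException;

// Hypothetical demo class; assumes FlateDecode data is decoded via java.util.zip.
public class CorruptFlateDemo
{
    /** Compresses data into a zlib stream, the format used by PDF's FlateDecode filter. */
    static byte[] deflate(byte[] input)
    {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished())
        {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decodes a zlib stream; corrupt data surfaces as a ZipException from InflaterInputStream. */
    static byte[] inflate(byte[] compressed) throws IOException
    {
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed)))
        {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while ((n = in.read(buf)) != -1)
            {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException
    {
        byte[] content = "BT /F1 12 Tf 72 720 Td (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        byte[] good = deflate(content);
        System.out.println("intact stream: " + new String(inflate(good), StandardCharsets.US_ASCII));

        byte[] bad = good.clone();
        bad[0] = 0; // damage the zlib header byte to simulate a corrupt stream
        try
        {
            inflate(bad);
            System.out.println("unexpectedly decoded");
        }
        catch (ZipException e)
        {
            System.out.println("corrupt stream: ZipException: " + e.getMessage());
        }
    }
}
{code}

The intact stream round-trips, while the damaged one fails inside the JDK's inflater; no PDF library is involved, so the exception type alone says nothing about a bug in page import.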

> ZipException occurring upon importing a page
> --------------------------------------------
>
>                 Key: PDFBOX-362
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-362
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.7, 2.0.0
>            Reporter: Jukka Zitting
>            Assignee: Tilman Hausherr
>         Attachments: real-empty-page.pdf
>
>
> [Issue from SourceForge]
> http://sourceforge.net/tracker/index.php?func=detail&aid=2019925&group_id=78314&atid=552832
> When copying a page from a source document to another document, a ZipException occurs if the source page has no content.
> The following sample code exhibits the problem with such a source PDF:
> {code}
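> import java.io.File;
> import java.io.IOException;
> import java.util.Iterator;
> import java.util.List;
>
> import org.apache.pdfbox.exceptions.COSVisitorException;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
>
> public class LoadSaveSample
> {
>     /**
>      * Copies every page of a PDF into a new document (PDFBox 1.8.x API).
>      *
>      * @param args holds the name of a file to copy
>      * @throws IOException if the file cannot be read, imported or written
>      * @throws COSVisitorException if the new document cannot be saved
>      */
>     public static void main(String[] args) throws IOException, COSVisitorException
>     {
>         String name = args[0];
>         File file = new File(name);
>         System.out.println("loading file " + file.getPath());
>         PDDocument doc = PDDocument.load(file);
>         ClassLoader loader = doc.getClass().getClassLoader();
>         System.out.println("loader: " + loader);
>         PDDocument doc2 = new PDDocument();
>         try
>         {
>             List<?> all = doc.getDocumentCatalog().getAllPages();
>             Iterator<?> it = all.iterator();
>             while (it.hasNext())
>             {
>                 PDPage page = (PDPage) it.next();
>                 // Copy the page through import, then set the attributes that
>                 // may be inherited from the page tree explicitly on the copy.
>                 PDPage imported = doc2.importPage(page);
>                 imported.setCropBox(page.findCropBox());
>                 imported.setMediaBox(page.findMediaBox());
>                 imported.setResources(page.findResources());
>                 imported.setRotation(page.findRotation());
>             }
>             String outName = file.getPath() + ".saved.pdf";
>             doc2.save(outName);
>             System.out.println("saved as " + outName);
>         }
>         finally
>         {
>             // Close both documents so their resources are released.
>             doc2.close();
>             doc.close();
>         }
>     }
> }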
> {code}
> (Edited on 8.6.14 by [~tilman] for clarity)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
