Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2010/11/24 21:24:17 UTC
[jira] Resolved: (PDFBOX-28) Splitting a PDF creates unnecessarily large chunks

     [ https://issues.apache.org/jira/browse/PDFBOX-28?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-28.
--------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.4.0

Copy only the page resources instead of copying all resources of a document for every page.

Fixed in revision 1038796
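
To illustrate the idea behind the fix, here is a minimal sketch in plain Java. It does not use the PDFBox API and is not the code from revision 1038796; the class PageResourceCopy, its methods, and the map-of-byte-arrays model of a resource dictionary are all hypothetical stand-ins chosen for illustration. It only contrasts copying every document resource into each split page with copying just the resources that page references.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not PDFBox code: a resource dictionary is a map from
// resource name to its raw bytes, and a page is the set of names it references.
final class PageResourceCopy {

    // Behaviour described in the issue: each split page receives a copy of
    // every resource in the document, whether the page uses it or not.
    static Map<String, byte[]> copyAllResources(Map<String, byte[]> documentResources) {
        return new LinkedHashMap<>(documentResources);
    }

    // Behaviour described in the resolution: each split page receives only
    // the resources it actually references.
    static Map<String, byte[]> copyPageResources(Map<String, byte[]> documentResources,
                                                 Set<String> usedByPage) {
        Map<String, byte[]> copy = new LinkedHashMap<>();
        for (String name : usedByPage) {
            byte[] data = documentResources.get(name);
            if (data != null) { // skip names the document does not define
                copy.put(name, data);
            }
        }
        return copy;
    }

    // Total payload size in bytes of a resource map.
    static long totalSize(Map<String, byte[]> resources) {
        long sum = 0;
        for (byte[] data : resources.values()) {
            sum += data.length;
        }
        return sum;
    }
}
```

In this model, a document holding resources of 1000, 2000 and 50 bytes, split at a page that references only the 1000-byte and 50-byte entries, carries 3050 bytes with copyAllResources and 1050 bytes with copyPageResources.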

> Splitting a PDF creates unnecessarily large chunks
> --------------------------------------------------
>
>                 Key: PDFBOX-28
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-28
>             Project: PDFBox
>          Issue Type: Bug
>            Assignee: Andreas Lehmkühler
>             Fix For: 1.4.0
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1052458
> Originally submitted by bryang1 on 2004-10-22 13:23.
> Using PDFBox 0.6.7a, some PDFs contain objects that are
> inherited when the PDF is split into smaller documents
> using the Splitter class (even if the child
> documents are compressed).
> The linked PDF splits into chunks approximately the
> same size as the original. The first several pages
> will be smaller because I recreated them for debugging.
> The rest of the document will reflect the problem,
> however. Try splitting after page 5, or at every page,
> to reproduce.
> PDF (13MB):
> http://esis.infofoundry.com:8080/audi/pdf/audi.ns.ssp.951903.pdf
> Opening and using the 'Save As' feature in Acrobat
> removes the unnecessary objects, but I can find no way
> to do this programmatically using PDFBox.
> Here are the messages from Acrobat when using 'Save As':
> "Consolidating duplicate images"
> "Consolidating duplicate page backgrounds"
> "Removing unused objects and saving"
> Here is some sample code:
> // splitting:
> splitter.setSplitAtPage( split );
> documents = splitter.split( document );
> for( int i=0; i<documents.size(); i++ )
> {
>   PDDocument doc = (PDDocument)documents.get( i );
>   String fileName = pdfFile.substring(0, pdfFile.length()-4 ) + "-" + i + ".pdf";
>   writeCompressedDocument( doc, fileName );
> }
> // saving w/ compression:
> fileOut = new FileOutputStream( fileName );
> COSStream stream = new COSStream( 
>      doc.getDocument().getScratchFile() ); 
> OutputStream output = stream.createUnfilteredStream();
> int length = new Long(doc.getDocument().getScratchFile().length()).intValue();
> byte[] bytes = new byte[length];
> doc.getDocument().getScratchFile().readFully(bytes, 0, length);
> output.write(bytes);
> stream.setFilters( COSName.FLATE_DECODE );
