Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2014/12/13 13:42:13 UTC

[jira] [Comment Edited] (PDFBOX-785) Splitting a PDF creates unnecessarily large files

    [ https://issues.apache.org/jira/browse/PDFBOX-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14245308#comment-14245308 ] 

Tilman Hausherr edited comment on PDFBOX-785 at 12/13/14 12:41 PM:
-------------------------------------------------------------------

java -jar pdfbox-app-2.0.0-SNAPSHOT.jar PDFSplit -endPage 2300 Default_Table_Formatting-merged.pdf

produces a result file of 36 MB :-(

The content stream is not compressed. Is there a reason that PDDocument.importPage() does not compress the content stream? With compression (adding dest.addCompression();), I get a size of 15 MB.
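
To illustrate why compressing the content stream makes such a difference, here is a minimal sketch that deflates a synthetic, highly repetitive content stream using only java.util.zip. It is not PDFBox code and does not show what addCompression() does internally; the class name FlateSketch, the helper names, and the sample operators are assumptions made up for this example. It relies only on the fact that PDF's FlateDecode filter uses zlib/deflate data, which java.util.zip.Deflater produces by default.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlateSketch {

    // Hypothetical stand-in for an uncompressed page content stream:
    // the same few text operators repeated with slightly varying positions.
    static byte[] sampleContentStream(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("BT /F1 10 Tf 72 ").append(700 - (i % 50) * 12)
              .append(" Td (Table cell text) Tj ET\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Compress with zlib/deflate, the format behind the FlateDecode filter.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress again, to check that the round trip is lossless.
    static byte[] inflate(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input: stop instead of looping forever
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = sampleContentStream(1000);
        byte[] packed = deflate(raw);
        System.out.println("raw bytes:      " + raw.length);
        System.out.println("deflated bytes: " + packed.length);
    }
}
```

Content streams of table-heavy documents consist largely of repeated operators like these, so leaving them uncompressed when pages are imported can plausibly account for a large part of the size difference reported above.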


was (Author: tilman):
java -jar pdfbox-app-2.0.0-SNAPSHOT.jar PDFSplit -endPage 2300 Default_Table_Formatting-merged.pdf

produces a result file of 36 MB :-(

The content stream is not compressed.  Is there a reason that PDDocument.importPage() does not use compression? With compression, I get a size of 15MB.

> Splitting a PDF creates unnecessarily large files
> -------------------------------------------------
>
>                 Key: PDFBOX-785
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-785
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 0.8.0-incubator, 1.1.0, 1.2.1
>         Environment: Windows XP, openOffice3.0.0, pdfsam
>            Reporter: mathieu radiguet
>            Assignee: Andreas Lehmkühler
>             Fix For: 2.0.0
>
>         Attachments: fileSizeIssue.zip
>
>
> Using PDFBox 0.8.0 (also tried on 1.1.0 and 1.2.1) to split files results in parts that are bigger than the original.
> The affected files were created from OpenOffice 3.0.0 .odt documents using the OpenOffice PDF export, and several copies were then merged with pdfsam (http://www.pdfsam.org/).
> In the attached Eclipse project, the test file is 10 712 749 bytes for 2812 pages, and the result files after splitting it in two at page 2300 are 8 812 515 bytes and 10 701 142 bytes.
> Using PDFSplit on the command line, all the individual result files are bigger than the original. An example is also attached. An error says the original file is corrupted, but we tried it on a file (both with and without using pdfsam) that gave no error and got a similar result, so I think it's not related.
> This issue seems similar to PDFBOX-28.


