Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2015/02/21 13:28:12 UTC

[jira] [Commented] (PDFBOX-2690) Filesize becomes extremely large after saving

    [ https://issues.apache.org/jira/browse/PDFBOX-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330194#comment-14330194 ] 

Tilman Hausherr commented on PDFBOX-2690:
-----------------------------------------

Workaround until saving in object streams is implemented: use qpdf with the option "--object-streams=generate".
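
As a rough illustration of the workaround, the sketch below calls qpdf from Java after PDFBox has written its output. It assumes qpdf is installed and on the PATH. The class name, the helper methods and the output file name "input2-compressed.pdf" are made up for this example; only the "--object-streams=generate" option comes from the comment above.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class QpdfObjectStreams {

    // Builds the qpdf command line: qpdf --object-streams=generate <input> <output>
    static List<String> buildCommand(String input, String output) {
        return Arrays.asList("qpdf", "--object-streams=generate", input, output);
    }

    // Runs qpdf as an external process and returns its exit code.
    // Assumption: the qpdf executable can be found on the PATH.
    static int run(String input, String output) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, output))
                .inheritIO() // show qpdf's own messages on this console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // File names are placeholders; the first one is the file saved by PDFBox.
        int exitCode = run("input2-after-save.pdf", "input2-compressed.pdf");
        System.out.println("qpdf exit code: " + exitCode);
    }
}
```

The rewritten file is a separate output, so the file saved by PDFBox stays untouched and the two sizes can be compared.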

> Filesize becomes extremely large after saving
> ---------------------------------------------
>
>                 Key: PDFBOX-2690
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2690
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Writing
>    Affects Versions: 1.8.8, 1.8.9, 2.0.0
>         Environment: PDFBox 1.8.8, Java8u25, Windows 8.1
>            Reporter: Brian Liu
>         Attachments: input2-after-save.pdf, input2.pdf
>
>
> I am using PDFBox 1.8.8 to manipulate existing PDF files. After saving a document, the output file becomes several times larger than the original. This is undesirable.
> *How to reproduce my problem:*
> In the following code, PDFBox simply loads an existing PDF and then saves it. Nothing else is done, yet the file still becomes several times larger.
> {code}
> import java.io.*;
> import org.apache.pdfbox.pdmodel.*;
> import org.apache.pdfbox.exceptions.*;
> class Test
> {
>     public static void main(String[] args) throws IOException, COSVisitorException {
>         // Load the original file and write it back out unchanged.
>         PDDocument document = PDDocument.load("input2.pdf");
>         document.save("input2-after-save.pdf");
>         document.close();
>     }
> }
> {code}
> Attached are two sample PDF files. input2.pdf is the original, unprocessed PDF. input2-after-save.pdf is the output of the code above. After processing, the file size increases from 416 kB to 1.25 MB.
> *Possible reason:*
> Tilman Hausherr suggests that the input file holds an enormous amount of "structure" information in compressed object streams, and that this information is written uncompressed in the output file.
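
One quick way to look into the suspected cause is to count how often the name "/ObjStm" (the type of an object stream) occurs in each file. The sketch below is only a heuristic and assumes the name appears in plain text in the object dictionaries; it could also match bytes that happen to occur inside binary stream data. The class and method names are invented for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class ObjStmCounter {

    // Counts occurrences of the name "/ObjStm" in the raw bytes of a PDF.
    // ISO-8859-1 maps every byte to exactly one char, so nothing is lost in decoding.
    static int countObjectStreams(byte[] pdfBytes) {
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        int count = 0;
        int index = text.indexOf("/ObjStm");
        while (index >= 0) {
            count++;
            index = text.indexOf("/ObjStm", index + 1);
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Pass the files to compare, e.g. input2.pdf input2-after-save.pdf
        for (String name : args) {
            byte[] data = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": " + countObjectStreams(data) + " occurrence(s) of /ObjStm");
        }
    }
}
```

If the input file contains object streams and the saved file contains none, that would be consistent with the explanation above.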


