Posted to dev@pdfbox.apache.org by "Thomas Sörensen (JIRA)" <ji...@apache.org> on 2014/01/10 13:52:51 UTC

[jira] [Comment Edited] (PDFBOX-1618) Split PDF file to single page files, some files are inflated in size

    [ https://issues.apache.org/jira/browse/PDFBOX-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867714#comment-13867714 ] 

Thomas Sörensen edited comment on PDFBOX-1618 at 1/10/14 12:50 PM:
-------------------------------------------------------------------

Hi 

I noticed that every split page that came out large contained a link to another page.
So I tried removing all annotations from each page with PDPage.setAnnotations(emptyList).
The total size went down from over 100 MB to 17 MB. The original file is 3 MB.
If I then merge the pages again, none of the links work anymore, of course.
Can someone explain this?

EDIT:
I added a compression filter to each page's content stream with page.getContents().addCompression(). That decreased the total size of the split files from 17 MB to 9 MB.
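
To illustrate why compressing the content streams helps, here is a minimal sketch that uses only java.util.zip, not PDFBox itself. It assumes that addCompression() applies a Flate (zlib/deflate) filter to the stream. The class FlateSketch and the generated "content stream" are hypothetical stand-ins, not data from the attached file.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustration only: plain java.util.zip, no PDFBox classes involved.
final class FlateSketch {

    // Compress bytes into zlib-wrapped deflate data, the format a Flate filter uses.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Reverse of deflate(); wraps the checked exception so callers stay simple.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Hypothetical uncompressed page content: repetitive text-drawing operators.
    static byte[] sampleContentStream(int lines) {
        StringBuilder sb = new StringBuilder("BT /F1 10 Tf\n");
        for (int i = 0; i < lines; i++) {
            sb.append("1 0 0 1 72 ").append(720 - (i % 60) * 12)
              .append(" Tm (Sample line ").append(i).append(") Tj\n");
        }
        sb.append("ET\n");
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = sampleContentStream(500);
        byte[] packed = deflate(raw);
        System.out.println("raw=" + raw.length + " bytes, deflated=" + packed.length + " bytes");
    }
}
```

Page content is mostly repetitive operator text, so it deflates well. This only shrinks the content streams, though. It does not touch other objects that end up in each split file, which may be why the total is still larger than the original.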




> Split PDF file to single page files, some files are inflated in size
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-1618
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1618
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.1
>         Environment: Windows 7, JVM 1.6.0_29
>            Reporter: Tom Taylor
>         Attachments: 112080-TECHNICAL MANUAL FOR GENERATOR NIR 7194 A-10LW OF 4038 KVA.pdf, Test_PDFs.zip, internalstructure.png
>
>
> A PDF file is split into single pages for inclusion within another document (using pdfbox.utils.Splitter in our code, but the same phenomenon occurs when splitting with the command-line PDFSplit tool). Some of the pages are almost as large as the original file, which causes performance problems for our customers.
> Again, I have a sample PDF to attach.


