Posted to users@pdfbox.apache.org by Ilya Sterin <st...@gmail.com> on 2018/12/27 21:55:28 UTC

Extracting a page

I'm trying to break a PDF down into individual pages.  Although it
functionally works, the PDF for each page ends up being almost the size of
the original PDF (250MB).  I've seen some references to deleting
annotations, which might include links to other pages/resources.  I've tried
the code below, but no luck.  Can someone let me know what I'm doing wrong?

(The code below is in Kotlin.)  I've also tried using addPage vs. importPage,
since the latter creates a deep copy.  Same result.

doc.pages.forEachIndexed { idx: Int, p: PDPage ->
    val newDoc = PDDocument()
    val newPage = newDoc.importPage(p)
    newPage.annotations = null
    newPage.resources = null
    newDoc.save("/tmp/$idx.pdf")
    newDoc.close()
}

Re: Extracting a page

Posted by Marc Kaufman <ma...@eeph.com>.

Article threads tie non-contiguous text blocks together in reading order
(think of a magazine with articles continued on later pages). Each bead is
one block in such a thread.

On 12/30/2018 7:53 PM, Ilya Sterin wrote:
> Tilman, thanks for the reply.  I replied to you on Stack Overflow, but
> thought I'd add it here as well, in case someone else has the same problem
> and searches the archive.
>
> I figured this out. I had to remove the PDThreadBead (setThreadBeads) on
> each page after importPage and before saving it. I found this by process
> of elimination. Page looks great. I'm not sure what a thread bead is
> (can't seem to find it anywhere on the web) and the class docs don't help.
> Thanks!
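
To check whether a document carries article threads at all, a minimal sketch
like the following could count the beads attached to each page. It assumes
PDFBox 2.0.x, and beadCountsPerPage is a made-up name for illustration, not
part of PDFBox.

```kotlin
import org.apache.pdfbox.pdmodel.PDDocument

// Hypothetical helper: number of thread beads (entries in the page's /B array)
// attached to each page, in page order.
fun beadCountsPerPage(doc: PDDocument): List<Int> =
    doc.pages.map { page -> page.threadBeads.size }
```

A page with a non-zero count is a candidate for the size problem discussed
in this thread.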

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

Re: Extracting a page

Posted by Ilya Sterin <st...@gmail.com>.

Tilman, thanks for the reply.  I replied to you on Stack Overflow, but
thought I'd add it here as well, in case someone else has the same problem
and searches the archive.

I figured this out. I had to remove the PDThreadBead (setThreadBeads) on
each page after importPage and before saving it. I found this by process
of elimination. Page looks great. I'm not sure what a thread bead is
(can't seem to find it anywhere on the web) and the class docs don't help.
Thanks!
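
For anyone who finds this in the archive, the fix described above might look
roughly like the sketch below. It assumes PDFBox 2.0.x; splitIntoPages and
its outDir parameter are illustrative names, not part of PDFBox. The likely
reason it helps: each bead links to the next and previous beads of its
thread, and every bead points at its own page, so a page that keeps its
beads can keep much of the original document reachable when it is saved.

```kotlin
import java.io.File
import org.apache.pdfbox.pdmodel.PDDocument

// Hypothetical helper: writes each page of `doc` to "<index>.pdf" in `outDir`
// and returns the files written. `doc` must stay open until all pages are saved.
fun splitIntoPages(doc: PDDocument, outDir: File): List<File> =
    doc.pages.mapIndexed { idx, page ->
        val out = File(outDir, "$idx.pdf")
        PDDocument().use { newDoc ->
            val newPage = newDoc.importPage(page)
            // Drop the article thread beads (the page's /B entry).
            newPage.setThreadBeads(null)
            // Drop annotations, as in the original attempt.
            newPage.setAnnotations(null)
            newDoc.save(out)
        }
        out
    }
```

With PDFBox 2.0.x this could be called as
PDDocument.load(File("input.pdf")).use { splitIntoPages(it, File("/tmp")) },
where input.pdf is a placeholder path. Unlike the original attempt, the
sketch leaves the page resources in place, on the assumption that the fonts
and images they hold are needed to render the page.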

On Fri, Dec 28, 2018 at 12:53 AM Tilman Hausherr <TH...@t-online.de>
wrote:

> I could look at it but I'd need the PDF. Please upload to a sharehoster.
>
> Tilman

Re: Extracting a page

Posted by Tilman Hausherr <TH...@t-online.de>.

I could look at it but I'd need the PDF. Please upload to a sharehoster.

Tilman


