Posted to users@pdfbox.apache.org by Ga...@sungard.com on 2015/02/18 18:05:26 UTC

How to Split PDF based on contents inside

I have a PDF document that is 1 GB in size. It contains a lot of defect / issue data, and every defect has a number. I want to break this large PDF down into multiple smaller PDFs, so that I have one PDF per defect ID.

I am looking for functionality that lets me pass in a search string to look for inside the large PDF document,
and then split the document at the start and end of each defect ID.

Can someone please suggest how this can be achieved using PDFBox?

Thanks
Ganesh

Re: How to Split PDF based on contents inside

Posted by Gilad Denneboom <gi...@gmail.com>.
AFAIK, there's no easy, out-of-the-box way of doing that with PDFBox.
You would need to develop your own code to identify the text you're after
and then extract the pages that are associated with it as new files. The
way to do that would depend a lot on how the files are set up.
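
As a rough illustration of the "identify the text, then extract the pages" idea, here is a minimal sketch of the grouping step only. It is not PDFBox code: DefectPageRanges and group are hypothetical names, and it assumes that every defect starts on a new page, that the start page contains a recognisable marker such as "Defect ID: 123" (the marker format is a guess), and that pages without a marker belong to the previous defect. The per-page text would have to come from somewhere else, for example PDFBox's PDFTextStripper with setStartPage/setEndPage and getText called once per page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of PDFBox): decides which pages belong to which defect.
final class DefectPageRanges {

    /**
     * @param pageTexts text of each page, where pageTexts.get(i) is page i + 1
     * @param idPattern regex with one capturing group that holds the defect ID
     * @return defect ID -> 1-based page numbers, in order of first appearance
     */
    static Map<String, List<Integer>> group(List<String> pageTexts, Pattern idPattern) {
        Map<String, List<Integer>> pagesById = new LinkedHashMap<>();
        String currentId = null;
        for (int i = 0; i < pageTexts.size(); i++) {
            Matcher m = idPattern.matcher(pageTexts.get(i));
            if (m.find()) {
                // Assumption: a page that contains a marker starts a new defect.
                currentId = m.group(1);
            }
            if (currentId != null) {
                // Pages without a marker are treated as a continuation of the previous defect.
                pagesById.computeIfAbsent(currentId, k -> new ArrayList<>()).add(i + 1);
            }
            // Pages before the first marker are skipped.
        }
        return pagesById;
    }
}
```

Called with Pattern.compile("Defect ID: (\\d+)"), this returns one list of page numbers per defect ID. Each list could then be copied into its own new document and saved. If defects can start in the middle of a page, splitting on page boundaries will not be enough and this sketch does not apply as written.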

I've developed various such tools in the past for my customers, so if
you're interested in someone developing it for you feel free to contact me
privately.
