Posted to users@pdfbox.apache.org by Sent <re...@yahoo.de> on 2011/09/11 18:18:20 UTC

Feature request

Hello,

I don't see a way to tell from the extracted text which page a given text/keyword is on. I have written a script that uses -startPage and -endPage with the same number so that I get only one page at a time. However, doing this 100 times in a row for a 100-page document is very slow.

Would it be possible to add an option that will add a page number indicator, e.g. <PAGENUM=n> at the beginning of each page during text extraction?

Thank you for considering this.

Cheers
Ralf

Re: Feature request

Posted by Andreas Lehmkuehler <an...@lehmi.de>.

Hi,

On 11.09.2011 18:18, Sent wrote:
> Hello,
>
> I don't see a way to conclude from the extracted text at which page a certain text/keyword is located. I have written a script that will use -startPage and -endPage with the same number so that I get only one page. However, doing this 100 times in a row for a 100 page document is very slow.
>
> Would it be possible to add an option that will add a page number indicator, e.g. <PAGENUM=n> at the beginning of each page during text extraction?
>
> Thank you for considering this.
Hmm, it will probably help if you define your own page separator using
PDFTextStripper#getPageSeparator. OK, you can't include the page number, but as a
starter ...
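
A minimal sketch of how a custom page separator could still answer the "which page?" question after a single extraction run. It assumes the whole document was extracted once with a separator that never occurs in the page text (a form feed is used here), and that the separator is emitted after or between pages rather than before the first one. PageLocator and findPages are hypothetical names for illustration, not part of PDFBox, and how the separator is configured on the stripper is left out on purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: maps a keyword to page numbers
// by splitting already-extracted text on a known page separator.
class PageLocator {

    // Assumption: this must match the separator used during text extraction.
    static final String PAGE_SEPARATOR = "\f";

    /** Returns the 1-based numbers of all pages whose text contains the keyword. */
    static List<Integer> findPages(String extractedText, String keyword) {
        List<Integer> pages = new ArrayList<>();
        // Limit -1 keeps empty chunks, so blank pages do not shift the numbering.
        String[] pageTexts = extractedText.split(Pattern.quote(PAGE_SEPARATOR), -1);
        for (int i = 0; i < pageTexts.length; i++) {
            if (pageTexts[i].contains(keyword)) {
                pages.add(i + 1);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        // Stand-in for the output of one full extraction run over three pages.
        String text = "intro text" + PAGE_SEPARATOR
                + "the keyword is here" + PAGE_SEPARATOR
                + "appendix";
        System.out.println(findPages(text, "keyword"));
    }
}
```

This replaces the 100 separate runs with one extraction plus an in-memory split. A keyword that straddles a page break would not be found by this simple approach.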

> Cheers
> Ralf

BR
Andreas Lehmkühler