Posted to users@pdfbox.apache.org by Lorena Leishman <lo...@yahoo.com.INVALID> on 2015/06/05 16:55:23 UTC

PDFTextStripper question

Is there a way to use PDFTextStripper to return the text along with the position it had in the PDF? Or is there a way to return the positions of the words?
Lorena 

Re: PDFTextStripper question

Posted by John Hewson <jo...@jahewson.com>.
If you’re also interested in getting the bounding boxes of individual glyphs, then check out:

https://github.com/apache/pdfbox/blob/trunk/examples/src/main/java/org/apache/pdfbox/examples/rendering/CustomPageDrawer.java
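To give an idea of the geometry involved, here is a minimal standalone sketch (not code taken from that example) of how a glyph box can be computed: the glyph is assumed to span (0, 0) to (advance width / 1000, 1) in glyph space, and the text rendering matrix maps that box onto the page. The class and method names are made up for illustration, and a box exactly 1 em tall is only an approximation; the font's own bounding box would give a tighter vertical extent. Inside PDFBox, the matrix and width would presumably come from the text rendering matrix and font handed to the glyph-drawing callback (Matrix.createAffineTransform() and PDFont.getWidth(code)); the sketch uses plain java.awt.geom so it runs on its own.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

/**
 * Hypothetical helper: approximate page-space box of one glyph.
 * Assumptions: trm holds the six PDF matrix values [a b c d e f],
 * the width is in 1/1000 em, and the glyph is treated as 1 em tall.
 */
public class GlyphBox {

    /** Returns the axis-aligned box in PDF user space (origin bottom-left). */
    public static Rectangle2D glyphBox(double[] trm, float widthThousandths) {
        // PDF order [a b c d e f] matches AffineTransform(m00, m10, m01, m11, m02, m12)
        AffineTransform at = new AffineTransform(trm[0], trm[1], trm[2], trm[3], trm[4], trm[5]);
        // The glyph's box in glyph space, measured in em units
        Rectangle2D emBox = new Rectangle2D.Double(0, 0, widthThousandths / 1000.0, 1.0);
        // Transform the corners; getBounds2D also works for rotated or skewed text
        return at.createTransformedShape(emBox).getBounds2D();
    }

    public static void main(String[] args) {
        // 12 pt text at (100, 700) with an identity CTM: Trm = [12 0 0 12 100 700]
        double[] trm = {12, 0, 0, 12, 100, 700};
        // 556 is Helvetica's advance width for 'a'
        System.out.println(glyphBox(trm, 556));
    }
}
```

Running main prints the box for an "a" in 12 pt Helvetica drawn at (100, 700): about 6.67 units wide and 12 units tall, with its lower-left corner at (100, 700).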

— John

> On 5 Jun 2015, at 09:31, Lorena Leishman <lo...@yahoo.com.INVALID> wrote:
> 
> I'll do that. Thanks!


Re: PDFTextStripper question

Posted by Lorena Leishman <lo...@yahoo.com.INVALID>.
I'll do that. Thanks!
From: Tilman Hausherr <TH...@t-online.de>
To: users@pdfbox.apache.org
Sent: Friday, June 5, 2015 10:13 AM
Subject: Re: PDFTextStripper question

Re: PDFTextStripper question

Posted by Tilman Hausherr <TH...@t-online.de>.
Yes, see the PrintTextLocations.java example.
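A minimal sketch of the same idea, assuming PDFBox 2.0.x (package names and some getters differ in 1.8): subclass PDFTextStripper, override writeString(String, List<TextPosition>), and read each glyph's position from its TextPosition. The class name GlyphLocations is made up for this sketch. getXDirAdj()/getYDirAdj() use a top-left origin with y at the baseline, so the glyph's top is roughly y minus getHeightDir(); that height can be imprecise for some files, as the "wrong TextPosition height" question linked below discusses.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

/**
 * Sketch in the spirit of PrintTextLocations (assumes PDFBox 2.0.x).
 * PDFTextStripper passes each piece of text to writeString() together with
 * the TextPosition of every glyph in it, so overriding it exposes coordinates.
 */
public class GlyphLocations extends PDFTextStripper {

    private final List<String> lines = new ArrayList<>();

    public GlyphLocations() throws IOException {
        super();
        setSortByPosition(true); // order text by page position, not content-stream order
    }

    @Override
    protected void writeString(String text, List<TextPosition> positions) throws IOException {
        for (TextPosition p : positions) {
            // Direction-adjusted coordinates: origin at the page's top-left,
            // y is the baseline, so the glyph's top is roughly y - heightDir.
            lines.add(String.format(Locale.ROOT, "%s x=%.2f y=%.2f w=%.2f h=%.2f",
                    p.getUnicode(), p.getXDirAdj(), p.getYDirAdj(),
                    p.getWidthDirAdj(), p.getHeightDir()));
        }
        super.writeString(text, positions); // keep the normal text output as well
    }

    /** One entry per glyph seen so far. */
    public List<String> getLines() {
        return lines;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            GlyphLocations stripper = new GlyphLocations();
            // The extracted text goes to a throwaway writer; only positions are printed.
            stripper.writeText(document, new StringWriter());
            for (String line : stripper.getLines()) {
                System.out.println(line);
            }
        }
    }
}
```

Run it with the path of a PDF as the only argument; it prints one line per glyph.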

See also
https://stackoverflow.com/questions/11873801/using-pdfbox-to-determine-the-coordinates-of-words-in-a-document
https://stackoverflow.com/questions/16579146/pdfbox-1-8-printtextlocations-wrong-textposition-height-for-a-multi-page-pdf
https://stackoverflow.com/questions/21207943/pdfbox-text-extraction-with-bold-italic-info-does-not-work-on-some-files
for possible problems / solutions.
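For word positions (the second part of the question): a PDF stores positioned glyphs, not words, so word boxes have to be assembled from glyph boxes. Below is a standalone sketch of one simple way to do that merging step; Glyph and Word are hypothetical stand-ins, and inside a writeString override you would fill Glyph from each TextPosition's getUnicode(), getXDirAdj(), getYDirAdj(), getWidthDirAdj() and getHeightDir(). Depending on whether the PDF contains real space characters, one writeString call may carry a single word or several, which is why the sketch splits on whitespace glyphs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: merge per-glyph boxes into per-word boxes.
 * Coordinates follow the getXDirAdj()/getYDirAdj() convention:
 * origin at the top-left, y at the baseline.
 */
public class WordBoxes {

    /** Hypothetical glyph: its text and box (x = left edge, y = baseline). */
    public static final class Glyph {
        public final String text;
        public final float x, y, width, height;

        public Glyph(String text, float x, float y, float width, float height) {
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /** A word and the union of its glyph boxes; bottom is the baseline, so descenders are not covered. */
    public static final class Word {
        public final String text;
        public final float left, top, right, bottom;

        Word(String text, float left, float top, float right, float bottom) {
            this.text = text;
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }
    }

    /** Splits a run of glyphs at whitespace and returns one box per word. */
    public static List<Word> group(List<Glyph> glyphs) {
        List<Word> words = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        float left = 0, top = 0, right = 0, bottom = 0;
        for (Glyph g : glyphs) {
            if (g.text.trim().isEmpty()) {           // whitespace ends the current word
                if (text.length() > 0) {
                    words.add(new Word(text.toString(), left, top, right, bottom));
                    text.setLength(0);
                }
                continue;
            }
            float gTop = g.y - g.height;             // y is the baseline; the box extends upward
            if (text.length() == 0) {                // first glyph of a new word
                left = g.x;
                top = gTop;
                right = g.x + g.width;
                bottom = g.y;
            } else {                                 // grow the box to cover this glyph too
                left = Math.min(left, g.x);
                top = Math.min(top, gTop);
                right = Math.max(right, g.x + g.width);
                bottom = Math.max(bottom, g.y);
            }
            text.append(g.text);
        }
        if (text.length() > 0) {                     // flush the last word
            words.add(new Word(text.toString(), left, top, right, bottom));
        }
        return words;
    }
}
```

Splitting only on whitespace is a simplification; text laid out with explicit gaps instead of space characters may need a distance-based split as well.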

Tilman

On 05.06.2015 at 16:55, Lorena Leishman wrote:
> Is there a way to use PDFTextStripper to return the text along with the position it had in the PDF? Or is there a way to return the positions of the words?
> Lorena


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org