
Posted to users@pdfbox.apache.org by Felix Hermann <fe...@gmx.de> on 2016/04/04 17:36:17 UTC

Re: Re: Extract Text of Document with coordinates

Thanks for the answer.

The problem is: how can I get the extracted characters and their coordinates with PDFBox version 2.0.0?

The example refers to an older version of PDFBox.
 
 

Sent: Thursday, 31 March 2016 at 19:58
From: "Tilman Hausherr" <TH...@t-online.de>
To: users@pdfbox.apache.org
Subject: Re: Extract Text of Document with coordinates
On 31.03.2016 at 12:51, Felix Hermann wrote:
> Hello,
>
> how can I extract the text + coordinates of a PDF document?
>
> To be more precise: I would like to extract all words of the document. And for each word I need the coordinates of this word.
>
> If PDFBox does not support this: How can I get the coordinates of each character?
>
> I tried to adapt the code of this example: https://gist.github.com/DavidYKay/82f20ba67c50c499ebb3

Yes, the PrintTextLocations (or DrawPrintTextLocations) example is a
good start. Look for the blanks between characters and build the words from there.

Tilman

> However, I was not successful, as I use the new PDFBox version (2.0.0).
>
> Regards
>
> Felix
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
> For additional commands, e-mail: users-help@pdfbox.apache.org
>



Re: Extract Text of Document with coordinates

Posted by Tilman Hausherr <TH...@t-online.de>.

On 04.04.2016 at 17:36, Felix Hermann wrote:
> Thanks for the answer.
>
> The problem is: how can I get the extracted characters and their coordinates with PDFBox version 2.0.0?
>
> The example refers to an older version of PDFBox.

The PrintTextLocations and DrawPrintTextLocations examples are also available in the 2.0 version, although slightly changed.

Tilman
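
The "look for the blanks and build words" step suggested in this thread can be sketched in plain Java. This is only an illustration under assumptions, not code from the PDFBox examples: Glyph, Word and WordBuilder are hypothetical names, and each glyph is assumed to carry its text plus a top-left corner, width and height. In PDFBox 2.0 such values would typically be collected from the TextPosition objects handed to an overridden PDFTextStripper.writeString(String, List<TextPosition>); how its getters map onto the box fields used here is left as an assumption. The sketch also assumes the glyphs arrive in reading order and belong to one line, so it would be called once per line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of PDFBox: groups positioned characters into words. */
final class WordBuilder {

    /** One extracted character with its box (assumed: x/y is the top-left corner). */
    static final class Glyph {
        final String text;
        final float x, y, width, height;

        Glyph(String text, float x, float y, float width, float height) {
            this.text = text; this.x = x; this.y = y; this.width = width; this.height = height;
        }
    }

    /** A word with the bounding box that encloses all of its glyphs. */
    static final class Word {
        final String text;
        final float x, y, width, height;

        Word(String text, float x, float y, float width, float height) {
            this.text = text; this.x = x; this.y = y; this.width = width; this.height = height;
        }

        @Override
        public String toString() {
            return text + " x=" + x + " y=" + y + " w=" + width + " h=" + height;
        }
    }

    /** Splits the glyphs of one line at blanks and returns one Word per run of non-blank glyphs. */
    static List<Word> buildWords(List<Glyph> glyphs) {
        List<Word> words = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        float left = 0, top = 0, right = 0, bottom = 0;
        for (Glyph g : glyphs) {
            // A glyph that trims to nothing (space, tab) ends the current word.
            // Note: trim() does not treat a non-breaking space as blank.
            if (g.text.trim().isEmpty()) {
                flush(words, text, left, top, right, bottom);
                continue;
            }
            if (text.length() == 0) {
                // First glyph of a new word: start the box from this glyph.
                left = g.x; top = g.y; right = g.x + g.width; bottom = g.y + g.height;
            } else {
                // Later glyphs: grow the box so it encloses this glyph too.
                left = Math.min(left, g.x);
                top = Math.min(top, g.y);
                right = Math.max(right, g.x + g.width);
                bottom = Math.max(bottom, g.y + g.height);
            }
            text.append(g.text);
        }
        flush(words, text, left, top, right, bottom); // last word has no trailing blank
        return words;
    }

    /** Emits the pending word, if any, and resets the text buffer. */
    private static void flush(List<Word> words, StringBuilder text,
                              float left, float top, float right, float bottom) {
        if (text.length() > 0) {
            words.add(new Word(text.toString(), left, top, right - left, bottom - top));
            text.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data for the line "Hi yo".
        List<Glyph> line = Arrays.asList(
                new Glyph("H", 10f, 100f, 6f, 10f),
                new Glyph("i", 16f, 100f, 3f, 10f),
                new Glyph(" ", 19f, 100f, 3f, 10f),
                new Glyph("y", 22f, 102f, 6f, 10f),
                new Glyph("o", 28f, 100f, 6f, 10f));
        for (Word w : buildWords(line)) {
            System.out.println(w);
        }
    }
}
```

Splitting only on blank glyphs is the simplest reading of the advice above. PDFs that position words without explicit space characters would need an extra check on the horizontal gap between neighbouring glyphs, which this sketch does not attempt.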


