Posted to users@pdfbox.apache.org by huangchangan <hu...@cninfo.com.cn> on 2011/04/27 09:17:17 UTC

Why the text content extracted by PDFBOX is not as the same as it is displayed in Adobe reader

Hello, 

I used PDFBox to extract the text content from the attached PDF document and found that the extracted text is "年预", but the text displayed in Adobe Reader is "年期". I would like to know how to get the correct text content (as Adobe Reader shows it) from this kind of PDF document with PDFBox.

Thank you! 



 huang, changan
Department of Data Department 
Shenzhen Securities Information Co., Ltd. 6/F, 10 Building, Shangbu Industrial Zone, 
Hongli West Rd.,Futian,Shenzhen, P.R. China 518028 
Tel：86-0755-83990104

Re: Why the text content extracted by PDFBOX is not as the same as it is displayed in Adobe reader

Posted by Andreas Lehmkuehler <an...@lehmi.de>.

Hi,

On 28.04.2011 01:47, Zachary Mitchell wrote:
> How may I unsubscribe from this users email list, for PDFBOX?

Send a mail to users-unsubscribe@pdfbox.apache.org and be sure to use the mail
address which is subscribed to the list. See [1] for further details.

BR
Andreas Lehmkühler

[1] http://pdfbox.apache.org/mail-lists.html

Re: Why the text content extracted by PDFBOX is not as the same as it is displayed in Adobe reader

Posted by Zachary Mitchell <za...@internode.on.net>.

How may I unsubscribe from this users email list, for PDFBOX?