Posted to users@pdfbox.apache.org by Srinivaas_Venkatarayan <Sr...@mahindrasatyam.com> on 2011/10/12 13:42:04 UTC

Issue while extracting Chinese characters from PDF

Hi,

I'm trying to extract the text content of a PDF file and store it in a .txt file using PDFBox (version 1.6.0). I'm having problems extracting the content of a PDF that contains Chinese characters. Attached are the PDF and the Java code. I'm not sure what encoding this PDF uses. Can you please help?
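
The attached code is not reproduced in this archive. As a rough sketch of only the file-writing half of such a program, assuming the text has already been extracted into a Java String (for example by PDFBox's PDFTextStripper), the output can be written with an explicit UTF-8 encoding instead of the platform default. The class name Utf8TextFile, the helper methods, and the sample string are hypothetical, and this is not necessarily the cause of the problem described here.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper: not part of PDFBox and not the attached code.
public class Utf8TextFile {

    // Write the text as UTF-8 explicitly, so the result does not depend
    // on the platform default encoding.
    static void writeUtf8(File out, String text) throws IOException {
        Files.write(out.toPath(), text.getBytes(StandardCharsets.UTF_8));
    }

    // Read the file back as UTF-8 to check the round trip.
    static String readUtf8(File in) throws IOException {
        return new String(Files.readAllBytes(in.toPath()), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: stand-in for text extracted from the PDF.
        // The escapes spell two Chinese characters (U+4E2D, U+6587).
        String extracted = "\u4e2d\u6587";

        File out = File.createTempFile("extracted", ".txt");
        out.deleteOnExit();

        writeUtf8(out, extracted);
        System.out.println("Round trip intact: " + readUtf8(out).equals(extracted));
    }
}
```

If the text file looks fine when written this way but the characters are still wrong, the problem is more likely in the extraction step or in the PDF's fonts than in the output encoding.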

Thanks
Srini


