Posted to users@pdfbox.apache.org by Renaud Billen <re...@nic.be> on 2015/01/06 11:59:47 UTC
Extraction of chinese characters
Hello,
I'm a new user of PDFBox and I'm having trouble extracting text from PDFs that contain Chinese characters.
I run PDFBox from the command line with: java -jar C:/pdfbox-app.jar ExtractText C:/Test_Pdfbox.pdf C:/Test_Pdfbox.txt
The resulting text file contains only question marks.
Here is the document:
Thanks for your help,
Renaud
Re: Extraction of chinese characters
Posted by Renaud Billen <re...@nic.be>.
Thanks a lot, works like a charm now :)
> On 6 Jan 2015, at 12:14, Gilad Denneboom <gi...@gmail.com> wrote:
>
> Try specifying the encoding parameter... See:
> https://pdfbox.apache.org/1.8/commandline.html#extractText
Re: Extraction of chinese characters
Posted by Gilad Denneboom <gi...@gmail.com>.
Try specifying the encoding parameter... See:
https://pdfbox.apache.org/1.8/commandline.html#extractText
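For anyone wondering why the output was all question marks: that is typically what happens when text is written in an encoding that cannot represent the characters. The small Java sketch below is not PDFBox code; it only illustrates the effect using the standard library. The class name EncodingDemo and the helper roundTrip are made up for this example, and the assumption that the default output encoding was a Latin-1 style charset is a guess, not something confirmed in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of PDFBox.
public class EncodingDemo {

    // Encode the text to bytes in the given charset, then decode it back.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file
        // encoding does not matter.
        String sample = "\u4e2d\u6587";

        // ISO-8859-1 has no mapping for these characters, so each one
        // is replaced with '?'.
        System.out.println(roundTrip(sample, StandardCharsets.ISO_8859_1)); // prints "??"

        // UTF-8 can represent them, so the text survives unchanged.
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample)); // prints "true"
    }
}
```

Applied to the command from the original post, the fix would presumably look like: java -jar C:/pdfbox-app.jar ExtractText -encoding UTF-8 C:/Test_Pdfbox.pdf C:/Test_Pdfbox.txt (check the linked page for the exact option name and supported encodings in your version).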