Posted to users@pdfbox.apache.org by Supun Nakandala <su...@gmail.com> on 2013/06/20 12:26:19 UTC

Extracting local language (Sinhala Unicode) from a pdf

Hi,
I want to extract Sinhala (local language) text from a PDF file. I am not
familiar with PDFBox. I would like to know whether this is possible and how
I can do it using PDFBox.

Thank you.
Regards Supun

Re: Extracting local language (Sinhala Unicode) from a pdf

Posted by Andreas Lehmkuehler <an...@lehmi.de>.
Hi,


On 20.06.2013 12:26, Supun Nakandala wrote:
> Hi,
> I want to extract Sinhala (local language) text from a PDF file. I am not
> familiar with PDFBox. I would like to know whether this is possible and how
> I can do it using PDFBox.
It depends on the PDFs and the kind of fonts they use. I suggest giving it a try.
There are some easy-to-use command line tools such as ExtractText; see [1]
for further details.
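
One way to judge whether a trial run worked is to check if the extracted text
actually contains characters from the Unicode Sinhala block (U+0D80 to U+0DFF).
The following is only a minimal sketch using the plain JDK, not part of PDFBox:
the class name SinhalaCheck and the method countSinhala are hypothetical, and it
assumes the ExtractText output was saved as a UTF-8 text file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not part of PDFBox): inspects already-extracted text.
class SinhalaCheck {

    // Counts the code points that belong to the Unicode Sinhala block.
    static long countSinhala(String text) {
        return text.codePoints()
                .filter(cp -> Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.SINHALA)
                .count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SinhalaCheck.java <extracted-text-file>");
            return;
        }
        // Assumption: the text file is encoded as UTF-8.
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        long total = text.codePoints().count();
        System.out.println("Sinhala code points: " + countSinhala(text) + " of " + total);
    }
}
```

If the count is zero or very low for a document that visibly contains Sinhala,
the PDF's fonts probably do not map their glyphs to Unicode in a way that text
extraction can use, which is the font dependency mentioned above.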

> Thank you.
> Regards Supun

BR
Andreas Lehmkühler

[1] http://pdfbox.apache.org/commandline/