Posted to users@pdfbox.apache.org by "Amir H. Jadidinejad" <am...@yahoo.com.INVALID> on 2014/08/05 11:27:00 UTC

How to manage semi-space characters in PDFTextStripper?

In some right-to-left languages, the parts of a compound word are separated by a "semi-space" character (please take a look at the Unicode space characters). When the input document contains such words, PDFTextStripper drops the semi-space character and concatenates the parts.
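
To make the problem concrete, here is a minimal sketch in plain Java with no PDFBox calls. It assumes the semi-space is U+200C (ZERO WIDTH NON-JOINER); that code point is my assumption and may differ in other documents. SemiSpaceDemo and its methods are hypothetical names used only for illustration, and "abc"/"def" are placeholders for the two parts of a compound word.

```java
class SemiSpaceDemo {
    // Assumption: the "semi-space" is U+200C (ZERO WIDTH NON-JOINER).
    static final char SEMI_SPACE = '\u200C';

    // The text as it should be extracted: two parts joined by a semi-space.
    static String expected(String first, String second) {
        return first + SEMI_SPACE + second;
    }

    // Simulates what I currently get: the semi-space is missing,
    // so the two parts run together.
    static String observed(String text) {
        return text.replace(String.valueOf(SEMI_SPACE), "");
    }

    // Helper to check whether an extracted string still has the semi-space.
    static boolean containsSemiSpace(String text) {
        return text.indexOf(SEMI_SPACE) >= 0;
    }

    public static void main(String[] args) {
        String wanted = expected("abc", "def");
        String actual = observed(wanted);
        System.out.println("wanted keeps semi-space: " + containsSemiSpace(wanted));
        System.out.println("actual keeps semi-space: " + containsSemiSpace(actual));
    }
}
```

The output I would like from the stripper corresponds to expected(...), while what I see corresponds to observed(...).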

Could you please give me a hint about which method of PDFTextStripper I should extend to handle semi-space characters?
Kind regards,
Amir