Posted to dev@pdfbox.apache.org by Graham Statham <gr...@hotmail.com> on 2008/08/19 17:18:19 UTC
Question marks in PDFTextStripper output
The text written by "stripper.writeText(pdfDoc, writer)" changes
"(e.g., x + y = 7 and x − y = 1)" to
"(e.g., x + y = 7 and x ? y = 1)".
The minus sign (−) was changed to a question mark (?). I've identified the font of the minus sign as Adobe Garamond. I've found a variety of Garamond TTFs, but none of them seem to work.
I loaded the TTFs using:
PDFont font = PDTrueTypeFont.loadTTF( pdfDoc, new File("garamond.ttf" ) );
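Before focusing on the font, it may be worth ruling out the output encoding. The character in the PDF is U+2212 MINUS SIGN, not the ASCII hyphen-minus (U+002D). When Java encodes text into a charset that has no mapping for a character, it substitutes a replacement byte, which is '?' for US-ASCII. So if `writer` is, say, an OutputStreamWriter using a legacy default encoding, the question mark could appear when the text is written rather than when it is extracted. The sketch below is plain Java (no PDFBox) and only demonstrates that substitution. Whether this is the cause in this case is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MinusSignEncoding {
    // Encode to bytes with the given charset, then decode back. Characters the
    // charset cannot represent come back as its replacement ('?' for US-ASCII).
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // U+2212 MINUS SIGN, as in the PDF, not the ASCII hyphen-minus U+002D
        String text = "x + y = 7 and x \u2212 y = 1";
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII)); // x + y = 7 and x ? y = 1
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

If this turns out to be the cause, constructing the writer with an explicit charset that can represent U+2212, for example `new OutputStreamWriter(out, StandardCharsets.UTF_8)`, should avoid the substitution.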
As you can see from my example, I'm trying to parse mathematical equations. For the most part everything parses nicely, but whenever there is a character the parser doesn't recognize, the output contains a question mark instead.
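To narrow down which stage produces the question mark, one approach is to pass a java.io.StringWriter to writeText (so no byte encoding is involved) and inspect the code points of the resulting String. The helper below is a hypothetical sketch, not part of PDFBox: it reports every '?' and every character above the printable ASCII range, with its index. If the report shows U+2212 MINUS SIGN at that position, extraction worked and the '?' is introduced later. If it shows U+003F QUESTION MARK, the stripper most likely could not map that glyph to Unicode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CodePointReport {
    // Describe each '?' and each code point above printable ASCII (0x7E) with its index.
    static List<String> suspicious(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp == '?' || cp > 0x7E) {
                out.add(String.format(Locale.ROOT, "index %d: U+%04X %s", i, cp, Character.getName(cp)));
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for stringWriter.toString() after writeText
        String extracted = "(e.g., x + y = 7 and x \u2212 y = 1)";
        for (String line : suspicious(extracted)) {
            System.out.println(line); // index 23: U+2212 MINUS SIGN
        }
    }
}
```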
Any suggestions?
Thanks,
-Graham