Posted to dev@pdfbox.apache.org by Graham Statham <gr...@hotmail.com> on 2008/08/19 17:18:19 UTC

Question marks in PDFTextStripper output

The string created by "stripper.writeText(pdfDoc, writer)" converts  
"(e.g., x + y = 7 and x − y = 1)" to  
"(e.g., x + y = 7 and x ? y = 1)".  
 
The minus sign (−) was changed to a question mark (?). I've identified the font of the minus sign as Adobe Garamond. I've found a variety of Garamond TTF files, but none seem to work. 
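
To make the difference between the two strings explicit, here is a minimal sketch that dumps the code points of a string. It is plain Java with no PDFBox calls; the class name CodePointDump and the method dump are made up for illustration. It assumes the minus in the source PDF is U+2212 (MINUS SIGN), as in the example text above, rather than the ASCII hyphen-minus.

```java
public class CodePointDump {
    // Returns each code point of s formatted as "U+XXXX", separated by spaces.
    public static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by 1 or 2 chars depending on whether cp is a surrogate pair.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the expected text uses U+2212, the extracted text has a literal '?'.
        String expected = "x \u2212 y";
        String extracted = "x ? y";
        System.out.println("expected:  " + dump(expected));
        System.out.println("extracted: " + dump(extracted));
    }
}
```

Running the same dump over the writer's output shows whether the stripper really emitted a literal U+003F or whether the character was lost later, for example when the text was written out or displayed.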
 
I loaded the ttfs using: 
PDFont font = PDTrueTypeFont.loadTTF( pdfDoc, new File("garamond.ttf" ) ); 
 
As you can see from my example, I'm trying to parse mathematical equations. For the most part everything parses nicely, but whenever there is a character the parser doesn't recognize, it returns a question mark. 
 
Any suggestions? 
 
Thanks, 
-Graham