Posted to users@pdfbox.apache.org by Yogesh <yo...@gmail.com> on 2011/02/07 23:29:00 UTC
Text extracted from only 1st page, not the rest
Hello,
I am trying to extract text from PDFs, mostly scientific literature. The
documents average 10 pages.
When I run the extraction code, I get text for only the 1st page. For the
rest, I get the following error:
Feb 7, 2011 5:18:13 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont extractToUnicodeEncoding
SEVERE: Error: Could not load embedded CMAP
The handle is invalid
What might be wrong? Please help. Thanks,
-Yogesh