Posted to users@pdfbox.apache.org by Yogesh <yo...@gmail.com> on 2011/02/07 23:29:00 UTC

Text extracted from only 1st page, not the rest

Hello,

I am trying to extract text from PDFs, mostly scientific literature. The
documents average about 10 pages. When I run the extraction code, I get text
for only the 1st page. For the remaining pages, I get the following error:

Feb 7, 2011 5:18:13 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont extractToUnicodeEncoding
SEVERE: Error: Could not load embedded CMAP
The handle is invalid

What might be wrong? Please help. Thanks.

-Yogesh