Posted to dev@pdfbox.apache.org by "Michael Klink (Jira)" <ji...@apache.org> on 2020/08/12 10:06:00 UTC
[jira] [Commented] (PDFBOX-4934) Could not find referenced cmap stream Adobe-Japan1-XXXX
[ https://issues.apache.org/jira/browse/PDFBOX-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176223#comment-17176223 ]
Michael Klink commented on PDFBOX-4934:
---------------------------------------
As far as I know, the newest (highest) Adobe-Japan1 supplement number is currently 7, so defaulting to Adobe-Japan1-1 seems a bit restrictive.
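A small sketch can illustrate this point. It assumes the fallback should keep the requested ordering and only cap the supplement at the newest one known to exist. The class `CMapSupplementClamp` and its method `clamp` are hypothetical names, not PDFBox API. The table of newest supplements lists only Adobe-Japan1 with the value 7 from the comment above. Whether FontBox actually bundles a resource for that supplement is a separate question.

```java
import java.math.BigInteger;
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox.
public final class CMapSupplementClamp {

    // Assumption: newest known supplement per Registry-Ordering. Only Adobe-Japan1 (7)
    // is listed, taken from the comment above; other orderings are left untouched.
    private static final Map<String, Integer> NEWEST_SUPPLEMENT =
            Collections.singletonMap("Adobe-Japan1", 7);

    // "<Registry>-<Ordering>-<Supplement>", e.g. "Adobe-Japan1-65534".
    private static final Pattern NAME = Pattern.compile("^(\\w+-\\w+)-(\\d+)$");

    private CMapSupplementClamp() {
    }

    /** Returns the name with a too-high supplement replaced by the newest known one. */
    public static String clamp(String name) {
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            return name; // e.g. "Adobe-Japan1-UCS2" has no numeric supplement
        }
        Integer newest = NEWEST_SUPPLEMENT.get(m.group(1));
        if (newest == null) {
            return name; // ordering not in the table
        }
        // BigInteger so that arbitrarily long digit strings cannot overflow.
        BigInteger supplement = new BigInteger(m.group(2));
        if (supplement.compareTo(BigInteger.valueOf(newest)) > 0) {
            return m.group(1) + "-" + newest;
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(clamp("Adobe-Japan1-65534")); // Adobe-Japan1-7
        System.out.println(clamp("Adobe-Japan1-4"));     // Adobe-Japan1-4
    }
}
```

With this approach, supplements that do exist keep their name, and only out-of-range values such as 65534 are mapped down. A fixed fallback to supplement 1 would instead discard any supplements that were genuinely requested.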
> Could not find referenced cmap stream Adobe-Japan1-XXXX
> -------------------------------------------------------
>
> Key: PDFBOX-4934
> URL: https://issues.apache.org/jira/browse/PDFBOX-4934
> Project: PDFBox
> Issue Type: Bug
> Components: FontBox
> Affects Versions: 2.0.20
> Environment: Windows 10, 64-bit
> Reporter: Shigeru Okada
> Priority: Major
> Attachments: JP.pdf, Korea.pdf
>
>
> An IOException occurs when the attached PDF is fed into PDFBox.
> The attached PDF file (JP.pdf) includes an Adobe-Japan1-65534 CMap.
> The source code is as follows:
> ---
> {code:java}
> import java.awt.image.BufferedImage;
> import java.io.File;
> import javax.imageio.ImageIO;
> import org.apache.commons.io.FileUtils;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.rendering.ImageType;
> import org.apache.pdfbox.rendering.PDFRenderer;
> import org.apache.pdfbox.text.PDFTextStripper;
>
> public class pdfBoxTest {
>     public static void main(String[] args) throws Exception {
>         pdfBoxTest sample = new pdfBoxTest();
>         String pdfname = "D:/tmp/jp.pdf";
>         File pdf = FileUtils.getFile(pdfname);
>         sample.extractTextFromPDF(pdf);
>         sample.load(pdf);
>     }
>
>     // Not included in the original report; an assumed minimal version
>     // that extracts the text with PDFTextStripper.
>     public void extractTextFromPDF(File pdf) throws Exception {
>         try (PDDocument document = PDDocument.load(pdf)) {
>             System.out.println(new PDFTextStripper().getText(document));
>         }
>     }
>
>     // Renders the first page (index 0) at 300 DPI and saves it as a JPEG.
>     public void load(File pdf) throws Exception {
>         try (PDDocument document = PDDocument.load(pdf)) {
>             PDFRenderer renderer = new PDFRenderer(document);
>             BufferedImage bufImage = renderer.renderImageWithDPI(0, 300, ImageType.RGB);
>             ImageIO.write(bufImage, "jpg", new File("D:/tmp/jp.jpg"));
>         }
>     }
> }
> {code}
> The getExternalCMap method in CMapParser tries to find the external CMap, but
> it cannot find Adobe-Japan1-65534 and throws an exception.
> I know that no such CMap exists, but this PDF file can be opened without problems,
> so I think it would be better not to throw an exception and to use another CMap instead.
> As a temporary workaround, I modified the source code as follows, and it works well:
> {code:java}
> protected InputStream getExternalCMap(String name) throws IOException {
>     InputStream is = this.getClass().getResourceAsStream(name);
>     if (is == null) {
>         // Workaround: fall back to supplement 1 of the same ordering
>         // when the requested supplement is not bundled.
>         if (name.startsWith("Adobe-Japan1")) {
>             name = "Adobe-Japan1-1";
>         } else if (name.startsWith("Adobe-Korea1")) {
>             name = "Adobe-Korea1-1";
>         }
>         is = this.getClass().getResourceAsStream(name);
>         if (is == null) {
>             // Note: at this point name may already be the fallback name.
>             throw new IOException("Error: Could not find referenced cmap stream " + name);
>         }
>     }
>     return is;
> }
> {code}
> However, this is only a workaround, not a proper fix.
> If possible, I would like to ask you to modify the source code so that it does not throw an exception
> when it cannot find the CMap.
> I also found a Korean PDF file that includes an Adobe-Korea1-3 CMap.
> Please refer to the attached file (Korea.pdf).
> Thanks!
> //Okada
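The workaround above hard-codes supplement 1 and still throws if that resource is missing. A more general lookup could try the requested name first, then fall back to lower supplements of the same ordering. It would return an empty result instead of throwing, leaving the caller to decide what to do. This is a sketch only. `CMapFallbackLookup`, `candidates`, `resolve` and the caller-supplied `newestKnown` value are hypothetical and not PDFBox API. Resource availability is abstracted as a predicate, so the example does not depend on which CMap files FontBox actually bundles.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not PDFBox code.
public final class CMapFallbackLookup {

    // "<Registry>-<Ordering>-<Supplement>", e.g. "Adobe-Korea1-3".
    // At most 9 digits, so Integer.parseInt cannot overflow.
    private static final Pattern NAME = Pattern.compile("^(\\w+-\\w+)-(\\d{1,9})$");

    private CMapFallbackLookup() {
    }

    /** Candidate names in the order they would be tried. */
    public static List<String> candidates(String name, int newestKnown) {
        List<String> result = new ArrayList<>();
        result.add(name); // always try the exact name first
        Matcher m = NAME.matcher(name);
        if (m.matches()) {
            int requested = Integer.parseInt(m.group(2));
            // Continue below the requested supplement, capped at the newest one
            // the caller assumes to exist, down to supplement 0.
            for (int s = Math.min(requested - 1, newestKnown); s >= 0; s--) {
                result.add(m.group(1) + "-" + s);
            }
        }
        return result;
    }

    /** First candidate accepted by isAvailable, or Optional.empty() instead of an exception. */
    public static Optional<String> resolve(String name, int newestKnown,
                                           Predicate<String> isAvailable) {
        for (String candidate : candidates(name, newestKnown)) {
            if (isAvailable.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

Inside something like the getExternalCMap method above, the predicate could be `n -> getClass().getResource(n) != null`, with `newestKnown` set to 7 for Adobe-Japan1 as stated in the comment. An empty result could then still raise the IOException, or be logged and skipped. That policy choice is left open here.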
--
This message was sent by Atlassian Jira
(v8.3.4#803005)