Posted to dev@pdfbox.apache.org by "Michael Klink (Jira)" <ji...@apache.org> on 2020/08/12 10:06:00 UTC

[jira] [Commented] (PDFBOX-4934) Could not find referenced cmap stream Adobe-Japan1-XXXX

    [ https://issues.apache.org/jira/browse/PDFBOX-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176223#comment-17176223 ] 

Michael Klink commented on PDFBOX-4934:
---------------------------------------

As far as I know, the newest (highest) Adobe-Japan1 supplement number is currently 7, so defaulting to Adobe-Japan1-1 seems a bit restrictive.
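
A minimal sketch of a less restrictive fallback, assuming the goal is to pick the highest supplement that is actually available rather than a hard-coded one. The class `CMapFallback`, the method `resolve`, and the `MAX_SUPPLEMENT` bound are hypothetical names for illustration and are not part of PDFBox or FontBox; the `exists` predicate stands in for whatever resource lookup the parser uses.

```java
import java.util.Optional;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of PDFBox): chooses a substitute CMap name. */
final class CMapFallback {
    // Matches names such as "Adobe-Japan1-65534": a registry-ordering prefix
    // followed by a numeric supplement.
    private static final Pattern NAME =
            Pattern.compile("(Adobe-[A-Za-z]+\\d*)-(\\d+)");

    // Assumption: an arbitrary upper bound for the downward search.
    private static final int MAX_SUPPLEMENT = 9;

    private CMapFallback() {
    }

    /**
     * Returns the requested name if it exists, otherwise the same
     * registry-ordering prefix with the highest existing supplement,
     * otherwise an empty Optional.
     */
    static Optional<String> resolve(String name, Predicate<String> exists) {
        if (exists.test(name)) {
            return Optional.of(name);
        }
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            return Optional.empty();
        }
        String prefix = m.group(1);
        // Walk downwards so the newest available supplement wins.
        for (int supplement = MAX_SUPPLEMENT; supplement >= 0; supplement--) {
            String candidate = prefix + "-" + supplement;
            if (exists.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

In a parser, the predicate could wrap the same `getResourceAsStream(name) != null` check that the patch quoted below uses; which supplements are actually bundled is left to that lookup and is not assumed here.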

> Could not find referenced cmap stream Adobe-Japan1-XXXX
> -------------------------------------------------------
>
>                 Key: PDFBOX-4934
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4934
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.20
>         Environment: Windows10, 64bit
>            Reporter: Shigeru Okada
>            Priority: Major
>         Attachments: JP.pdf, Korea.pdf
>
>
> An IOException occurs when the attached PDF is fed into PDFBox.
> The attached PDF file (JP.pdf) includes an Adobe-Japan1-65534 CMap.
> The source code is below.
> ---
> {code:java}
> import java.awt.image.BufferedImage;
> import java.io.File;
> import javax.imageio.ImageIO;
> import org.apache.commons.io.FileUtils;
> import org.apache.pdfbox.pdmodel.PDDocument;
> import org.apache.pdfbox.pdmodel.PDPage;
> import org.apache.pdfbox.rendering.ImageType;
> import org.apache.pdfbox.rendering.PDFRenderer;
> import org.apache.pdfbox.text.PDFTextStripper;
> import org.apache.pdfbox.text.TextPosition;
> public class pdfBoxTest {
> 	public static void main(String[] args) throws Exception {
> 		pdfBoxTest sample = new pdfBoxTest();
> 		String pdfname = "D:/tmp/jp.pdf";
> 		File pdf = FileUtils.getFile(pdfname);
> 		// extractTextFromPDF is not included in this report
> 		sample.extractTextFromPDF(pdf);
> 		sample.load(pdf);
> 	}
> 	public void load(File pdf) throws Exception {
> 		// try-with-resources closes the document after rendering
> 		try (PDDocument document = PDDocument.load(pdf)) {
> 			PDFRenderer renderer = new PDFRenderer(document);
> 			// render the first page (index 0) at 300 DPI
> 			BufferedImage bufImage = renderer.renderImageWithDPI(0, 300, ImageType.RGB);
> 			ImageIO.write(bufImage, "jpg", new File("D:/tmp/jp.jpg"));
> 		}
> 	}
> }
> {code}
> The getExternalCMap method in CMapParser tries to find the external CMap, but
> it cannot find Adobe-Japan1-65534 and throws an exception.
> I know that no such CMap exists, but this PDF file can otherwise be opened without problems,
> so I think it would be better not to throw an exception and to use another CMap instead.
> I temporarily modified the source code as below, and it works well.
> {code:java}
> protected InputStream getExternalCMap(String name) throws IOException {
>     InputStream is = this.getClass().getResourceAsStream(name);
>     if (is == null) {
>         if (name.startsWith("Adobe-Japan1")) {
>             name = "Adobe-Japan1-1";
>         } else if (name.startsWith("Adobe-Korea1")) {
>             name = "Adobe-Korea1-1";
>         }
>         is = this.getClass().getResourceAsStream(name);
>         if (is == null) {
>             throw new IOException("Error: Could not find referenced cmap stream " + name);
>         }
>     }
>     return is;
> }
> {code}
> But this is not a fundamental fix.
> If possible, I would like to ask you to modify the source code so that it does not throw an exception if
> it cannot find the CMap.
> I also found a Korean PDF file that includes an Adobe-Korea1-3 CMap.
> Please refer to the attached file.
> Thanks!
> //Okada



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
