Posted to dev@pdfbox.apache.org by "Filip Bellander (JIRA)" <ji...@apache.org> on 2016/03/29 10:58:25 UTC

[jira] [Commented] (PDFBOX-922) True type PDFont subclass only supports WinAnsiEncoding (hardcoded!)

    [ https://issues.apache.org/jira/browse/PDFBOX-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215705#comment-15215705 ] 

Filip Bellander commented on PDFBOX-922:
----------------------------------------

I tried to update to 2.0.0 today. Now my tests suddenly no longer work and fail with the following error: {noformat}U+00A0 ('nbspace') is not available in this font's encoding: WinAnsiEncoding{noformat}

Worth noting here is that I don't have any of the Type1 fonts installed on my machine (I'm on a Linux box and just haven't installed them). This results in the following information being printed before the tests are run (i.e., when I start using PDFBox):

{noformat}
10:41:30.174 [main] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Times-Roman
10:41:30.178 [main] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Times-Bold
10:41:30.178 [main] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Times-Italic
10:41:30.179 [main] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Times-BoldItalic
10:41:30.179 [main] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Helvetica
10:41:30.180 [main] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Helvetica-Bold
10:41:30.180 [main] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Helvetica-Oblique
10:41:30.181 [main] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Helvetica-BoldOblique
10:41:30.181 [main] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Courier
10:41:30.182 [main] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Courier-Bold
10:41:30.183 [main] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Courier-Oblique
10:41:30.184 [main] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Courier-BoldOblique
10:41:30.199 [main] DEBUG o.a.p.p.font.FileSystemFontProvider - Loaded StandardSymL from /usr/share/fonts/Type1/s050000l.pfb
10:41:30.215 [main] DEBUG o.a.p.p.font.FileSystemFontProvider - Loaded Dingbats from /usr/share/fonts/Type1/d050000l.pfb
10:41:30.422 [main] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Helvetica
Tests run: 10, Failures: 0, Errors: 6, Skipped: 0, Time elapsed: 0.968 sec <<< FAILURE!
{noformat}

This problem was not present in 1.8.11, so I'm wondering what's really going on here.
From what I can tell, it is triggered when you do something like
{code:java}
pdFont.getStringWidth(StringEscapeUtils.unescapeHtml4("&nbsp;"));
{code}
That is at least what it fails on for me.
A dirty workaround would be to replace all non-breaking spaces with regular spaces, but that defeats the purpose of having non-breaking ones.
Any suggestions on how this might be solved?
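
A somewhat less destructive variant of that workaround, as a minimal sketch only: substitute U+0020 for U+00A0 in a copy of the string that is used purely for width measurement, and keep the original string for deciding where lines may break. The class {{NbspWidth}} and its methods are hypothetical names made up for illustration and are not part of PDFBox. The sketch also assumes that a regular space and a non-breaking space are equally wide in the font being used, which may not hold for every font.

```java
/**
 * Hypothetical helper (not part of PDFBox): keeps U+00A0 in the original
 * text but hands a substituted copy to the width calculation.
 */
final class NbspWidth {
    private static final char NBSP = '\u00A0';

    private NbspWidth() {
    }

    /** Returns a copy of the text with every U+00A0 replaced by U+0020. */
    static String forMeasuring(String text) {
        return text.replace(NBSP, ' ');
    }

    /** A line may break only at a regular space, never at U+00A0. */
    static boolean isBreakOpportunity(String text, int index) {
        return text.charAt(index) == ' ';
    }
}
```

The failing call would then become {{pdFont.getStringWidth(NbspWidth.forMeasuring(text))}}, while the line-breaking logic keeps consulting the original {{text}}. This only sidesteps the width measurement; writing a string that still contains U+00A0 to the page would presumably run into the same encoding check.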

> True type PDFont subclass only supports WinAnsiEncoding (hardcoded!)
> --------------------------------------------------------------------
>
>                 Key: PDFBOX-922
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-922
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: Writing
>    Affects Versions: 1.3.1
>         Environment: JDK 1.6 / OS irrelevant, tried against 1.3.1 and 1.2.0
>            Reporter: Thanos Agelatos
>            Priority: Blocker
>             Fix For: 2.0.0
>
>         Attachments: pdfbox-unicode.diff, pdfbox-unicode2.diff
>
>
> PDFBox cannot embed Identity-H or Identity-V type TTF fonts in the PDF it creates, making it impossible to create PDFs in any language apart from English and ones supported in WinAnsiEncoding. This behaviour arises because the method PDTrueTypeFont.loadTTF has WinAnsiEncoding hardcoded inside, and there are no Identity-H or Identity-V Encoding classes provided (to set afterwards via PDFont.setFont()).
> This excludes the following languages plus many others:
> - Greek
> - Bulgarian
> - Swedish
> - Baltic languages
> - Maltese
> The PDF created contains garbled characters and/or squares.
> Simple test case:
> {code}
>                 PDDocument doc = null;
> 		try {
> 			doc = new PDDocument();
> 			PDPage page = new PDPage();
> 			doc.addPage(page);
> 			// extract fonts for fields
> 			byte[] arialNorm = extractFont("arial.ttf");
> 			//byte[] arialBold = extractFont("arialbd.ttf"); 
> 			//PDFont font = PDType1Font.HELVETICA;
> 			PDFont font = PDTrueTypeFont.loadTTF(doc, new ByteArrayInputStream(arialNorm));
> 			
> 			PDPageContentStream contentStream = new PDPageContentStream(doc, page);
> 			contentStream.beginText();
> 			contentStream.setFont(font, 12);
> 			contentStream.moveTextPositionByAmount(100, 700);
> 			contentStream.drawString("Hello world from PDFBox ελληνικά"); // text here may appear garbled; insert any text in Greek, Bulgarian or Maltese
> 			contentStream.endText();
> 			contentStream.close();
> 			doc.save("pdfbox.pdf");
> 			System.out.println(" created!");
> 		} catch (Exception ioe) {
> 			ioe.printStackTrace();
> 		} finally {
> 			if (doc != null) {
> 				try { doc.close(); } catch (Exception e) {}
> 			}
> 		}
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
