Posted to dev@pdfbox.apache.org by "Volker Kunert (Jira)" <ji...@apache.org> on 2020/09/04 15:56:00 UTC

[jira] [Comment Edited] (PDFBOX-4951) Sequences with combining letters are rendered incorrectly

    [ https://issues.apache.org/jira/browse/PDFBOX-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190776#comment-17190776 ] 

Volker Kunert edited comment on PDFBOX-4951 at 9/4/20, 3:55 PM:
----------------------------------------------------------------

Apparently Notepad++ does not render sequences of Unicode characters correctly.
Each sequence consists of one base character followed by one or more
combining characters, which the rendering software has to combine.
As I understand it, combining with the preceding base character is
the sole purpose of combining characters.
If composition is not intended, no combining character should be used.
E.g. instead of the combining character U+030B COMBINING DOUBLE ACUTE ACCENT
you would use the non-combining character U+02DD DOUBLE ACUTE ACCENT.
You could also apply the combining character to U+00A0 NO-BREAK SPACE.

See also:
https://www.unicode.org/versions/Unicode13.0.0/ch02.pdf
	2.11 Combining Characters
http://unicode.org/faq/char_combmark.html
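
The distinction between a combining character and its spacing counterpart can be checked from Java. The following is only a minimal sketch: the class name CombiningMarks and the helpers isCombining and nfc are hypothetical names chosen for illustration, and only java.lang.Character and java.text.Normalizer from the standard library are used. It also shows that NFC normalization does not replace every base-plus-accent sequence with a single code point, so a renderer still has to position the accent itself.

```java
import java.text.Normalizer;

public class CombiningMarks {

    // Hypothetical helper: true if the code point has one of the Unicode
    // general categories for combining marks (Mn, Mc or Me).
    static boolean isCombining(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }

    // Hypothetical helper: canonical composition (NFC) of a string.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // U+030B is a combining mark, U+02DD is its spacing counterpart.
        System.out.println("U+030B combining: " + isCombining(0x030B));
        System.out.println("U+02DD combining: " + isCombining(0x02DD));

        // O + U+030B has a precomposed form (U+0150), A + U+030B has none,
        // so the second sequence stays two code points after NFC.
        System.out.println("NFC length of O + U+030B: " + nfc("O\u030B").length());
        System.out.println("NFC length of A + U+030B: " + nfc("A\u030B").length());
    }
}
```

Sequences that stay decomposed after NFC, like the second one, are the cases where accent placement depends entirely on the rendering software.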




was (Author: vk.lists@gmail.com):
Apparently Notepad++ does not render sequences of Unicode characters correctly.
Each sequence consists of one base character followed by one or more
combining characters, which the rendering software has to combine.
As I understand it, combining with the preceding base character is
the sole purpose of combining characters.
If composition is not intended, no combining character should be used.
E.g. instead of the combining character U+030B COMBINING DOUBLE ACUTE ACCENT
you would use the non-combining character U+02DD DOUBLE ACUTE ACCENT.

See also:
https://www.unicode.org/versions/Unicode13.0.0/ch02.pdf
	2.11 Combining Characters
http://unicode.org/faq/char_combmark.html



> Sequences with combining letters are rendered incorrectly
> ---------------------------------------------------------
>
>                 Key: PDFBOX-4951
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4951
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Rendering
>    Affects Versions: 2.0.21
>            Reporter: Volker Kunert
>            Priority: Major
>         Attachments: DIN_SPEC_91379_Sequences-aa.pdf, DIN_SPEC_91379_Sequences-ab.pdf, DIN_SPEC_91379_Sequences-ac.pdf, DIN_SPEC_91379_Sequences.txt, TestPdfbox.java, pdfbox.pdf, screenshot-1.png
>
>
> Accented letters composed of a Unicode base letter and a combining accent are rendered incorrectly. E.g. with 0041 030B LATIN CAPITAL LETTER A WITH COMBINING DOUBLE ACUTE ACCENT the accent appears to the right of the letter A, not above it.
> The position is wrong for most of the sequences defined in the following spec:
> DIN SPEC 91379: Characters in Unicode for the electronic processing of names and data 
>  exchange in Europe; with digital attachment
>  [https://www.xoev.de/downloads-2316#StringLatin]
>  [https://www.din.de/de/wdc-beuth:din21:301228458]
>  
> The correct rendering should look like the output of hb-view 2.6.8, see files DIN_SPEC_91379_Sequences*.pdf.
> The output of PDFBox is attached as pdfbox.pdf, which is created by running TestPdfbox.java. The sequences are read from the file DIN_SPEC_91379_Sequences.txt.
>  
> Font used for testing: NotoSansMono-Regular.ttf, see [https://www.google.com/get/noto/] 
> download: [https://noto-website-2.storage.googleapis.com/pkgs/NotoSansMono-hinted.zip]
>  See also FOP-2969
>  


