
Posted to fop-users@xmlgraphics.apache.org by ruud grosmann <r....@gmail.com> on 2010/06/01 20:16:57 UTC

problem with unicode stacks using Tibetan

hi list,

I have installed the Debian FOP package, version 1:0.95.dfsg,
and I use the TibetanMachineUni.ttf font from the Debian package ttf-tmuni 1.901b-1.

I have a problem with this font when I use FOP to create a PDF. In
Tibetan, letters can be stacked, and Unicode supports this: when I put
&#x0F62;&#3984; (RA followed by SUBJOINED KA) in an HTML page, I get the
result I expect. But when I use FOP, it displays the wrong form; FOP seems
to select the wrong character.
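A quick way to confirm which characters that markup denotes is to decode the two numeric character references (one hexadecimal, one decimal) with Python's standard library. This sketch only inspects the input text; it says nothing about how FOP or the font renders it.

```python
import html
import unicodedata

def describe(markup: str) -> list[str]:
    """Decode numeric character references and name each resulting code point."""
    text = html.unescape(markup)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for line in describe("&#x0F62;&#3984;"):
    print(line)
# U+0F62 TIBETAN LETTER RA
# U+0F90 TIBETAN SUBJOINED LETTER KA
```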

I have enclosed the FO source to test this, the resulting PDF and two
screenshots from a font editor.

Does anybody know why this happens?

Attachments: rka.png contains the letter I expect (browsers show this one);
rka3.png contains the letter that is displayed instead. tib.fo is the
source of the test, tib.pdf the output.

thanks in advance, Ruud

Re: problem with unicode stacks using Tibetan

Posted by ruud grosmann <r....@gmail.com>.

OK, Pascal,

I see. Thanks for the explanation. In the meantime, I'll handle the
shaping myself in the stylesheet: I will make a table of the shaped
characters and use them instead of character stacking. I don't know
whether this will be a font-specific solution. If not, I can post the
table in this thread when I finish it so others can reuse it.
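Ruud's plan of replacing stacks with pre-shaped characters amounts to a longest-match substitution over the text. The sketch below is written in Python rather than XSLT, and its single table entry is a hypothetical placeholder: U+E000 is a Private Use Area code point chosen for illustration, not a known glyph of TibetanMachineUni. Whether a real table can be font-independent is exactly the open question Ruud raises.

```python
# Hypothetical substitution table: the replacement value is a placeholder,
# not a real code point of TibetanMachineUni. A real table would depend on
# which precomposed glyphs the chosen font exposes.
SHAPED = {
    "\u0F62\u0F90": "\uE000",  # RA + SUBJOINED KA -> hypothetical shaped glyph
}

def apply_table(text: str, table: dict[str, str]) -> str:
    """Replace sequences from the table, trying the longest keys first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

# U+0F0B is the Tibetan intersyllabic tsheg, which passes through untouched.
print(apply_table("\u0F62\u0F90\u0F0B", SHAPED) == "\uE000\u0F0B")  # True
```

Trying longer keys first matters once the table holds both a two-letter stack and a three-letter stack that starts with it.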

regards, Ruud

On 02/06/2010, Pascal Sancho <pa...@takoma.fr> wrote:
> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org

Re: problem with unicode stacks using Tibetan

Posted by Pascal Sancho <pa...@takoma.fr>.

According to [1]:

uni0F620F90 is the combination of uni0F62 and uni0F90, i.e.
RA plus SUBJOINED KA,

and the other, uni0F6A0F90, is the combination of uni0F6A and uni0F90, i.e.
FIXED-FORM RA plus SUBJOINED KA.

The latter shows no character shaping and should only be used for
transliteration or transcription.
Since FOP currently doesn't implement character shaping, the only rendering
you get is the unshaped one.
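The two glyph names from the screenshots follow the Adobe Glyph List "uni" convention ("uni" followed by groups of four hexadecimal digits), so they can be decoded mechanically. This is a small sketch assuming that convention; it confirms that the only difference between the two glyphs is RA versus FIXED-FORM RA.

```python
import unicodedata

def parse_uni_name(glyph_name: str) -> list[str]:
    """Split an AGL-style 'uniXXXX...' glyph name into Unicode character names.

    Assumes 'uni' followed by one or more groups of four hexadecimal digits.
    """
    if not glyph_name.startswith("uni"):
        raise ValueError(f"not a uni-name: {glyph_name}")
    digits = glyph_name[3:]
    if len(digits) == 0 or len(digits) % 4 != 0:
        raise ValueError(f"malformed uni-name: {glyph_name}")
    code_points = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
    return [unicodedata.name(chr(cp)) for cp in code_points]

for name in ("uni0F620F90", "uni0F6A0F90"):
    print(name, "=", " + ".join(parse_uni_name(name)))
```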

I don't know enough to say whether the source text should use only the base
Tibetan letters (range 0F00-0F6F) and delegate character selection to the
user agent (i.e. FOP), depending on each letter's position in the word. But I
think such a mechanism should be implemented.
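Part of the mechanism Pascal describes can be sketched directly: in Unicode, a stack is encoded as its top consonant followed by the subjoined forms of the consonants below it, and each subjoined letter is named after its base letter (TIBETAN LETTER KA vs TIBETAN SUBJOINED LETTER KA). This sketch assumes the stack boundaries are already known and passed in explicitly; deciding them from plain base-letter text would require Tibetan syllable rules, which are not attempted here. The helper names are hypothetical.

```python
import unicodedata

BASE_PREFIX = "TIBETAN LETTER "
SUBJOINED_PREFIX = "TIBETAN SUBJOINED LETTER "

def subjoin(ch: str) -> str:
    """Return the subjoined form of a Tibetan base consonant (KA -> SUBJOINED KA)."""
    name = unicodedata.name(ch)
    if not name.startswith(BASE_PREFIX):
        raise ValueError(f"not a Tibetan base letter: {name}")
    # unicodedata.lookup raises KeyError if no subjoined counterpart exists.
    return unicodedata.lookup(SUBJOINED_PREFIX + name[len(BASE_PREFIX):])

def encode_stack(letters: str) -> str:
    """Encode one stack given as base letters: keep the first letter and
    replace every following letter with its subjoined form."""
    if not letters:
        return ""
    return letters[0] + "".join(subjoin(ch) for ch in letters[1:])

stack = encode_stack("\u0F62\u0F40")  # RA over KA, written with base letters only
print(" ".join(f"U+{ord(c):04X}" for c in stack))  # U+0F62 U+0F90
```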

[1] http://www.unicode.org/charts/PDF/U0F00.pdf

Pascal

On 02/06/2010 12:46, ruud grosmann wrote:
> [...]



Re: problem with unicode stacks using Tibetan

Posted by ruud grosmann <r....@gmail.com>.

Hi Pascal,

On the other hand, when a RA is followed by a low (subjoined) KA, FOP
- recognizes that they belong to one stack (a single resulting character), and
- produces a more or less sensible result, namely a RA on top of a KA.

Does FOP just place them on top of each other, using two glyphs from
the font, or does it select a combined glyph from the font? In the
latter case, it may simply pick the wrong RA (the topmost one). The
screenshots show that one glyph is named uni0F620F90 and the other
uni0F6A0F90. If FOP uses these names to select the glyph, then the fix
might be simple.
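To make Ruud's question concrete, here is a hypothetical name-based glyph selection: prefer a single ligature glyph named "uni" plus all code points of the cluster, otherwise fall back to one glyph per code point. This is an illustration only, not how FOP 0.95 works (per Pascal, it does no shaping), and the glyph inventory is made up. Real shaping engines normally rely on the font's OpenType GSUB tables rather than glyph names, which a font is not required to include.

```python
def glyph_names_for(cluster: str, font_glyphs: set[str]) -> list[str]:
    """Pick glyph names for one cluster: a single 'uni...' ligature glyph if
    the font has one, else one 'uniXXXX' glyph per code point."""
    ligature = "uni" + "".join(f"{ord(ch):04X}" for ch in cluster)
    if ligature in font_glyphs:
        return [ligature]
    return [f"uni{ord(ch):04X}" for ch in cluster]

# Hypothetical glyph inventory, standing in for names read from a real font.
FONT = {"uni0F62", "uni0F6A", "uni0F90", "uni0F620F90", "uni0F6A0F90"}
print(glyph_names_for("\u0F62\u0F90", FONT))  # ['uni0F620F90']
print(glyph_names_for("\u0F40\u0F90", FONT))  # ['uni0F40', 'uni0F90']
```

Note that a lookup like this would pick uni0F620F90 for RA + SUBJOINED KA, so if FOP showed the FIXED-FORM RA glyph, name lookup alone would not explain it.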

regards, Ruud


Re: problem with unicode stacks using Tibetan

Posted by Pascal Sancho <pa...@takoma.fr>.

Rereading your initial message: character stacking seems to be correct
in your PDF; the problem is definitely the lack of character shaping.

--
Pascal

On 02/06/2010 09:59, Pascal Sancho wrote:
> [...]



Re: problem with unicode stacks using Tibetan

Posted by Pascal Sancho <pa...@takoma.fr>.

Hi,
After some googling, reading [1]:
&#x0f40; is the consonant KA.
If I understand correctly, consonants can be stacked; in that case, KA
becomes the subjoined consonant &#x0f90; (also written &#3984;).

Reading [2], there may also be a character shaping mechanism, probably
similar to the one used for Arabic.

Unfortunately, FOP supports neither character stacking nor character
shaping.

Note that some contributors are working on implementing Arabic support
(which needs character shaping and right-to-left mode) (see [3]).

That said, any help extending FOP's support for non-Latin scripts is welcome.

[1] http://www.unicode.org/charts/PDF/U0F00.pdf
[2] http://www.thlib.org/tools/#wiki=/access/wiki/site/26a34146-33a6-48ce-001e-f16ce7908a6a/encoding%20model%20of%20the%20tibetan%20script%20in%20the%20ucs.html
[3] https://issues.apache.org/bugzilla/show_bug.cgi?id=32789

--
Pascal

On 01/06/2010 20:16, ruud grosmann wrote:
> [...]

