Posted to fop-users@xmlgraphics.apache.org by Ryan Lortie <de...@desrt.ca> on 2008/08/30 12:52:33 UTC

# characters for Base-14 fonts

Hello.

I have an FO document that contains some symbols like ∀, nothing that
isn't covered by one of the Base-14 fonts (such as Symbol).

When I send this document through FOP to generate PDF output, I get
'#' characters in place of the symbols, accompanied by the following output:

30-Aug-2008 6:45:31 AM org.apache.fop.fonts.FontInfo notifyFontReplacement
WARNING: Font 'Symbol,normal,700' not found. Substituting with 'Symbol,normal,400'.
30-Aug-2008 6:45:31 AM org.apache.fop.fonts.FontInfo notifyFontReplacement
WARNING: Font 'ZapfDingbats,normal,700' not found. Substituting with 'ZapfDingbats,normal,400'.
30-Aug-2008 6:45:31 AM org.apache.fop.hyphenation.Hyphenator getHyphenationTree
SEVERE: Couldn't find hyphenation pattern en


The expected behaviour is that the symbols (like ∀) would be put in the
PDF and marked as coming from 'Symbol'.

I'm on Ubuntu and I don't have fonts called "Symbol" or "ZapfDingbats"
installed.

I've searched for information about this problem and I've found a page
about FOP fonts and some mailing list archives.  The two things that
I've discovered are:

 - You need font metrics so FOP knows how to space stuff out[1].

 - For the Base-14 fonts you shouldn't need font metrics because they're
   already built in[2].

This isn't consistent with the results that I'm encountering...

Can anyone explain to me what's going on here?

Cheers



[1] http://xmlgraphics.apache.org/fop/trunk/fonts.html
[2] http://www.mail-archive.com/fop-users@xmlgraphics.apache.org/msg09046.html


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org


Re: # characters for Base-14 fonts

Posted by Vincent Hennebert <vh...@gmail.com>.
Hi Ryan,

Ryan Lortie wrote:
> On Sun, 2008-08-31 at 22:22 +0200, Andreas Delmelle wrote:
>> Not sure what the cause of the issue is (yet), but as for a final  
>> try, the following seems to produce desirable results:
>>
>> <fo:block font-family="serif, Symbol">
>>    (<fo:character character="&#x2191;"/>1) (+2) (<fo:character  
>> character="&#x2200;"/>3) (text)
>> </fo:block>
> 
> This is a fascinating discovery.  It's very strange that that works
> properly.
> 
> In any case, I can probably write an XSLT template to scan all text
> regions for 'special' characters and emit markup like that for them.
> That's a sufficient workaround for my tastes.

If using a non-Base14 font is an option for you, then you may be able to
avoid that hassle. For instance, your example rendered fine with the
DejaVu Sans font:
    <fo:block font-family="DejaVuSans">(↑1) (+2) (↑3) (+4)
      (↑5)</fo:block>
If all of the glyphs come from the same font then the baseline handling
should be more consistent.
Have a look here to know how to configure fonts:
http://xmlgraphics.apache.org/fop/0.95/fonts.html
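
A minimal configuration sketch along those lines, assuming FOP reads the
metrics straight from the TrueType file (so no metrics-url is given) and
assuming the usual Ubuntu location of DejaVuSans.ttf. Both the path and
the single normal-weight triplet are illustrative, so check the page above
for what your version accepts. The file is passed to FOP with -c:

    <fop version="1.0">
      <renderers>
        <renderer mime="application/pdf">
          <fonts>
            <!-- Hypothetical path: adjust to where DejaVuSans.ttf lives -->
            <font embed-url="file:///usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf">
              <!-- Registers the family name used in the fo:block above -->
              <font-triplet name="DejaVuSans" style="normal" weight="normal"/>
            </font>
            <!-- Bold text would need its own <font> entry for the bold file -->
          </fonts>
        </renderer>
      </renderers>
    </fop>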

HTH,
Vincent



Re: # characters for Base-14 fonts

Posted by Ryan Lortie <de...@desrt.ca>.
Very sorry.  I meant to reply to this much sooner.

On Tue, 2008-09-02 at 21:20 +0200, Andreas Delmelle wrote:
> If you want, you can open a Bugzilla(*) entry for this, so that the  
> issue is tracked.

I have done so.  The bug is here:
https://issues.apache.org/bugzilla/show_bug.cgi?id=45733 for anyone who
is interested.

Thanks for your help verifying this problem and tracking down its cause.

Cheers




Re: # characters for Base-14 fonts

Posted by Andreas Delmelle <an...@telenet.be>.
On Sep 1, 2008, at 01:06, Ryan Lortie wrote:

> <snip />

> When you don't use <character> then FOP makes its decision about the
> height of the line based solely on the first listed font family
> (ignoring all of the others, irrespective of whether they are used for
> font substitution in that line).

Having taken a quick, closer look at the related code, it goes in  
this direction indeed.

Technically, the story is that, without fo:character or fo:inline, a
combined text-area is generated for each separate 'word' (in the
sense of an uninterrupted sequence of non-white-space characters,
regardless of whether they can be rendered in the same font).

Those areas are currently all based on a single alignment-context  
(which seems to correspond to the first font-family in the list; this  
explains why we get a different result when putting the Symbol font  
first). AFAICT, it does not seem like a real tough problem to  
solve... I do seem to remember Max pointing out this issue at some  
time while implementing font-selection (?)
If we place the characters in an fo:inline or an fo:character, the  
only big difference is that a new alignment-context is created  
automatically, which later on triggers correct baseline alignment of  
the two pieces.
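
Going by that explanation, fo:inline should behave like fo:character
here. An untested sketch of the earlier test line using fo:inline
(setting font-family="Symbol" on each inline is an assumption, not
something verified in this thread):

    <fo:block font-family="serif, Symbol">
      <!-- Each fo:inline should get its own alignment-context -->
      (<fo:inline font-family="Symbol">&#x2191;</fo:inline>1) (+2)
      (<fo:inline font-family="Symbol">&#x2200;</fo:inline>3) (text)
    </fo:block>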

If you want, you can open a Bugzilla(*) entry for this, so that the  
issue is tracked.


Thanks

Andreas

(*) https://issues.apache.org/bugzilla



Re: # characters for Base-14 fonts

Posted by Ryan Lortie <de...@desrt.ca>.
On Sun, 2008-08-31 at 17:36 -0400, Ryan Lortie wrote:
> On Sun, 2008-08-31 at 22:22 +0200, Andreas Delmelle wrote:
> > Not sure what the cause of the issue is (yet), but as for a final  
> > try, the following seems to produce desirable results:
> > 
> > <fo:block font-family="serif, Symbol">
> >    (<fo:character character="&#x2191;"/>1) (+2) (<fo:character  
> > character="&#x2200;"/>3) (text)
> > </fo:block>

One additional thing that I've noticed since I wrote my reply:

The <character> tag actually has no effect on the character that it
prints; instead, it affects the layout of the rest of the line, causing
the 'serif' parts of the line to be aligned properly with the character
in the tag.

If you put a letter inside <character> then the 'serif' parts of the
line end up lined up like normal.  If you put a symbol then they end up
lined up properly with that symbol.

Look at this:

      <block font-family='serif,Symbol' text-align='center'>
        (∀1) (+2) (<character character='↑'/>3) (+4) (∀5) (ouch)
      </block>

In this case everything looks great, and it appears that it is the (+2)
and (+4) that have moved down, not the ∀ or the ↑ that have moved up.

I also did some tests with several lines of text.  Only the physical
line containing the <character> tag renders properly -- the others have
the same issue.

When looking at the several lines of text, however, something
immediately jumps out at you -- any line containing a <character>
rendered with Symbol is taller than all the other lines.  I guess this
is what "makes room" for the 'Symbol' characters to be rendered properly
inline with the 'serif' characters.

So my new wild guess about the nature of this problem is something like:

'Symbol' is a taller font than 'serif'.

FOP decides how tall each line should be and then places characters
within that line.  In the event that a character is "too tall" to fit
then it ends up being aligned to the top (and hanging out the bottom).
This is what happens when you put a UTF-8 character directly inline.

When you use <character> you somehow get FOP to notice that you're
placing taller characters in the line and it takes this into account
when deciding the proper height of the line.

When you don't use <character> then FOP makes its decision about the
height of the line based solely on the first listed font family
(ignoring all of the others, irrespective of whether they are used for
font substitution in that line).

Not knowing Java and being completely unfamiliar with the FOP codebase I
have no way to verify that this is the actual nature of the problem or
to know how I'd go about fixing it.  Can anyone help? :)

Thanks for tuning into the saga thus far...

Cheers




Re: # characters for Base-14 fonts

Posted by Ryan Lortie <de...@desrt.ca>.
On Sun, 2008-08-31 at 22:22 +0200, Andreas Delmelle wrote:
> Not sure what the cause of the issue is (yet), but as for a final  
> try, the following seems to produce desirable results:
> 
> <fo:block font-family="serif, Symbol">
>    (<fo:character character="&#x2191;"/>1) (+2) (<fo:character  
> character="&#x2200;"/>3) (text)
> </fo:block>

This is a fascinating discovery.  It's very strange that that works
properly.

In any case, I can probably write an XSLT template to scan all text
regions for 'special' characters and emit markup like that for them.
That's a sufficient workaround for my tastes.
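
A rough XSLT 1.0 sketch of such a template. The template name
wrap-symbols, the symbol list (only ∀ and ↑ here) and the text() match
are hypothetical; in a DocBook customization layer the match may need to
be narrowed so it does not clash with other templates. It walks each
matching string one character at a time (so very long text nodes mean
deep recursion) and emits fo:character for every listed symbol:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:fo="http://www.w3.org/1999/XSL/Format">

      <!-- Hypothetical list of characters to wrap: FOR ALL, UPWARDS ARROW -->
      <xsl:variable name="special-chars" select="'&#x2200;&#x2191;'"/>

      <!-- Only text nodes containing a listed character are rewritten;
           extend this predicate if the list above grows -->
      <xsl:template match="text()[contains(., '&#x2200;') or contains(., '&#x2191;')]">
        <xsl:call-template name="wrap-symbols">
          <xsl:with-param name="text" select="."/>
        </xsl:call-template>
      </xsl:template>

      <!-- Copy the string, turning each listed character into fo:character -->
      <xsl:template name="wrap-symbols">
        <xsl:param name="text"/>
        <xsl:if test="string-length($text) &gt; 0">
          <xsl:variable name="c" select="substring($text, 1, 1)"/>
          <xsl:choose>
            <xsl:when test="contains($special-chars, $c)">
              <fo:character character="{$c}"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="$c"/>
            </xsl:otherwise>
          </xsl:choose>
          <!-- Recurse on the remainder of the string -->
          <xsl:call-template name="wrap-symbols">
            <xsl:with-param name="text" select="substring($text, 2)"/>
          </xsl:call-template>
        </xsl:if>
      </xsl:template>
    </xsl:stylesheet>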

> Strictly speaking, semantically, this comes down to the same thing as  
> omitting the fo:character object, and inserting the characters  
> directly, so FOP definitely has some issue here.

Is there an issue tracker that I should report this to?


> HTH!

It has helped a lot.  Thanks :)


Cheers




Re: # characters for Base-14 fonts

Posted by Andreas Delmelle <an...@telenet.be>.
On Aug 31, 2008, at 21:37, Ryan Lortie wrote:

> Thanks for your continued patience.
>
> On Sun, 2008-08-31 at 13:04 +0200, Andreas Delmelle wrote:
>> Can I suggest using &#x200B; (zero-width space), like so:
>> <snip />

> This is a good step, yes.  It works OK with arrows because, being an
> oddly-shaped character it's somewhat difficult to judge where they
> 'should be', but doesn't pan out with ∀ (which is displayed
> significantly below where you would expect).

Right, this is indeed very undesirable.
Playing a bit more with it myself, I'm seeing that the only way the  
relative alignment is handled properly, is by using fo:inline or  
fo:character.

Using Symbol as the first font-family also has yet another effect. In  
that case, only the expression containing letters appears higher.

> <snip />
> Does anyone know why FOP gets this wrong where xmlroff gets it right?
> Is it really xmlroff that gets it wrong and my expectation of what is
> 'right' is incorrect as per the PDF standard?

Your expectation is correct, I think. I'm suspecting that, since the  
baseline of the Symbol font is a mathematical one, this somehow  
(improperly?) gets aligned to the baseline of the serif font, which  
is an alphabetic one.
See also: http://www.w3.org/TR/xsl/#area-alignment
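
That section of the spec also defines alignment-baseline, which selects
the baseline an inline area is aligned on. A speculative sketch, not
verified against FOP 0.95 or trunk (whether FOP honours this for the
Symbol font is exactly what is unclear here), would be to request the
alphabetic baseline on an inline holding the symbol:

    <fo:block font-family="serif, Symbol">
      <!-- Speculative: ask for alignment on the alphabetic baseline -->
      (<fo:inline font-family="Symbol" alignment-baseline="alphabetic">&#x2200;</fo:inline>3) (text)
    </fo:block>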

Not sure what the cause of the issue is (yet), but as for a final  
try, the following seems to produce desirable results:

<fo:block font-family="serif, Symbol">
   (<fo:character character="&#x2191;"/>1) (+2) (<fo:character  
character="&#x2200;"/>3) (text)
</fo:block>

Strictly speaking, semantically, this comes down to the same thing as  
omitting the fo:character object, and inserting the characters  
directly, so FOP definitely has some issue here.


HTH!

Cheers

Andreas




Re: # characters for Base-14 fonts

Posted by Ryan Lortie <de...@desrt.ca>.
Hi Andreas

Thanks for your continued patience.

On Sun, 2008-08-31 at 13:04 +0200, Andreas Delmelle wrote:
> Can I suggest using &#x200B; (zero-width space), like so:
> 
> <block font-family='serif,Symbol' text-align='center'>
>    (&#x200B;↑&#x200B;1) (+2) (&#x200B;↑&#x200B;3) (+4)  
> (&#x200B;↑&#x200B;5)
> </block>
> 
> The result produced when using these as separators, already looks  
> much better.

This is a good step, yes.  It works OK with arrows because, being oddly
shaped characters, it's somewhat difficult to judge where they 'should
be', but it doesn't pan out with ∀ (which is displayed significantly
below where you would expect).

I've also tried using other fonts (such as sans-serif, Times, Helvetica,
etc) for the non-symbol characters, but they line up the same as 'serif'
does.

I tried installing xmlroff and its output is what I would expect to see
(see attached, the output of my original test document).  Unfortunately,
xmlroff is nowhere near FOP in terms of completeness so its use is not
really an option for me.

Does anyone know why FOP gets this wrong where xmlroff gets it right?
Is it really xmlroff that gets it wrong and my expectation of what is
'right' is incorrect as per the PDF standard?

Cheers

Re: # characters for Base-14 fonts

Posted by Andreas Delmelle <an...@telenet.be>.
On Aug 31, 2008, at 04:59, Ryan Lortie wrote:

Hi Ryan

> On Sun, 2008-08-31 at 03:04 +0200, Andreas Delmelle wrote:
>> FOP 0.95 (latest binary release) does not yet handle
>> font-selection-strategy, and simply uses the first specified family
>> in the list (which would be 'serif' in your example).
>
> I just checked out trunk and built it.  It was actually pretty
> straightforward.
>
> The font selection is working but it's producing some rather awful
> results.
<snip />
>
> The first line (the 'ouch' one) produces extremely ugly results:
> because the ↑ is touching the ( 1) FOP treats it as one word and
> renders the whole thing using Symbol.  Because (+2) contains no
> unsupported characters it's rendered as serif.  The fonts are
> substantially similar, but their baselines (or something? -- I'm no
> font guy) are completely out of whack with respect to each other.  The
> result is that the terms bounce up and down with respect to each other
> (see attached PDF).
>
> The second line is something of a workaround for this problem -- only
> the arrows are out of place, which doesn't look so bad since they're
> just arrows anyway -- but I can't have the spaces there.

Can I suggest using &#x200B; (zero-width space), like so:

<block font-family='serif,Symbol' text-align='center'>
   (&#x200B;↑&#x200B;1) (+2) (&#x200B;↑&#x200B;3) (+4)  
(&#x200B;↑&#x200B;5)
</block>

The result produced when using these as separators, already looks  
much better.



Cheers

Andreas


Re: # characters for Base-14 fonts

Posted by Ryan Lortie <de...@desrt.ca>.
On Sun, 2008-08-31 at 03:04 +0200, Andreas Delmelle wrote:
> FOP 0.95 (latest binary release) does not yet handle
> font-selection-strategy, and simply uses the first specified family in
> the list (which would be 'serif' in your example).

I just checked out trunk and built it.  It was actually pretty
straightforward.

The font selection is working but it's producing some rather awful
results.

Consider the following document:

<root xmlns='http://www.w3.org/1999/XSL/Format'>
  <layout-master-set>
    <simple-page-master master-name='master'
                        page-width='8cm' page-height='2cm'>
      <region-body/>
    </simple-page-master>
  </layout-master-set>

  <page-sequence master-reference='master'>
    <flow flow-name='xsl-region-body'>
      <block font-family='serif,Symbol' text-align='center'>
        (↑1) (+2) (↑3) (+4) (↑5) (ouch)
      </block>

      <block font-family='serif,Symbol' text-align='center'>
        ( ↑ 1) (+2) ( ↑ 3) (+4) ( ↑ 5)
      </block>
    </flow>
  </page-sequence>
</root>

The first line (the 'ouch' one) produces extremely ugly results: because
the ↑ touches the '(' and the '1', FOP treats '(↑1)' as one word and
renders the whole thing using Symbol.  Because (+2) contains no
unsupported characters, it's rendered as serif.  The fonts are
substantially similar, but their baselines (or something? I'm no font
guy) are completely out of whack with respect to each other.  The result
is that the terms bounce up and down with respect to each other (see
attached PDF).

The second line is something of a workaround for this problem -- only
the arrows are out of place, which doesn't look so bad since they're
just arrows anyway -- but I can't have the spaces there.

The "Character-by-Character is NOT yet supported!" statement on the
fonts page appears to prevent me from doing what I want to do without
spaces, so are there any other workarounds for this?  Is it a bug that
the fonts are so far offset with respect to each other?

(ps: I'd consider using just 'Symbol', except that my equations also
contain variable names and Symbol doesn't have letters.)

Cheers

Re: # characters for Base-14 fonts

Posted by Andreas Delmelle <an...@telenet.be>.
On Aug 31, 2008, at 02:35, Ryan Lortie wrote:

> I was expecting that FOP would do substitution when it encounters a
> glyph for which it has no font.  There's a note about that at
> http://xmlgraphics.apache.org/fop/trunk/fonts.html#substitution that
> implied to me that FOP will try to choose the correct font automatically
> based on the content of the word that it is rendering.

Ah, OK, that's FOP Trunk, where the substitution indeed happens  
automatically.
FOP 0.95 (latest binary release) does not yet handle
font-selection-strategy, and simply uses the first specified family in
the list (which would be 'serif' in your example).

So, apart from specifying the desired font-family as the first/only,  
there's also the option of switching to FOP Trunk (reasonably stable  
for the moment), which does offer this additional functionality.

HTH!

Cheers

Andreas




Re: # characters for Base-14 fonts

Posted by Ryan Lortie <de...@desrt.ca>.
Hi Andreas

Thanks for your rapid response.

I was expecting that FOP would do substitution when it encounters a
glyph for which it has no font.  There's a note about that at
http://xmlgraphics.apache.org/fop/trunk/fonts.html#substitution that
implied to me that FOP will try to choose the correct font automatically
based on the content of the word that it is rendering.

I dug a bit deeper and found the following:

http://www.w3.org/TR/xsl/#font-family

"""
        To deal with the problem that a single font may not contain
        glyphs to display all the characters in a document, or that not
        all fonts are available on all systems, this property allows
        authors to specify a list of fonts, all of the same style and
        size, that are tried in sequence to see if they contain a glyph
        for a certain character.
"""


On Sat, 2008-08-30 at 15:08 +0200, Andreas Delmelle wrote: 
> Can you show us the FO? How exactly are you inserting the character?
> 
> Something like
> 
> <fo:block font-family="Symbol">&#x2200;</fo:block>
> 
> works nicely on my end.

I cooked up the following test case:

<?xml version='1.0' encoding='utf-8'?>

<root xmlns='http://www.w3.org/1999/XSL/Format'>

  <layout-master-set>
    <simple-page-master master-name='master'
                        page-width='5cm' page-height='2cm'>
      <region-body/>
    </simple-page-master>
  </layout-master-set>

  <page-sequence master-reference='master'>
    <flow flow-name='xsl-region-body'>

      <block font-family='serif'>
        some text with ∀ a symbol.
      </block>

      <block font-family='Symbol'>
        some text with ∀ a symbol.
      </block>

      <block font-family='serif,Symbol'>
        some text with ∀ a symbol.
      </block>

    </flow>
  </page-sequence>

</root>

On my FOP (0.95 binary directly from the site) I get the following
output:


                        some text with # a symbol.
                        #### #### #### ∀ # ######.
                        some text with # a symbol.
                        
                        
The first two lines make perfect sense, but I would expect the last line
to contain all the characters properly rendered.

It might be that I'm misreading/misunderstanding FO since I'm quite new
to all of this but the (real) output that (originally) triggered the
problem is actually from the DocBook XSL stylesheets.  There's a note
there, too:

http://docbook.sourceforge.net/release/xsl/current/doc/fo/symbol.font.family.html

"""
        A typical body or title font does not contain all the character
        glyphs that DocBook supports. This parameter specifies
        additional fonts that should be searched for special characters
        not in the normal font. These symbol font names are
        automatically appended to the body or title font family name
        when fonts are specified in a font-family property in the FO
        output.
"""

This results in all of my blocks being marked with
font-family='serif,Symbol,ZapfDingbats' which, just like the small test
case given above, fails to work.
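
A sketch of a DocBook customization layer that changes what ends up in
that font-family list. body.font.family, title.font.family and
symbol.font.family are DocBook XSL parameters; the import href and the
choice of DejaVuSans (which would have to be registered in the FOP
configuration under that name) are assumptions:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Assumed location of the stock DocBook FO stylesheet;
           adjust, or resolve it through an XML catalog -->
      <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"/>

      <!-- Hypothetical: one family that has both letters and symbols,
           so FOP 0.95 (which only uses the first family) finds every glyph -->
      <xsl:param name="body.font.family" select="'DejaVuSans'"/>
      <xsl:param name="title.font.family" select="'DejaVuSans'"/>

      <!-- Still appended after the families above -->
      <xsl:param name="symbol.font.family" select="'Symbol,ZapfDingbats'"/>
    </xsl:stylesheet>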

Any additional insight is definitely appreciated. :)

Thanks again





Re: # characters for Base-14 fonts

Posted by Andreas Delmelle <an...@telenet.be>.
On Aug 30, 2008, at 12:52, Ryan Lortie wrote:

Hi

> I have a FO document that contains some symbols like ∀.  Nothing that's
> not covered by a font in the Base-14 (like Symbol).
>
> When I send this document through FOP to generate PDF output I get the
> '#' characters showing up, accompanied by the following output:
<snip />

Can you show us the FO? How exactly are you inserting the character?

Something like

<fo:block font-family="Symbol">&#x2200;</fo:block>

works nicely on my end.

As for the warning messages:

>
> 30-Aug-2008 6:45:31 AM org.apache.fop.fonts.FontInfo notifyFontReplacement
> WARNING: Font 'Symbol,normal,700' not found. Substituting with 'Symbol,normal,400'.
> 30-Aug-2008 6:45:31 AM org.apache.fop.fonts.FontInfo notifyFontReplacement
> WARNING: Font 'ZapfDingbats,normal,700' not found. Substituting with 'ZapfDingbats,normal,400'.

These are simply indications that you have specified font-family  
'Symbol' or 'ZapfDingbats' in combination with (most likely  
inherited) font-weight 'bold'. You can safely ignore these, or  
otherwise override the font-weight to 'normal' on the related
blocks/inlines if you want to avoid them altogether.
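
For instance, a symbol inside a bold title could be given its own inline
with the weight reset; the surrounding markup here is hypothetical:

    <fo:block font-family="serif, Symbol" font-weight="bold">
      Quantifiers:
      <!-- The Base-14 Symbol font has no bold variant, so request normal weight -->
      <fo:inline font-family="Symbol" font-weight="normal">&#x2200;</fo:inline>
    </fo:block>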

> 30-Aug-2008 6:45:31 AM org.apache.fop.hyphenation.Hyphenator getHyphenationTree
> SEVERE: Couldn't find hyphenation pattern en

This indicates you have no hyphenation pattern file that is reachable  
at runtime.
See http://xmlgraphics.apache.org/fop/0.95/hyphenation.html#install  
for a variety of mechanisms to make them available to FOP.
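
One of those mechanisms is pointing FOP at a directory of pattern files
from its configuration file. The sketch below assumes the
hyphenation-base setting described in the FOP configuration docs and a
hypothetical directory; check the page above for which pattern-file
formats your version accepts:

    <fop version="1.0">
      <!-- Hypothetical directory holding the hyphenation pattern files -->
      <hyphenation-base>file:///usr/local/share/fop-hyph/</hyphenation-base>
    </fop>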

None of the messages are related to the reported problem, though...

Cheers

Andreas