Posted to fop-dev@xmlgraphics.apache.org by Loran Kary <lk...@apple.com> on 2007/06/27 20:49:35 UTC

Character-by-character font selection strategy

I see there was a recent thread on fop-user called "Mixing Languages  
and Unicode".  I have the same problem.  The PDFs that I create with  
FOP could potentially contain any mix of languages and no one font  
will support them all.

I do not believe it is practical to try to implement a character-by- 
character font selection strategy in XSLT, even if I could figure out  
how to do it.  Nor do I believe it is practical to try to create some  
custom font that supports all languages and embed that in all my PDFs.

So my question is, what would it take to implement support for the  
font-selection-strategy property?  Has anyone looked at this or taken  
a crack at it?  Is there any chance that it might be implemented in  
the foreseeable future?

Thanks,
Loran Kary

Re: Character-by-character font selection strategy

Posted by Max Berger <ma...@berger.name>.

Dear Fop-Devs,

I started some work on that in a patch I submitted a while ago
(which needs cleanup, lots of cleanup):

http://issues.apache.org/bugzilla/show_bug.cgi?id=39422

I've also implemented character-by-character font selection for JEuclid,
which may serve as a reference. Please look at:

http://jeuclid.sourceforge.net/jeuclid-core/xref/net/sourceforge/jeuclid/elements/support/text/StringUtil.html#102
http://jeuclid.sourceforge.net/jeuclid-core/xref/net/sourceforge/jeuclid/elements/support/attributes/MathVariant.html#194

I'll add all three links to the wiki.

Of course the algorithms would have to be modified to work with FOP's
font system instead of AWT's.

I'd be very willing to test / enhance a patch, because I really need
this feature (hence my original patch).

One quick wish while you're at it: AFAIK FOP still does not even print a
warning when it replaces a character with the # sign. Please fix this!
(A fix for this is part of my patch.)


Best regards,

Max Berger
e-mail: max@berger.name

-- 
OpenPG ID: E81592BC   Print: F489F8759D4132923EC4 BC7E072AB73AE81592BC
For information about me and my work please see http://max.berger.name

Re: Character-by-character font selection strategy

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Jun 28, 2007, at 18:32, Andreas L Delmelle wrote:

> ... calls an implementation of FontInfo.fontLookup() that just  
> returns the first family in the list...

Of course this should be supplemented: "... that is supported in the  
current configuration."

Re: Character-by-character font selection strategy

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Jun 27, 2007, at 23:31, Jeremias Maerki wrote:

> I've noted some things here:
> http://wiki.apache.org/xmlgraphics-fop/FontSelectionStrategy
>
> It's best to use that page to gather thoughts and come up with a  
> plan to
> implement that feature.

Sorry, completely missed that Wiki update.

There is one thing there I would rather not comment on in the Wiki
itself (I don't consider it a forum for Q&A sessions):

What improvements to the font-family property did you have in mind?
AFAICT, the property resolution is already as complete as can be and
generates a list of font-families. The only limitation is that
Font.getFontState() calls an implementation of FontInfo.fontLookup()
that just returns the first family in the list...

Do these improvements refer to moving the fontLookup()-logic partly  
to FontFamilyProperty? Say, for instance an additional method with a  
signature like
getFontFamily(char c)
that returns the first font-family in the list containing a glyph for  
the given char?
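
A minimal sketch of what such a per-character lookup might look like.
Everything here is hypothetical and for illustration only: SketchFont
and FontFamilyList are stand-ins, not FOP's real font or property
classes, and falling back to the first family when no font has the
glyph is an assumption about the desired behaviour.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a configured font; not FOP's real font classes.
final class SketchFont {
    final String family;
    private final Set<Character> glyphs;

    SketchFont(String family, Set<Character> glyphs) {
        this.family = family;
        this.glyphs = glyphs;
    }

    boolean hasGlyph(char c) {
        return glyphs.contains(c);
    }
}

// Hypothetical holder for the resolved font-family list, in document order.
final class FontFamilyList {
    private final List<SketchFont> families;

    FontFamilyList(List<SketchFont> families) {
        if (families.isEmpty()) {
            throw new IllegalArgumentException("font-family list must not be empty");
        }
        this.families = families;
    }

    // Returns the first family in the list that has a glyph for c.
    // Assumption: if no family has the glyph, fall back to the first one.
    String getFontFamily(char c) {
        for (SketchFont font : families) {
            if (font.hasGlyph(c)) {
                return font.family;
            }
        }
        return families.get(0).family;
    }
}
```

Note that a char-based signature cannot address codepoints outside the
Basic Multilingual Plane; an int codepoint parameter would be needed
for those.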

Also, I agree that FOText does not need to be touched, but even the
TextLM can be left alone. In fact, a FOText containing more than
Short.MAX_VALUE characters is already split into separate instances
(because TextLM uses shorts as indices in its AreaInfos). Two separate
TextLMs are automatically instantiated in that case...
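
To illustrate the kind of length-based splitting described above, here
is a small sketch. TextChunker is a hypothetical helper, not FOP code;
it only shows how one character array can be cut into pieces that each
fit a short index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; illustrates length-based splitting only.
final class TextChunker {
    // Splits text into consecutive pieces of at most maxLen chars each.
    static List<char[]> chunk(char[] text, int maxLen) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        List<char[]> chunks = new ArrayList<>();
        for (int start = 0; start < text.length; start += maxLen) {
            int end = Math.min(text.length, start + maxLen);
            chunks.add(Arrays.copyOfRange(text, start, end));
        }
        return chunks;
    }
}
```

Calling TextChunker.chunk(text, Short.MAX_VALUE) yields pieces that
could each be handed to a separate text object. This naive version may
cut a surrogate pair in half at a chunk boundary.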

The only change needed in this scenario would be to move the calls to
getFontState() from the LM's initialize() method to somewhere in
FObjMixed (?)

I'll try to document this approach further in the days to come.

Cheers

Andreas

Re: Character-by-character font selection strategy

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.

I've noted some things here:
http://wiki.apache.org/xmlgraphics-fop/FontSelectionStrategy

It's best to use that page to gather thoughts and come up with a plan to
implement that feature.

On 27.06.2007 22:54:44 Andreas L Delmelle wrote:
> On Jun 27, 2007, at 20:49, Loran Kary wrote:
> 
> Hi,
> 
> > I see there was a recent thread on fop-user called "Mixing  
> > Languages and Unicode".  I have the same problem.  The PDFs that I  
> > create with FOP could potentially contain any mix of languages and  
> > no one font will support them all.
> >
> > I do not believe it is practical to try to implement a character-by- 
> > character font selection strategy in XSLT, even if I could figure  
> > out how to do it.  Nor do I believe it is practical to try to  
> > create some custom font that supports all languages and embed that  
> > in all my PDFs.
> 
> I agree. Implementing something like this in XSLT is indeed a  
> complete waste here. As for custom fonts supporting all possible  
> Unicode-characters, they tend to blow up the resulting PDF's size  
> when embedded...
> 
> Font-selection is precisely why --I think-- the Recommendation states  
> that in the first step of formatting, all 'characters are converted  
> into character FOs' (in 1.1.2 Formatting). One could even argue that  
> FOP is not compliant with the Rec here, as it treats contiguous blocks  
> of text in the source XML internally as one character array.
> 
> > So my question is, what would it take to implement support for the  
> > font-selection-strategy property?
> 
> Now, /there/ is a GOOD question to ask first! 8-)
> Always make /us/ think about that first, before nagging about  
> possible plans to implement. Nice strategy. I like your style. :-)
> 
> Come to think of it, a little understanding of FOP's internals and  
> the implied Java-knowledge should be enough. No need to touch the  
> layoutengine for this, if I judge correctly. We could implement this  
> at the point where the FOText instances are processed, since the font- 
> family info is already available at that point (and hence, there is  
> the possibility to check for codepoints that have no glyph).
> 
> We could substitute character FOs for codepoints outside the  
> default font's mapping...
> 
> 
> Cheers
> 
> Andreas



Jeremias Maerki

Re: Character-by-character font selection strategy

Posted by Andreas L Delmelle <a_...@pandora.be>.

On Jun 27, 2007, at 20:49, Loran Kary wrote:

Hi,

> I see there was a recent thread on fop-user called "Mixing  
> Languages and Unicode".  I have the same problem.  The PDFs that I  
> create with FOP could potentially contain any mix of languages and  
> no one font will support them all.
>
> I do not believe it is practical to try to implement a character-by- 
> character font selection strategy in XSLT, even if I could figure  
> out how to do it.  Nor do I believe it is practical to try to  
> create some custom font that supports all languages and embed that  
> in all my PDFs.

I agree. Implementing something like this in XSLT is indeed a  
complete waste here. As for custom fonts supporting all possible  
Unicode-characters, they tend to blow up the resulting PDF's size  
when embedded...

Font-selection is precisely why --I think-- the Recommendation states  
that in the first step of formatting, all 'characters are converted  
into character FOs' (in 1.1.2 Formatting). One could even argue that  
FOP is not compliant with the Rec here, as it treats contiguous blocks  
of text in the source XML internally as one character array.

> So my question is, what would it take to implement support for the  
> font-selection-strategy property?

Now, /there/ is a GOOD question to ask first! 8-)
Always make /us/ think about that first, before nagging about  
possible plans to implement. Nice strategy. I like your style. :-)

Come to think of it, a little understanding of FOP's internals and  
the implied Java-knowledge should be enough. No need to touch the  
layoutengine for this, if I judge correctly. We could implement this  
at the point where the FOText instances are processed, since the font- 
family info is already available at that point (and hence, there is  
the possibility to check for codepoints that have no glyph).

We could substitute character FOs for codepoints outside the  
default font's mapping...
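
As a rough sketch of that idea: the text could be walked codepoint by
codepoint and cut into runs wherever coverage by the default font
changes. TextRun and RunSplitter are hypothetical names, and the
IntPredicate stands in for whatever glyph check the font system
offers; none of this is existing FOP code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical value object: a stretch of text plus a coverage flag.
final class TextRun {
    final String text;
    final boolean coveredByDefaultFont;

    TextRun(String text, boolean coveredByDefaultFont) {
        this.text = text;
        this.coveredByDefaultFont = coveredByDefaultFont;
    }
}

// Hypothetical helper that splits text where default-font coverage changes.
final class RunSplitter {
    static List<TextRun> split(String text, IntPredicate defaultFontHasGlyph) {
        List<TextRun> runs = new ArrayList<>();
        int runStart = 0;
        boolean runCovered = false;
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            boolean covered = defaultFontHasGlyph.test(codePoint);
            // Close the current run when coverage flips.
            if (i > runStart && covered != runCovered) {
                runs.add(new TextRun(text.substring(runStart, i), runCovered));
                runStart = i;
            }
            runCovered = covered;
            i += Character.charCount(codePoint); // step over surrogate pairs
        }
        if (runStart < text.length()) {
            runs.add(new TextRun(text.substring(runStart), runCovered));
        }
        return runs;
    }
}
```

The runs flagged as not covered are the ones that would need a font
further down the font-family list, or separate character FOs.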

Cheers

Andreas