Posted to fop-dev@xmlgraphics.apache.org by "Chris Bowditch (JIRA)" <ji...@apache.org> on 2013/04/22 16:23:16 UTC

[jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

     [ https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Bowditch reassigned FOP-2210:
-----------------------------------

    Assignee: Chris Bowditch
    
> [PATCH] Complex script IF to output missing glyphs
> --------------------------------------------------
>
>                 Key: FOP-2210
>                 URL: https://issues.apache.org/jira/browse/FOP-2210
>             Project: Fop
>          Issue Type: Bug
>            Reporter: simon steiner
>            Assignee: Chris Bowditch
>         Attachments: csspeedtrunk.patch, fop.xconf, test.fo
>
>
> fop test.fo -c fop.xconf -if application/pdf expected.if.xml
> fop -c fop.xconf -ifin expected.if.xml out.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Glenn Adams <gl...@skynav.com>.

On Sun, Apr 28, 2013 at 12:15 PM, Alexios Giotis <al...@gmail.com> wrote:

> On 26 Apr 2013, at 23:45, Glenn Adams <gl...@skynav.com> wrote:
>
> > (3) I am not (yet) convinced in the wisdom of supporting modification to
> the IF text, but I'm open to learn about use cases;
> >
>
>
>
> Hi Glenn,
>
> Interesting thread, I will just attempt to describe some use cases where I
> need to modify the IF text.
>
> 1. Printing jobs
> This is selecting documents, grouping / sorting them (e.g. group per range
> of pages so they fit in a certain envelope type and then sort by zip code)
> and then splitting them in to batches of about 20000 pages each. This is
> done by first rendering each document to IF and then concatenating them to
> the final output format. I need to first create the IF because:
>
> - The number of pages of each document is not known in advance (e.g. from
> the XSL:FO) and this is an important criterion for creating batches.
> - It is not efficient (or possible) to render documents of 20k or more
> pages.
>
> During rendering the IF to the final output format there are some SAX
> filters installed after XMLReader and before FOP that on the fly modify the
> IF. This is typically needed to:
> - Add page / sheet / document  and other counters across each printing
> batch and for the whole printing job.
> - Adding barcodes / OMRs or other symbols that drive the inserter
> (enveloping machine).
>
>
>
> 2. Fast rendering of documents
> Rendering is a resource intensive process and we need to serve documents
> fastly regardless of their size. What we do it to 'cache' IF. A user
> selects a document and then based on her permissions and the parts she
> selected, we render part of the cached IF to PDF (no other formats in that
> case). But there are some parts that we need to change, the most common
> being the total number of pages (e.g. the N in Page 1 of N). We change it
> by either replacing a text placeholder with the actual value or by
> overlaying each cached IF page with a short, dynamically generated one. The
> first approach is faster but not optimal if we assumed one or two digits
> for N but it is a four digit number.
>
>
>
> 3. Rendering really big documents
> There is customer (of one of our customers) that has a monthly invoice of
> 60k pages and he gets that printed. FOP can't render such big documents
> with a single pass and we need to modify partial IFs.
>
>
> We do have other use cases but I hope I described some of them.
>

Thanks for describing these cases. It appears that when you do change IF
text, the changes are limited to counters such as page numbers and page
counts, not general or arbitrary changes to text that has already been set.


>
>
> Alexios Giotis
>
>
>
>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Glenn Adams <gl...@skynav.com>.

On Mon, Apr 29, 2013 at 6:16 AM, Chris Bowditch
<bo...@hotmail.com> wrote:

> Hi Glenn, Alexios,
>
> One of the key requirements when we implemented IF XML, was the ability to
> make modifications. Thanks to Alex for providing a list of business reasons
> why that is necessary. I agree with those use cases. Some of the others are
> adding barcodes, OMR marks, large file page numbering.
>
> We want to see this original requirement for IF XML maintained.
>

I can see a number of possible scenarios here, where by "original" I mean
inclusion of original text in IF, and by "mapped" I mean inclusion of
mapped text in IF, i.e., result of char/glyph substitution/positioning
process:

*-Original, +Mapped (all)*

Current implementation. If mapped != original for some text, then won't
fully support accessibility or copy/find (of original). Also, have to deal
with use of PUA mappings.

*+Original, +Mapped (all)*
This would resolve the accessibility/copy/find issues, and support
selective re-mapping during IF rendering. However, the IF files will be
significantly larger, and the mapped text will go unused wherever it is
re-mapped. On the other hand, this option improves the performance of IF
rendering for cases where some or most text will not be re-mapped.

*+Original, +Mapped (partial)*

A subset of the prior scenario, where mapped text is either selectively
included or excluded from the resulting IF. Given additional configuration
options or extension properties (fox:), the FO->IF process could select
which mapped text to include/exclude.

*+Original, -Mapped*

No mapped text is included in IF, thus requiring full re-mapping during IF
rendering.

*Possible Configuration Support*
Following are some ideas about additional configuration or FO extension
properties to support the above scenarios:

*Globally Exclude All Mapped Text (fop.xconf)*

<exclude-mapped-text-from-intermediate-format/>

*Globally Exclude All Original Text (fop.xconf)*

<exclude-original-text-from-intermediate-format/>

*Selectively Exclude Mapped Text (fox attribute)*

fox:exclude-mapped-text-from-intermediate-format='true'|'false'

*Selectively Exclude Original Text (fox attribute)*

fox:exclude-original-text-from-intermediate-format='true'|'false'

*Selectively Mark Modification of IF Text for Re-Mapping (IF attribute)*

<text foi:remap .../>

*Possible Optimizations*
If both original text and mapped text are output for a given IF text, then
if they are identical, only one need be present.

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Chris Bowditch <bo...@hotmail.com>.

Hi Glenn, Alexios,

One of the key requirements when we implemented IF XML was the ability
to make modifications. Thanks to Alex for providing a list of business
reasons why that is necessary. I agree with those use cases. Some of the
others are adding barcodes, OMR marks, large file page numbering.

We want to see this original requirement for IF XML maintained.

Thanks,

Chris

On 28/04/2013 20:15, Alexios Giotis wrote:
> On 26 Apr 2013, at 23:45, Glenn Adams <gl...@skynav.com> wrote:
>
>> (3) I am not (yet) convinced in the wisdom of supporting modification to the IF text, but I'm open to learn about use cases;
>>
>
>
> Hi Glenn,
>
> Interesting thread, I will just attempt to describe some use cases where I need to modify the IF text.
>
> 1. Printing jobs
> This is selecting documents, grouping / sorting them (e.g. group per range of pages so they fit in a certain envelope type and then sort by zip code) and then splitting them in to batches of about 20000 pages each. This is done by first rendering each document to IF and then concatenating them to the final output format. I need to first create the IF because:
>
> - The number of pages of each document is not known in advance (e.g. from the XSL:FO) and this is an important criterion for creating batches.
> - It is not efficient (or possible) to render documents of 20k or more pages.
>
> During rendering the IF to the final output format there are some SAX filters installed after XMLReader and before FOP that on the fly modify the IF. This is typically needed to:
> - Add page / sheet / document  and other counters across each printing batch and for the whole printing job.
> - Adding barcodes / OMRs or other symbols that drive the inserter (enveloping machine).
>
>
>
> 2. Fast rendering of documents
> Rendering is a resource intensive process and we need to serve documents fastly regardless of their size. What we do it to 'cache' IF. A user selects a document and then based on her permissions and the parts she selected, we render part of the cached IF to PDF (no other formats in that case). But there are some parts that we need to change, the most common being the total number of pages (e.g. the N in Page 1 of N). We change it by either replacing a text placeholder with the actual value or by overlaying each cached IF page with a short, dynamically generated one. The first approach is faster but not optimal if we assumed one or two digits for N but it is a four digit number.
>
>
>
> 3. Rendering really big documents
> There is customer (of one of our customers) that has a monthly invoice of 60k pages and he gets that printed. FOP can't render such big documents with a single pass and we need to modify partial IFs.
>
>
> We do have other use cases but I hope I described some of them.
>
>
> Alexios Giotis
>
>
>
>
>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Alexios Giotis <al...@gmail.com>.

On 26 Apr 2013, at 23:45, Glenn Adams <gl...@skynav.com> wrote:

> (3) I am not (yet) convinced in the wisdom of supporting modification to the IF text, but I'm open to learn about use cases;
>

Hi Glenn,

Interesting thread, I will just attempt to describe some use cases where I need to modify the IF text.

1. Printing jobs
This involves selecting documents, grouping / sorting them (e.g. group per range of pages so they fit in a certain envelope type and then sort by zip code) and then splitting them into batches of about 20000 pages each. This is done by first rendering each document to IF and then concatenating them to the final output format. I need to first create the IF because:

- The number of pages of each document is not known in advance (e.g. from the XSL-FO) and this is an important criterion for creating batches.
- It is not efficient (or possible) to render documents of 20k or more pages.

While rendering the IF to the final output format, some SAX filters installed after the XMLReader and before FOP modify the IF on the fly. This is typically needed to:
- Add page / sheet / document and other counters across each printing batch and for the whole printing job.
- Add barcodes / OMRs or other symbols that drive the inserter (enveloping machine).
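
A minimal sketch of such a filter, assuming only the standard SAX API. The class name PageCountingFilter and the helper countPages are hypothetical, and the sketch assumes that each IF page appears as an element whose local name is "page". It counts pages while forwarding every event unchanged, which is the kind of running total a batch counter would build on:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: counts IF pages while passing every event through unchanged. */
final class PageCountingFilter extends XMLFilterImpl {
    private int pages;

    PageCountingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Assumption: each IF page is an element whose local name is "page".
        if ("page".equals(localName)) {
            pages++;
        }
        super.startElement(uri, localName, qName, atts);
    }

    int getPages() {
        return pages;
    }

    /** Convenience helper for trying the filter on an in-memory document. */
    static int countPages(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed so localName is populated
            PageCountingFilter filter =
                    new PageCountingFilter(factory.newSAXParser().getXMLReader());
            filter.parse(new InputSource(new StringReader(xml)));
            return filter.getPages();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a real chain the filter would have a downstream ContentHandler (the IF consumer) set on it; here no handler is set, so events are simply counted and dropped.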

2. Fast rendering of documents
Rendering is a resource-intensive process and we need to serve documents quickly regardless of their size. What we do is 'cache' the IF. A user selects a document and then, based on her permissions and the parts she selected, we render part of the cached IF to PDF (no other formats in that case). But there are some parts that we need to change, the most common being the total number of pages (e.g. the N in Page 1 of N). We change it by either replacing a text placeholder with the actual value or by overlaying each cached IF page with a short, dynamically generated one. The first approach is faster but not optimal if we assumed one or two digits for N and it turns out to be a four-digit number.
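
The placeholder approach could be sketched roughly as follows, using only the standard SAX and JAXP APIs. Everything named here is an assumption for illustration: the class PlaceholderFilter, the helper apply, and the placeholder string passed in. An identity Transformer stands in for the consumer of the filtered IF. The sketch only swaps the text; it does not adjust positions or advances, which is why a value wider than the placeholder is "not optimal".

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: replaces a placeholder inside character data. */
final class PlaceholderFilter extends XMLFilterImpl {
    private final String placeholder;
    private final String value;
    private final StringBuilder pending = new StringBuilder();

    PlaceholderFilter(XMLReader parent, String placeholder, String value) {
        super(parent);
        this.placeholder = placeholder;
        this.value = value;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver text in several chunks, so buffer it first.
        pending.append(ch, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        flush();
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        flush();
        super.endElement(uri, localName, qName);
    }

    /** Forwards buffered text downstream with the placeholder replaced. */
    private void flush() throws SAXException {
        if (pending.length() == 0) {
            return;
        }
        char[] text = pending.toString().replace(placeholder, value).toCharArray();
        pending.setLength(0);
        super.characters(text, 0, text.length);
    }

    /** Runs the filter over an in-memory document and returns the serialized result. */
    static String apply(String xml, String placeholder, String value) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader filter = new PlaceholderFilter(
                    factory.newSAXParser().getXMLReader(), placeholder, value);
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(
                    new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```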

3. Rendering really big documents
There is a customer (of one of our customers) that has a monthly invoice of 60k pages and he gets that printed. FOP can't render such big documents in a single pass and we need to modify partial IFs.

We do have other use cases but I hope I described some of them.

Alexios Giotis

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Glenn Adams <gl...@skynav.com>.

On Fri, Apr 26, 2013 at 12:59 PM, Vincent Hennebert <vh...@gmail.com> wrote:

> On 25/04/13 22:33, Glenn Adams wrote:
> > On Thu, Apr 25, 2013 at 1:08 PM, Vincent Hennebert <vhennebert@gmail.com
> >wrote:
> >
> >> On 25/04/13 17:48, Glenn Adams wrote:
> >>> On Thu, Apr 25, 2013 at 2:31 AM, Vincent Hennebert <
> vhennebert@gmail.com
> >>> wrote:
> >>>
> >>>>
> >>>> It doesn’t shock me to store text as text in the IF and to re-do the
> >>>> glyph mapping when rendering it to the final output format. This is
> >>>> actually how it is done ATM.
> >>>>
> >>>
> >>> I think this a bad idea for the reasons that Alexios mentioned, and
> that
> >> I
> >>> previously mentioned about recreating sufficient layout context to
> repeat
> >>> the process reliably.
> >>
> >> What exactly do you mean by ‘sufficient layout context’? What would be
> >> missing from the IF that would prevent to re-do the glyph mapping?
> >>
> >
> > Off hand, we would need:
> >
> >    - language
> >    - script
> >    - font features to be applied (with parameters)
> >    - letter-spacing settings
>
> Apart from the font features, they are already available in the file.
> Regarding font features, they could be added to the font element, but
> AFAIK this is not customizable in the FO file is it? So I guess the
> default set of features is applied. So that default set can also be
> applied to text coming from the IF.
>

I don't believe language and script are specifiable at a per-text level in
IF.

      <xs:element name="text">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="x" use="required" type="mf:lengthType"/>
              <xs:attribute name="y" use="required" type="mf:lengthType"/>
              <xs:attribute name="letter-spacing" type="mf:lengthType"/>
              <xs:attribute name="word-spacing" type="mf:lengthType"/>
              <xs:attribute name="dx" type="mf:lengthListType"/>
              <xs:attribute name="dp" type="mf:dpListType"/>
              <xs:attribute name="hyphenated" type="xs:boolean"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>

Yes, at present, the default features apply. However, I have already added
code to allow an extension property to specify features, such as those
defined by [1]; this is on my short list of planned upgrades.

[1] http://www.w3.org/TR/css3-fonts/#font-feature-settings-prop

I agree in principle that these could be added to the IF data as well, so
we would need to add at least the following attributes:

language
script
font-feature-settings


>
>
> > There are probably others. I just don't see any reason to use this
> approach.
> >
> >
> >>
> >>
> >>>> Sure it may become more costly when you start using complex scripts,
> >>>> but
> >>>> that would have to be confirmed with some profiling first and
> foremost.
> >>>> We might be surprised.
> >>>>
> >>>> We should keep in mind that it’s a perfectly reasonable use case to
> add
> >>>> text to the IF as part of a post-processing step. That text will have
> to
> >>>> go through the glyph mapping code anyway.
> >>>>
> >>>> Also, to have copy-paste work properly from PDF the original text must
> >>>> be present in the IF.
> >>>>
> >>>
> >>> Agreed, but this is a different requirement. And doesn't entail
> >>> reconstructing part of the layout context and repeating the character
> to
> >>> glyph mapping and positioning process.
> >>
> >> You’ll have to do that for text added at post-process time anyway?
> >>
> >
> > I don't understand what this means.
>
> The IF can be manipulated in many ways by the user and, among other
> things, text can be added to it, which will have to be rendered into the
> final output.
>
> This is an important reason why I think glyph mapping should be redone.
>

Hmm, without re-layout? Seems risky, but I agree it's possible.


> >>>> Storing information about the private use area in the IF is
> >>>> exposing
> >>>> internal implementation details of FOP.
> >>>
> >>>
> >>> I disagree. In fact, it is working around a bug that exists in certain
> >>> fonts which forces FOP to make use of synthesized PUA mappings. The bug
> >> is
> >>> that the font designer did not fully populate the original CMAP, i.e.,
> >>> include a mapping for every accessible glyph.
> >>
> >> I still don’t get it I’m afraid. Where in the TrueType spec is it stated
> >> that every glyph should have an entry in the cmap?
> >
> >
> > It doesn't. But if someone uses a font, wants to present a glyph that has
> > no mapping, and must use character codes, then it won't work.
>
> That’s this ‘must use character codes’ requirement that seems buggy to
> me.
>
>
> >> Why can’t FOP just
> >> use the glyph ID? Surely that information is enough?
> >>
> >
> > Well, for one thing, the IF interface for renderText uses a character
> > string, not a glyph index string,
>
> No, it uses Unicode code points. It must probably be extended to pass
> information about the glyph mapping as well.
>

character string = (Unicode code point)*

These are not glyph codes.


>
>
> > and the IF XML format uses Unicode code
> > points.
> >
> >
> >>
> >>
> >>>> When going the direct FO to PDF
> >>>> route, mapping glyphs to character codes to re-map them again into
> >>>> glyphs when creating the PDF is sub-optimal. We might as well work
> with
> >>>> the glyph indices all the way through.
> >>>>
> >>>
> >>> This is possible, but wouldn't it require two separate paths through
> the
> >> IF
> >>> layer, and would it not work for non-PDF output?
> >>
> >> I don’t think so. The original text should be passed through anyway to
> >> create the ToUnicode cmap.
> >
> >
> > Why?
>
> For copy/pasting to work in PDF. The original text must be returned.
> This is also important for accessibility (reading the text aloud).
>

I've already agreed that having the original text is important, and it's not
there at present, so this is a bug waiting to be solved.


>
>
> >> So PDF can use the glyph mapping to generate
> >> the text operators and the original text for the ToUnicode cmap. The IF
> >> renderer just streams out the original text. And the other renderers
> >> just deal with the glyph mapping.
> >>
> >
> > Since the technique I suggests will work and does not require this, then
> > this (repeating the character to glyph mapping, positioning, and layout
> > process) isn't necessary. I have agreed, however, that embedding the
> > original UC text for performing copy and find operations will be useful,
> > for which there is already an open bug [1].
> >
> > [1] https://issues.apache.org/jira/browse/FOP-2204
> >
>

To summarize:

(1) I agree it is desirable to include the original Unicode text so that
copy/find/accessibility can work;
(2) I agree that it is possible to re-perform the character/glyph mapping
process provided new attributes are added to the IF text element;
(3) I am not (yet) convinced of the wisdom of supporting modification to
the IF text, but I'm open to learn about use cases;
(4) I know that it is possible to satisfy (1) above without having to
re-perform the mapping;
(5) I know that the present problem can be solved without doing (1) or (2),
e.g., by adding pua children to the font element.
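
As a rough sketch of point (5), the following shows how code/gid pairs read from pua children of the font element might be merged into mappings already synthesized when the font was loaded. The names PuaMappings, parseCode and merge are hypothetical and are not part of FOP. The sketch assumes that codes are written in the "0xE000" style and lie in the BMP Private Use Area (U+E000 to U+F8FF), and that an entry from the IF file replaces a synthesized entry for the same code point.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for merging PUA code point to glyph index mappings. */
final class PuaMappings {
    private PuaMappings() { }

    /** Parses a code attribute such as "0xE000" and checks it is a BMP PUA code point. */
    static int parseCode(String code) {
        int cp = Integer.decode(code);
        if (cp < 0xE000 || cp > 0xF8FF) {
            throw new IllegalArgumentException("not a BMP PUA code point: " + code);
        }
        return cp;
    }

    /**
     * Returns a new map holding the synthesized mappings, with any entry for the
     * same code point replaced by the one read from the IF file (assumed policy).
     */
    static Map<Integer, Integer> merge(Map<Integer, Integer> synthesized,
                                       Map<Integer, Integer> fromIf) {
        Map<Integer, Integer> merged = new LinkedHashMap<>(synthesized);
        merged.putAll(fromIf);
        return merged;
    }
}
```

A fuller version would also have to decide what happens when the same glyph index ends up under two different code points; the sketch leaves both entries in place.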

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Vincent Hennebert <vh...@gmail.com>.

On 25/04/13 22:33, Glenn Adams wrote:
> On Thu, Apr 25, 2013 at 1:08 PM, Vincent Hennebert <vh...@gmail.com>wrote:
> 
>> On 25/04/13 17:48, Glenn Adams wrote:
>>> On Thu, Apr 25, 2013 at 2:31 AM, Vincent Hennebert <vhennebert@gmail.com
>>> wrote:
>>>
>>>>
>>>> It doesn’t shock me to store text as text in the IF and to re-do the
>>>> glyph mapping when rendering it to the final output format. This is
>>>> actually how it is done ATM.
>>>>
>>>
>>> I think this a bad idea for the reasons that Alexios mentioned, and that
>> I
>>> previously mentioned about recreating sufficient layout context to repeat
>>> the process reliably.
>>
>> What exactly do you mean by ‘sufficient layout context’? What would be
>> missing from the IF that would prevent to re-do the glyph mapping?
>>
> 
> Off hand, we would need:
> 
>    - language
>    - script
>    - font features to be applied (with parameters)
>    - letter-spacing settings

Apart from the font features, they are already available in the file.
Regarding font features, they could be added to the font element, but
AFAIK this is not customizable in the FO file, is it? So I guess the
default set of features is applied. So that default set can also be
applied to text coming from the IF.


> There are probably others. I just don't see any reason to use this approach.
> 
> 
>>
>>
>>>> Sure it may become more costly when you start using complex scripts,
>>>> but
>>>> that would have to be confirmed with some profiling first and foremost.
>>>> We might be surprised.
>>>>
>>>> We should keep in mind that it’s a perfectly reasonable use case to add
>>>> text to the IF as part of a post-processing step. That text will have to
>>>> go through the glyph mapping code anyway.
>>>>
>>>> Also, to have copy-paste work properly from PDF the original text must
>>>> be present in the IF.
>>>>
>>>
>>> Agreed, but this is a different requirement. And doesn't entail
>>> reconstructing part of the layout context and repeating the character to
>>> glyph mapping and positioning process.
>>
>> You’ll have to do that for text added at post-process time anyway?
>>
> 
> I don't understand what this means.

The IF can be manipulated in many ways by the user and, among other
things, text can be added to it, which will have to be rendered into the
final output.

This is an important reason why I think glyph mapping should be redone.


>>>> Storing information about the private use area in the IF is 
>>>> exposing
>>>> internal implementation details of FOP.
>>>
>>>
>>> I disagree. In fact, it is working around a bug that exists in certain
>>> fonts which forces FOP to make use of synthesized PUA mappings. The bug
>> is
>>> that the font designer did not fully populate the original CMAP, i.e.,
>>> include a mapping for every accessible glyph.
>>
>> I still don’t get it I’m afraid. Where in the TrueType spec is it stated
>> that every glyph should have an entry in the cmap?
> 
> 
> It doesn't. But if someone uses a font, wants to present a glyph that has
> no mapping, and must use character codes, then it won't work.

It’s this ‘must use character codes’ requirement that seems buggy to
me.


>> Why can’t FOP just
>> use the glyph ID? Surely that information is enough?
>>
> 
> Well, for one thing, the IF interface for renderText uses a character
> string, not a glyph index string,

No, it uses Unicode code points. It must probably be extended to pass
information about the glyph mapping as well.


> and the IF XML format uses Unicode code
> points.
> 
> 
>>
>>
>>>> When going the direct FO to PDF
>>>> route, mapping glyphs to character codes to re-map them again into
>>>> glyphs when creating the PDF is sub-optimal. We might as well work with
>>>> the glyph indices all the way through.
>>>>
>>>
>>> This is possible, but wouldn't it require two separate paths through the
>> IF
>>> layer, and would it not work for non-PDF output?
>>
>> I don’t think so. The original text should be passed through anyway to
>> create the ToUnicode cmap.
> 
> 
> Why?

For copy/pasting to work in PDF. The original text must be returned.
This is also important for accessibility (reading the text aloud).


>> So PDF can use the glyph mapping to generate
>> the text operators and the original text for the ToUnicode cmap. The IF
>> renderer just streams out the original text. And the other renderers
>> just deal with the glyph mapping.
>>
> 
> Since the technique I suggests will work and does not require this, then
> this (repeating the character to glyph mapping, positioning, and layout
> process) isn't necessary. I have agreed, however, that embedding the
> original UC text for performing copy and find operations will be useful,
> for which there is already an open bug [1].
> 
> [1] https://issues.apache.org/jira/browse/FOP-2204
> 
> 
>>
>>
>> Vincent
>>
>>
>>> I suspect this falls under
>>> the category of "premature optimization", on which Knuth says "Premature
>>> optimization is the root of all evil (or at least most of it) in
>>> programming."
>>>
>>>
>>>>
>>>>
>>>> Vincent
>>>>
>>>>
>>>>> On 25 Apr 2013, at 01:52, Glenn Adams <gl...@skynav.com> wrote:
>>>>>
>>>>>> I see no option but to modify IF. We modified IF for 1.1 in the first
>>>> place.  We have recently made quite a number of backward incompatible
>>>> changes to the FOP public APIs. I expect the next release will need to
>> bump
>>>> the major version to 2 for FOP due to these changes, so there is little
>>>> risk in making a change in IF. If there are other, useful changes to IF
>>>> that have been postponed, then perhaps they should be reconsidered now
>> as
>>>> well.
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 24, 2013 at 3:26 PM, Luis Bernardo <
>> lmpmbernardo@gmail.com>
>>>> wrote:
>>>>>>
>>>>>> These are good suggestions. I am fully aware of the shortcomings that
>>>> you pointed out, but the only other option seemed to be to codify the
>>>> mappings in IF, similar to your first suggestion. However that would
>> mean
>>>> changing IF which is not something we are keen to do since that impacts
>>>> applications that rely on the current format.
>>>>>>
>>>>>> Are you saying that with your second approach there is no need to
>>>> change IF?
>>>>>>
>>>>>>
>>>>>> On 4/24/13 7:38 PM, Glenn Adams wrote:
>>>>>>> Sure. One way to do this would be to add child elements to the
>> <font/>
>>>> element in IF output as follows:
>>>>>>>
>>>>>>> <font family="Lateef" style="normal" ...>
>>>>>>>   <pua code="0xE000" gid="139"/>
>>>>>>>   <pua code="0xE001" gid="481"/>
>>>>>>>   <pua code="0xE002" gid="219"/>
>>>>>>> </font>
>>>>>>>
>>>>>>> where these PUA mappings are collected by iterating over the
>>>> characters of TextAreas governed by the <font/> element. These
>> characters
>>>> might be iterated upon invoking TextArea.add{Word,Space}, and collecting
>>>> this info in text areas.
>>>>>>>
>>>>>>> Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1)
>>>> determine which glyph codes were referenced by the document, (2) given
>>>> these used codes, iterate over the CMAP mappings to find which PUA
>> codes
>>>> were generated for those glyph codes, then (3) output the <pua/>
>> elements
>>>> (above) as required.
>>>>>>>
>>>>>>> Finally, when reading an IF file, these <pua/> elements would be used
>>>> to augment the font's CMAP (keeping in mind that when reading the font,
>>>> MultiByteFont.createPrivateUseMappings() may have already been called,
>> and
>>>> thus the mappings in <pua/> elements may need to be replaced or merged).
>>>>>>>
>>>>>>> I can imagine various other optimizations on the above theme to make
>>>> this readily workable.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Apr 24, 2013 at 3:18 AM, Chris Bowditch <
>>>> bowditch_chris@hotmail.com> wrote:
>>>>>>> Hi Glenn,
>>>>>>>
>>>>>>> Can you suggest an alternative approach please?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>>
>>>>>>> On 24/04/2013 02:41, Glenn Adams wrote:
>>>>>>> I don't like this. It negates any additional processing that may have
>>>> occurred, such as letter spacing. It requires the IF to repeat part of
>> the
>>>> layout process. Bad idea.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo <lmpmbernardo@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>     With the approach implemented by Simon what gets written to the
>> IF
>>>>>>>     file is the original sequence, not the mapped sequence. Then when
>>>>>>>     generating PDF from IF the same code that would generate the
>>>>>>>     synthesized mappings when generating PDF straight from FO is
>>>>>>>     called to recreate the mappings. So I don't think we can say
>> there
>>>>>>>     is information about the mappings in the text nodes.
>>>>>>>
>>>>>>>
>>>>>>>     On 4/23/13 5:50 AM, Glenn Adams wrote:
>>>>>>>     Ah, I reread your earlier (private) message. I see the problem
>>>>>>>     has to do with the use of synthesized PUA mappings. Here, the
>>>>>>>     problem really is that the font should always have a CMAP entry
>>>>>>>     that maps to every glyph that can be produced by the GSUB
>>>>>>>     process. However, not all fonts do this, so in the case in point,
>>>>>>>     we have to synthesize some mapping, from which we have to turn to
>>>>>>>     PUA assignments. This works when we generate PDF since we
>>>>>>>     generate a subset font that contains the synthesized mappings.
>>>>>>>     However, I can see that if this is going to IF instead of PDF/PS,
>>>>>>>     then we need to find a way to recreate those synthesized
>> mappings.
>>>>>>>
>>>>>>>     I think this information is really font-specific, and should not
>>>>>>>     be tied to specific text nodes though. So if Simon's fix uses
>>>>>>>     text nodes, then that is probably not the best approach.
>>>>>>>
>>>>>>>
>>>>>>>     On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <glenn@skynav.com> wrote:
>>>>>>>
>>>>>>>         I'm presently at W3C WG meetings this week, but I'll try to
>>>>>>>         get on my schedule. I'm not sure what the IF->PS/PDF problem
>>>>>>>         is, since the IF->PDF path is clearly working from my tests.
>>>>>>>
>>>>>>>
>>>>>>>         On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo
>>>>>>>         <lmpmbernardo@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>             Glenn,
>>>>>>>
>>>>>>>             Can you give your opinion about the approach used by
>>>>>>>             Simon? As I mentioned before (in a private message), the
>>>>>>>             IF -> PS/PDF route does not work in your original CS
>>>>>>>             patch (for the languages that CS targets) due to the
>>>>>>>             mapped sequences. Simon's approach works but requires
>>>>>>>             keeping the original sequences alongside the mapped ones.
>>>>>>>             I think it is a good approach but I would like to know if
>>>>>>>             you have a better suggestion before we apply the patch.
>>>>>>>
>>>>>>>             Thanks,
>>>>>>>             Luis
>>>>>>>
>>>>>>>
>>>>>>>             On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:
>>>>>>>
>>>>>>>                 [
>>>>>>>
>>>>
>> https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>>>>>>                 ]
>>>>>>>
>>>>>>>                 Chris Bowditch reassigned FOP-2210:
>>>>>>>                 -----------------------------------
>>>>>>>
>>>>>>>                      Assignee: Chris Bowditch
>>>>>>>
>>>>>>>                     [PATCH] Complex script IF to output missing
>> glyphs
>>>>>>>
>> --------------------------------------------------
>>>>>>>
>>>>>>>                                      Key: FOP-2210
>>>>>>>                                      URL:
>>>>>>>                     https://issues.apache.org/jira/browse/FOP-2210
>>>>>>>                                  Project: Fop
>>>>>>>                               Issue Type: Bug
>>>>>>>                                 Reporter: simon steiner
>>>>>>>                                 Assignee: Chris Bowditch
>>>>>>>                              Attachments: csspeedtrunk.patch,
>>>>>>>                     fop.xconf, test.fo
>>>>>>>
>>>>>>>
>>>>>>>                     fop test.fo -c fop.xconf -if
>>>>>>>
>>>>>>>                     application/pdf expected.if.xml
>>>>>>>                     fop -c fop.xconf -ifin expected.if.xml out.pdf
>>>>>>>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Glenn Adams <gl...@skynav.com>.

On Thu, Apr 25, 2013 at 1:08 PM, Vincent Hennebert <vh...@gmail.com> wrote:

> On 25/04/13 17:48, Glenn Adams wrote:
> > On Thu, Apr 25, 2013 at 2:31 AM, Vincent Hennebert <vhennebert@gmail.com> wrote:
> >
> >>
> >> It doesn’t shock me to store text as text in the IF and to re-do the
> >> glyph mapping when rendering it to the final output format. This is
> >> actually how it is done ATM.
> >>
> >
> > I think this is a bad idea for the reasons that Alexios mentioned, and that I
> > previously mentioned about recreating sufficient layout context to repeat
> > the process reliably.
>
> What exactly do you mean by ‘sufficient layout context’? What would be
> missing from the IF that would prevent re-doing the glyph mapping?
>

Off hand, we would need:

   - language
   - script
   - font features to be applied (with parameters)
   - letter-spacing settings

There are probably others. I just don't see any reason to use this approach.
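
To make the list above concrete, here is a rough Java sketch of the state a
renderer would have to recover from the IF before it could repeat the
character to glyph mapping. It is not FOP code: the class name, the field
names, and the units are all hypothetical, and the list of fields is likely
incomplete.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder (not a FOP class) for the context needed to redo glyph mapping. */
final class GlyphMappingContext {
    final String language;                    // e.g. "ar"
    final String script;                      // e.g. "arab"
    final Map<String, Integer> fontFeatures;  // feature tag -> parameter value
    final int letterSpacingMpt;               // letter spacing, assumed to be in millipoints

    GlyphMappingContext(String language, String script,
                        Map<String, Integer> fontFeatures, int letterSpacingMpt) {
        this.language = language;
        this.script = script;
        // Defensive copy so later changes by the caller do not alter the context.
        this.fontFeatures = fontFeatures == null
                ? null
                : Collections.unmodifiableMap(new LinkedHashMap<>(fontFeatures));
        this.letterSpacingMpt = letterSpacingMpt;
    }

    /** True only when every item needed to repeat the mapping was recorded. */
    boolean isComplete() {
        return language != null && script != null && fontFeatures != null;
    }
}
```

Each field would have to be written to the IF and read back before the
mapping could be repeated.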


>
>
> >> Sure it may become more costly when you start using complex scripts, but
> >> that would have to be confirmed with some profiling first and foremost.
> >> We might be surprised.
> >>
> >> We should keep in mind that it’s a perfectly reasonable use case to add
> >> text to the IF as part of a post-processing step. That text will have to
> >> go through the glyph mapping code anyway.
> >>
> >> Also, to have copy-paste work properly from PDF the original text must
> >> be present in the IF.
> >>
> >
> > Agreed, but this is a different requirement. And doesn't entail
> > reconstructing part of the layout context and repeating the character to
> > glyph mapping and positioning process.
>
> You’ll have to do that for text added at post-process time anyway?
>

I don't understand what this means.


>
>
> >> Storing information about the private use area in the IF is exposing
> >> internal implementation details of FOP.
> >
> >
> > I disagree. In fact, it is working around a bug that exists in certain
> > fonts which forces FOP to make use of synthesized PUA mappings. The bug is
> > that the font designer did not fully populate the original CMAP, i.e.,
> > include a mapping for every accessible glyph.
>
> I still don’t get it I’m afraid. Where in the TrueType spec is it stated
> that every glyph should have an entry in the cmap?


It doesn't. But if someone uses a font, wants to present a glyph that has
no mapping, and must use character codes, then it won't work.


> Why can’t FOP just
> use the glyph ID? Surely that information is enough?
>

Well, for one thing, the IF interface for renderText uses a character
string, not a glyph index string, and the IF XML format uses Unicode code
points.
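
A minimal sketch of what synthesizing PUA mappings amounts to, assuming code
points are handed out sequentially from U+E000: each glyph without a cmap
entry gets a private use code point so that it can travel through a
character-based interface. The class and method names are hypothetical and do
not reflect how MultiByteFont is actually implemented; the <pua/> output
follows the element form proposed earlier in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical allocator (not FOP code) that assigns PUA code points to unmapped glyphs. */
final class PuaAllocator {
    private static final int PUA_FIRST = 0xE000;  // start of the BMP private use area
    private static final int PUA_LAST = 0xF8FF;   // end of the BMP private use area

    // Insertion-ordered so the serialized output is stable.
    private final Map<Integer, Integer> glyphToCode = new LinkedHashMap<>();
    private int next = PUA_FIRST;

    /** Returns the PUA code point for a glyph ID, allocating one on first use. */
    int codeFor(int glyphId) {
        Integer existing = glyphToCode.get(glyphId);
        if (existing != null) {
            return existing;
        }
        if (next > PUA_LAST) {
            throw new IllegalStateException("BMP private use area exhausted");
        }
        int code = next++;
        glyphToCode.put(glyphId, code);
        return code;
    }

    /** Serializes the allocated mappings as <pua/> elements, one per line. */
    String toPuaElements() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Integer> e : glyphToCode.entrySet()) {
            sb.append(String.format("<pua code=\"0x%04X\" gid=\"%d\"/>\n",
                    e.getValue(), e.getKey()));
        }
        return sb.toString();
    }
}
```

Because the assignment depends on the order in which glyphs are first seen, a
reader of the IF cannot assume it will arrive at the same code points by
itself, which is why the mappings need to be recorded or recreated.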


>
>
> >> When going the direct FO to PDF
> >> route, mapping glyphs to character codes to re-map them again into
> >> glyphs when creating the PDF is sub-optimal. We might as well work with
> >> the glyph indices all the way through.
> >>
> >
> > This is possible, but wouldn't it require two separate paths through the IF
> > layer, and would it not work for non-PDF output?
>
> I don’t think so. The original text should be passed through anyway to
> create the ToUnicode cmap.


Why?


> So PDF can use the glyph mapping to generate
> the text operators and the original text for the ToUnicode cmap. The IF
> renderer just streams out the original text. And the other renderers
> just deal with the glyph mapping.
>

Since the technique I suggest will work and does not require this, this
(repeating the character to glyph mapping, positioning, and layout
process) isn't necessary. I have agreed, however, that embedding the
original UC text for performing copy and find operations will be useful,
for which there is already an open bug [1].

[1] https://issues.apache.org/jira/browse/FOP-2204


>
>
> Vincent
>
>
> > I suspect this falls under
> > the category of "premature optimization", on which Knuth says "Premature
> > optimization is the root of all evil (or at least most of it) in
> > programming."
> >
> >
> >>
> >>
> >> Vincent
> >>
> >>
> >>> On 25 Apr 2013, at 01:52, Glenn Adams <gl...@skynav.com> wrote:
> >>>
> >>>> I see no option but to modify IF. We modified IF for 1.1 in the first
> >> place.  We have recently made quite a number of backward incompatible
> >> changes to the FOP public APIs. I expect the next release will need to
> bump
> >> the major version to 2 for FOP due to these changes, so there is little
> >> risk in making a change in IF. If there are other, useful changes to IF
> >> that have been postponed, then perhaps they should be reconsidered now
> as
> >> well.
> >>>>
> >>>>
> >>>> On Wed, Apr 24, 2013 at 3:26 PM, Luis Bernardo <
> lmpmbernardo@gmail.com>
> >> wrote:
> >>>>
> >>>> These are good suggestions. I am fully aware of the shortcomings that
> >> you pointed out, but the only other option seemed to be to codify the
> >> mappings in IF, similar to your first suggestion. However that would
> mean
> >> changing IF which is not something we are keen to do since that impacts
> >> applications that rely on the current format.
> >>>>
> >>>> Are you saying that with your second approach there is no need to
> >> change IF?
> >>>>
> >>>>
> >>>> On 4/24/13 7:38 PM, Glenn Adams wrote:
> >>>>> Sure. One way to do this would be to add child elements to the
> <font/>
> >> element in IF output as follows:
> >>>>>
> >>>>> <font family="Lateef" style="normal" ...>
> >>>>>   <pua code="0xE000" gid="139"/>
> >>>>>   <pua code="0xE001" gid="481"/>
> >>>>>   <pua code="0xE002" gid="219"/>
> >>>>> </font>
> >>>>>
> >>>>> where these PUA mappings are collected by iterating over the
> >> characters of TextAreas governed by the <font/> element. These
> characters
> >> might be iterated upon invoking TextArea.add{Word,Space}, and collecting
> >> this info in text areas.
> >>>>>
> >>>>> Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1)
> >> determine which glyph codes were referenced by the document, (2) given
> >> these used codes, iterate of the the CMAP mappings to find which PUA
> codes
> >> were generated for those glyph codes, then (3) output the <pua/>
> elements
> >> (above) as required.
> >>>>>
> >>>>> Finally, when reading an IF file, these <pua/> elements would be used
> >> to augment the font's CMAP (keeping in mind that when reading the font,
> >> MultiByteFont.createPrivateUseMappings() may have already been called,
> and
> >> thus the mappings in <pua/> elements may need to be replaced or merged.
> >>>>>
> >>>>> I can imagine various other optimizations on the above theme to make
> >> this readily workable.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, Apr 24, 2013 at 3:18 AM, Chris Bowditch <
> >> bowditch_chris@hotmail.com> wrote:
> >>>>> Hi Glenn,
> >>>>>
> >>>>> Can you suggest an alternative approach please?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Chris
> >>>>>
> >>>>>
> >>>>> On 24/04/2013 02:41, Glenn Adams wrote:
> >>>>> I don't like this. It negates any additional processing that may have
> >> occurred, such as letter spacing. It requires the IF to repeat part of
> the
> >> layout process. Bad idea.
> >>>>>
> >>>>>
> >>>>> On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo <
> lmpmbernardo@gmail.com<mailto:
> >> lmpmbernardo@gmail.com>> wrote:
> >>>>>
> >>>>>
> >>>>>     With the approach implemented by Simon what gets written to the
> IF
> >>>>>     file is the original sequence, not the mapped sequence. Then when
> >>>>>     generating PDF from IF the same code that would generate the
> >>>>>     synthesized mappings when generating PDF straight from FO is
> >>>>>     called to recreate the mappings. So I don't think we can say
> there
> >>>>>     is information about the mappings in the text nodes.
> >>>>>
> >>>>>
> >>>>>     On 4/23/13 5:50 AM, Glenn Adams wrote:
> >>>>>     Ah, I reread your earlier (private) message. I see the problem
> >>>>>     has to do with the use of synthesized PUA mappings. Here, the
> >>>>>     problem really is that the font should always have a CMAP entry
> >>>>>     that maps to every glyph that can be produced by the GSUB
> >>>>>     process. However, not all fonts do this, so in the case in point,
> >>>>>     we have to synthesize some mapping, from which we have to turn to
> >>>>>     PUA assignments. This works when we generate PDF since we
> >>>>>     generate a subset font that contains the synthesized mappings.
> >>>>>     However, I can see that if this is going to IF instead of PDF/PS,
> >>>>>     then we need to find a way to recreate those synthesized
> mappings.
> >>>>>
> >>>>>     I think this information is really font-specific, and should not
> >>>>>     be tied to specific text nodes though. So if Simon's fix uses
> >>>>>     text nodes, then that is probably not the best approach.
> >>>>>
> >>>>>
> >>>>>     On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <glenn@skynav.com
> >>>>>     <ma...@skynav.com>> wrote:
> >>>>>
> >>>>>         I'm presently at W3C WG meetings this week, but I'll try to
> >>>>>         get on my schedule. I'm not sure what the IF->PS/PDF problem
> >>>>>         is, since the IF->PDF path is clearly working from my tests.
> >>>>>
> >>>>>
> >>>>>         On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo
> >>>>>         <lmpmbernardo@gmail.com <ma...@gmail.com>>
> >> wrote:
> >>>>>
> >>>>>
> >>>>>             Glenn,
> >>>>>
> >>>>>             Can you give your opinion about the approach used by
> >>>>>             Simon? As I mentioned before (in a private message), the
> >>>>>             IF -> PS/PDF route does not work in your original CS
> >>>>>             patch (for the languages that CS targets) due to the
> >>>>>             mapped sequences. Simon's approach works but requires
> >>>>>             keeping the original sequences alongside the mapped ones.
> >>>>>             I think it is a good approach but I would like to know if
> >>>>>             you have a better suggestion before we apply the patch.
> >>>>>
> >>>>>             Thanks,
> >>>>>             Luis
> >>>>>
> >>>>>
> >>>>>             On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:
> >>>>>
> >>>>>                 [
> >>>>>
> >>
> https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> >>>>>                 ]
> >>>>>
> >>>>>                 Chris Bowditch reassigned FOP-2210:
> >>>>>                 -----------------------------------
> >>>>>
> >>>>>                      Assignee: Chris Bowditch
> >>>>>
> >>>>>                     [PATCH] Complex script IF to output missing
> glyphs
> >>>>>
> --------------------------------------------------
> >>>>>
> >>>>>                                      Key: FOP-2210
> >>>>>                                      URL:
> >>>>>                     https://issues.apache.org/jira/browse/FOP-2210
> >>>>>                                  Project: Fop
> >>>>>                               Issue Type: Bug
> >>>>>                                 Reporter: simon steiner
> >>>>>                                 Assignee: Chris Bowditch
> >>>>>                              Attachments: csspeedtrunk.patch,
> >>>>>                     fop.xconf, test.fo <http://test.fo>
> >>>>>
> >>>>>
> >>>>>                     fop test.fo <http://test.fo> -c fop.xconf -if
> >>>>>
> >>>>>                     application/pdf expected.if.xml
> >>>>>                     fop -c fop.xconf -ifin expected.if.xml out.pdf
> >>>>>
> >>>>>                 --
> >>>>>                 This message is automatically generated by JIRA.
> >>>>>                 If you think it was sent incorrectly, please contact
> >>>>>                 your JIRA administrators
> >>>>>                 For more information on JIRA, see:
> >>>>>                 http://www.atlassian.com/software/jira
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Vincent Hennebert <vh...@gmail.com>.

On 25/04/13 17:48, Glenn Adams wrote:
> On Thu, Apr 25, 2013 at 2:31 AM, Vincent Hennebert <vh...@gmail.com>wrote:
> 
>>
>> It doesn’t shock me to store text as text in the IF and to re-do the
>> glyph mapping when rendering it to the final output format. This is
>> actually how it is done ATM.
>>
> 
> I think this a bad idea for the reasons that Alexios mentioned, and that I
> previously mentioned about recreating sufficient layout context to repeat
> the process reliably.

What exactly do you mean by ‘sufficient layout context’? What would be
missing from the IF that would prevent re-doing the glyph mapping?


>> Sure it may become more costly when you start using complex scripts, 
>> but
>> that would have to be confirmed with some profiling first and foremost.
>> We might be surprised.
>>
>> We should keep in mind that it’s a perfectly reasonable use case to add
>> text to the IF as part of a post-processing step. That text will have to
>> go through the glyph mapping code anyway.
>>
>> Also, to have copy-paste work properly from PDF the original text must
>> be present in the IF.
>>
> 
> Agreed, but this is a different requirement. And doesn't entail
> reconstructing part of the layout context and repeating the character to
> glyph mapping and positioning process.

You’ll have to do that for text added at post-process time anyway?


>> Storing information about the private use area in the IF is exposing
>> internal implementation details of FOP.
> 
> 
> I disagree. In fact, it is working around a bug that exists in certain
> fonts which forces FOP to make use of synthesized PUA mappings. The bug is
> that the font designer did not fully populate the original CMAP, i.e.,
> include a mapping for every accessible glyph.

I still don’t get it I’m afraid. Where in the TrueType spec is it stated
that every glyph should have an entry in the cmap? Why can’t FOP just
use the glyph ID? Surely that information is enough?


>> When going the direct FO to PDF
>> route, mapping glyphs to character codes to re-map them again into
>> glyphs when creating the PDF is sub-optimal. We might as well work with
>> the glyph indices all the way through.
>>
> 
> This is possible, but wouldn't it require two separate paths through the IF
> layer, and would it not work for non-PDF output?

I don’t think so. The original text should be passed through anyway to
create the ToUnicode cmap. So PDF can use the glyph mapping to generate
the text operators and the original text for the ToUnicode cmap. The IF
renderer just streams out the original text. And the other renderers
just deal with the glyph mapping.
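
For illustration only, a sketch of how a glyph and the original text it came
from could be turned into a ToUnicode CMap entry. It assumes the character
code written to the PDF equals the glyph ID (as with an Identity-H encoded
font with an identity CID-to-GID mapping) and only formats a single bfchar
line; the class and method names are made up, and this is not how FOP's PDF
library does it.

```java
/** Hypothetical helper (not FOP code) that formats one ToUnicode bfchar entry. */
final class ToUnicodeSketch {
    private ToUnicodeSketch() { }

    /**
     * Formats "<code> <text>" where code is the glyph ID as four hex digits
     * and text is the original string as UTF-16BE code units in hex.
     */
    static String bfcharLine(int glyphId, String originalText) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("<%04X> <", glyphId));
        for (int i = 0; i < originalText.length(); i++) {
            // Java chars are UTF-16 code units, so each one maps to four hex digits.
            sb.append(String.format("%04X", (int) originalText.charAt(i)));
        }
        return sb.append('>').toString();
    }
}
```

For example, a ligature glyph with ID 139 that stands for the two characters
U+0644 U+0627 would be written as `<008B> <06440627>`, which keeps the
original text recoverable for copy and search.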


Vincent


> I suspect this falls under
> the category of "premature optimization", on which Knuth says "Premature
> optimization is the root of all evil (or at least most of it) in
> programming."
> 
> 
>>
>>
>> Vincent
>>
>>
>>> On 25 Apr 2013, at 01:52, Glenn Adams <gl...@skynav.com> wrote:
>>>
>>>> I see no option but to modify IF. We modified IF for 1.1 in the first
>> place.  We have recently made quite a number of backward incompatible
>> changes to the FOP public APIs. I expect the next release will need to bump
>> the major version to 2 for FOP due to these changes, so there is little
>> risk in making a change in IF. If there are other, useful changes to IF
>> that have been postponed, then perhaps they should be reconsidered now as
>> well.
>>>>
>>>>
>>>> On Wed, Apr 24, 2013 at 3:26 PM, Luis Bernardo <lm...@gmail.com>
>> wrote:
>>>>
>>>> These are good suggestions. I am fully aware of the shortcomings that
>> you pointed out, but the only other option seemed to be to codify the
>> mappings in IF, similar to your first suggestion. However that would mean
>> changing IF which is not something we are keen to do since that impacts
>> applications that rely on the current format.
>>>>
>>>> Are you saying that with your second approach there is no need to
>> change IF?
>>>>
>>>>
>>>> On 4/24/13 7:38 PM, Glenn Adams wrote:
>>>>> Sure. One way to do this would be to add child elements to the <font/>
>> element in IF output as follows:
>>>>>
>>>>> <font family="Lateef" style="normal" ...>
>>>>>   <pua code="0xE000" gid="139"/>
>>>>>   <pua code="0xE001" gid="481"/>
>>>>>   <pua code="0xE002" gid="219"/>
>>>>> </font>
>>>>>
>>>>> where these PUA mappings are collected by iterating over the
>> characters of TextAreas governed by the <font/> element. These characters
>> might be iterated upon invoking TextArea.add{Word,Space}, and collecting
>> this info in text areas.
>>>>>
>>>>> Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1)
>> determine which glyph codes were referenced by the document, (2) given
>> these used codes, iterate of the the CMAP mappings to find which PUA codes
>> were generated for those glyph codes, then (3) output the <pua/> elements
>> (above) as required.
>>>>>
>>>>> Finally, when reading an IF file, these <pua/> elements would be used
>> to augment the font's CMAP (keeping in mind that when reading the font,
>> MultiByteFont.createPrivateUseMappings() may have already been called, and
>> thus the mappings in <pua/> elements may need to be replaced or merged.
>>>>>
>>>>> I can imagine various other optimizations on the above theme to make
>> this readily workable.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 24, 2013 at 3:18 AM, Chris Bowditch <
>> bowditch_chris@hotmail.com> wrote:
>>>>> Hi Glenn,
>>>>>
>>>>> Can you suggest an alternative approach please?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Chris
>>>>>
>>>>>
>>>>> On 24/04/2013 02:41, Glenn Adams wrote:
>>>>> I don't like this. It negates any additional processing that may have
>> occurred, such as letter spacing. It requires the IF to repeat part of the
>> layout process. Bad idea.
>>>>>
>>>>>
>>>>> On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo <lmpmbernardo@gmail.com<mailto:
>> lmpmbernardo@gmail.com>> wrote:
>>>>>
>>>>>
>>>>>     With the approach implemented by Simon what gets written to the IF
>>>>>     file is the original sequence, not the mapped sequence. Then when
>>>>>     generating PDF from IF the same code that would generate the
>>>>>     synthesized mappings when generating PDF straight from FO is
>>>>>     called to recreate the mappings. So I don't think we can say there
>>>>>     is information about the mappings in the text nodes.
>>>>>
>>>>>
>>>>>     On 4/23/13 5:50 AM, Glenn Adams wrote:
>>>>>     Ah, I reread your earlier (private) message. I see the problem
>>>>>     has to do with the use of synthesized PUA mappings. Here, the
>>>>>     problem really is that the font should always have a CMAP entry
>>>>>     that maps to every glyph that can be produced by the GSUB
>>>>>     process. However, not all fonts do this, so in the case in point,
>>>>>     we have to synthesize some mapping, from which we have to turn to
>>>>>     PUA assignments. This works when we generate PDF since we
>>>>>     generate a subset font that contains the synthesized mappings.
>>>>>     However, I can see that if this is going to IF instead of PDF/PS,
>>>>>     then we need to find a way to recreate those synthesized mappings.
>>>>>
>>>>>     I think this information is really font-specific, and should not
>>>>>     be tied to specific text nodes though. So if Simon's fix uses
>>>>>     text nodes, then that is probably not the best approach.
>>>>>
>>>>>
>>>>>     On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <glenn@skynav.com
>>>>>     <ma...@skynav.com>> wrote:
>>>>>
>>>>>         I'm presently at W3C WG meetings this week, but I'll try to
>>>>>         get on my schedule. I'm not sure what the IF->PS/PDF problem
>>>>>         is, since the IF->PDF path is clearly working from my tests.
>>>>>
>>>>>
>>>>>         On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo
>>>>>         <lmpmbernardo@gmail.com <ma...@gmail.com>>
>> wrote:
>>>>>
>>>>>
>>>>>             Glenn,
>>>>>
>>>>>             Can you give your opinion about the approach used by
>>>>>             Simon? As I mentioned before (in a private message), the
>>>>>             IF -> PS/PDF route does not work in your original CS
>>>>>             patch (for the languages that CS targets) due to the
>>>>>             mapped sequences. Simon's approach works but requires
>>>>>             keeping the original sequences alongside the mapped ones.
>>>>>             I think it is a good approach but I would like to know if
>>>>>             you have a better suggestion before we apply the patch.
>>>>>
>>>>>             Thanks,
>>>>>             Luis
>>>>>
>>>>>
>>>>>             On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:
>>>>>
>>>>>                 [ https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>>>
>>>>>                 Chris Bowditch reassigned FOP-2210:
>>>>>                 -----------------------------------
>>>>>
>>>>>                      Assignee: Chris Bowditch
>>>>>
>>>>>                     [PATCH] Complex script IF to output missing glyphs
>>>>>                     --------------------------------------------------
>>>>>
>>>>>                                      Key: FOP-2210
>>>>>                                      URL:
>>>>>                     https://issues.apache.org/jira/browse/FOP-2210
>>>>>                                  Project: Fop
>>>>>                               Issue Type: Bug
>>>>>                                 Reporter: simon steiner
>>>>>                                 Assignee: Chris Bowditch
>>>>>                              Attachments: csspeedtrunk.patch, fop.xconf, test.fo
>>>>>
>>>>>
>>>>>                     fop test.fo -c fop.xconf -if application/pdf expected.if.xml
>>>>>                     fop -c fop.xconf -ifin expected.if.xml out.pdf
>>>>>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Glenn Adams <gl...@skynav.com>.

On Thu, Apr 25, 2013 at 2:31 AM, Vincent Hennebert <vh...@gmail.com> wrote:

>
> It doesn’t shock me to store text as text in the IF and to re-do the
> glyph mapping when rendering it to the final output format. This is
> actually how it is done ATM.
>

I think this is a bad idea for the reasons that Alexios mentioned, and that I
previously mentioned about recreating sufficient layout context to repeat
the process reliably.


>
> Sure it may become more costly when you start using complex scripts, but
> that would have to be confirmed with some profiling first and foremost.
> We might be surprised.
>
> We should keep in mind that it’s a perfectly reasonable use case to add
> text to the IF as part of a post-processing step. That text will have to
> go through the glyph mapping code anyway.
>
> Also, to have copy-paste work properly from PDF the original text must
> be present in the IF.
>

Agreed, but this is a different requirement. And doesn't entail
reconstructing part of the layout context and repeating the character to
glyph mapping and positioning process.


> Storing information about the private use area in the IF is exposing
> internal implementation details of FOP.


I disagree. In fact, it is working around a bug that exists in certain
fonts which forces FOP to make use of synthesized PUA mappings. The bug is
that the font designer did not fully populate the original CMAP, i.e.,
include a mapping for every accessible glyph.


> When going the direct FO to PDF
> route, mapping glyphs to character codes to re-map them again into
> glyphs when creating the PDF is sub-optimal. We might as well work with
> the glyph indices all the way through.
>

This is possible, but wouldn't it require two separate paths through the IF
layer, and would it not work for non-PDF output? I suspect this falls under
the category of "premature optimization", on which Knuth says "Premature
optimization is the root of all evil (or at least most of it) in
programming."


>
>
> Vincent
>
>
> > On 25 Apr 2013, at 01:52, Glenn Adams <gl...@skynav.com> wrote:
> >
> >> I see no option but to modify IF. We modified IF for 1.1 in the first
> place.  We have recently made quite a number of backward incompatible
> changes to the FOP public APIs. I expect the next release will need to bump
> the major version to 2 for FOP due to these changes, so there is little
> risk in making a change in IF. If there are other, useful changes to IF
> that have been postponed, then perhaps they should be reconsidered now as
> well.
> >>
> >>
> >> On Wed, Apr 24, 2013 at 3:26 PM, Luis Bernardo <lm...@gmail.com>
> wrote:
> >>
> >> These are good suggestions. I am fully aware of the shortcomings that
> you pointed out, but the only other option seemed to be to codify the
> mappings in IF, similar to your first suggestion. However that would mean
> changing IF which is not something we are keen to do since that impacts
> applications that rely on the current format.
> >>
> >> Are you saying that with your second approach there is no need to
> change IF?
> >>
> >>
> >> On 4/24/13 7:38 PM, Glenn Adams wrote:
> >>> Sure. One way to do this would be to add child elements to the <font/>
> element in IF output as follows:
> >>>
> >>> <font family="Lateef" style="normal" ...>
> >>>   <pua code="0xE000" gid="139"/>
> >>>   <pua code="0xE001" gid="481"/>
> >>>   <pua code="0xE002" gid="219"/>
> >>> </font>
> >>>
> >>> where these PUA mappings are collected by iterating over the
> characters of TextAreas governed by the <font/> element. These characters
> might be iterated upon invoking TextArea.add{Word,Space}, and collecting
> this info in text areas.
> >>>
> >>> Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1)
> determine which glyph codes were referenced by the document, (2) given
> these used codes, iterate of the the CMAP mappings to find which PUA codes
> were generated for those glyph codes, then (3) output the <pua/> elements
> (above) as required.
> >>>
> >>> Finally, when reading an IF file, these <pua/> elements would be used
> to augment the font's CMAP (keeping in mind that when reading the font,
> MultiByteFont.createPrivateUseMappings() may have already been called, and
> thus the mappings in <pua/> elements may need to be replaced or merged.
> >>>
> >>> I can imagine various other optimizations on the above theme to make
> this readily workable.
> >>>
> >>>
> >>>
> >>> On Wed, Apr 24, 2013 at 3:18 AM, Chris Bowditch <
> bowditch_chris@hotmail.com> wrote:
> >>> Hi Glenn,
> >>>
> >>> Can you suggest an alternative approach please?
> >>>
> >>> Thanks,
> >>>
> >>> Chris
> >>>
> >>>
> >>> On 24/04/2013 02:41, Glenn Adams wrote:
> >>> I don't like this. It negates any additional processing that may have
> occurred, such as letter spacing. It requires the IF to repeat part of the
> layout process. Bad idea.
> >>>
> >>>
> >>> On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo <lmpmbernardo@gmail.com<mailto:
> lmpmbernardo@gmail.com>> wrote:
> >>>
> >>>
> >>>     With the approach implemented by Simon what gets written to the IF
> >>>     file is the original sequence, not the mapped sequence. Then when
> >>>     generating PDF from IF the same code that would generate the
> >>>     synthesized mappings when generating PDF straight from FO is
> >>>     called to recreate the mappings. So I don't think we can say there
> >>>     is information about the mappings in the text nodes.
> >>>
> >>>
> >>>     On 4/23/13 5:50 AM, Glenn Adams wrote:
> >>>     Ah, I reread your earlier (private) message. I see the problem
> >>>     has to do with the use of synthesized PUA mappings. Here, the
> >>>     problem really is that the font should always have a CMAP entry
> >>>     that maps to every glyph that can be produced by the GSUB
> >>>     process. However, not all fonts do this, so in the case in point,
> >>>     we have to synthesize some mapping, from which we have to turn to
> >>>     PUA assignments. This works when we generate PDF since we
> >>>     generate a subset font that contains the synthesized mappings.
> >>>     However, I can see that if this is going to IF instead of PDF/PS,
> >>>     then we need to find a way to recreate those synthesized mappings.
> >>>
> >>>     I think this information is really font-specific, and should not
> >>>     be tied to specific text nodes though. So if Simon's fix uses
> >>>     text nodes, then that is probably not the best approach.
> >>>
> >>>
> >>>     On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <glenn@skynav.com
> >>>     <ma...@skynav.com>> wrote:
> >>>
> >>>         I'm presently at W3C WG meetings this week, but I'll try to
> >>>         get on my schedule. I'm not sure what the IF->PS/PDF problem
> >>>         is, since the IF->PDF path is clearly working from my tests.
> >>>
> >>>
> >>>         On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo
> >>>         <lmpmbernardo@gmail.com <ma...@gmail.com>>
> wrote:
> >>>
> >>>
> >>>             Glenn,
> >>>
> >>>             Can you give your opinion about the approach used by
> >>>             Simon? As I mentioned before (in a private message), the
> >>>             IF -> PS/PDF route does not work in your original CS
> >>>             patch (for the languages that CS targets) due to the
> >>>             mapped sequences. Simon's approach works but requires
> >>>             keeping the original sequences alongside the mapped ones.
> >>>             I think it is a good approach but I would like to know if
> >>>             you have a better suggestion before we apply the patch.
> >>>
> >>>             Thanks,
> >>>             Luis
> >>>
> >>>
> >>>             On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:
> >>>
> >>>                 [
> >>>
> https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> >>>                 ]
> >>>
> >>>                 Chris Bowditch reassigned FOP-2210:
> >>>                 -----------------------------------
> >>>
> >>>                      Assignee: Chris Bowditch
> >>>
> >>>                     [PATCH] Complex script IF to output missing glyphs
> >>>                     --------------------------------------------------
> >>>
> >>>                                      Key: FOP-2210
> >>>                                      URL:
> >>>                     https://issues.apache.org/jira/browse/FOP-2210
> >>>                                  Project: Fop
> >>>                               Issue Type: Bug
> >>>                                 Reporter: simon steiner
> >>>                                 Assignee: Chris Bowditch
> >>>                              Attachments: csspeedtrunk.patch,
> >>>                     fop.xconf, test.fo <http://test.fo>
> >>>
> >>>
> >>>                     fop test.fo <http://test.fo> -c fop.xconf -if
> >>>
> >>>                     application/pdf expected.if.xml
> >>>                     fop -c fop.xconf -ifin expected.if.xml out.pdf
> >>>
> >>>                 --
> >>>                 This message is automatically generated by JIRA.
> >>>                 If you think it was sent incorrectly, please contact
> >>>                 your JIRA administrators
> >>>                 For more information on JIRA, see:
> >>>                 http://www.atlassian.com/software/jira
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >
>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Vincent Hennebert <vh...@gmail.com>.

On 25/04/13 10:35, Alexios Giotis wrote:
> For our use cases, it would be much better to add new child elements to IF or do other similar extensions, than having to repeat part of the costly layout process. Besides repeating, the FO -> IF is easily executed by multiple threads, while the IF->PDF cannot be parallelised (without big changes).

It doesn’t shock me to store text as text in the IF and to re-do the
glyph mapping when rendering it to the final output format. This is
actually how it is done at the moment.

Sure it may become more costly when you start using complex scripts, but
that would have to be confirmed with some profiling first and foremost.
We might be surprised.

We should keep in mind that it’s a perfectly reasonable use case to add
text to the IF as part of a post-processing step. That text will have to
go through the glyph mapping code anyway.

Also, to have copy-paste work properly from PDF the original text must
be present in the IF.

Storing information about the private use area in the IF is exposing
internal implementation details of FOP. When going the direct FO to PDF
route, mapping glyphs to character codes to re-map them again into
glyphs when creating the PDF is sub-optimal. We might as well work with
the glyph indices all the way through.


Vincent
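
To make the round trip described above concrete, here is a minimal sketch in Java. It is not FOP code: the class PuaRoundTrip, its methods, and the sample glyph indices are all hypothetical, and it only assumes that glyphs without a CMAP entry receive code points from the Basic Multilingual Plane private use area (U+E000 to U+F8FF). It shows a glyph index being mapped to a character code and then back to the same glyph index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not FOP code: glyph -> code point -> glyph round trip. */
public class PuaRoundTrip {
    private static final int PUA_START = 0xE000;
    private static final int PUA_END = 0xF8FF;

    private final Map<Integer, Integer> glyphToCode = new LinkedHashMap<>();
    private final Map<Integer, Integer> codeToGlyph = new LinkedHashMap<>();
    private int nextPua = PUA_START;

    /** Registers a mapping that the font's own CMAP already provides. */
    public void addCmapEntry(int codePoint, int glyphIndex) {
        glyphToCode.putIfAbsent(glyphIndex, codePoint);
        codeToGlyph.put(codePoint, glyphIndex);
    }

    /** Returns a code point for the glyph, synthesizing a PUA one if none exists. */
    public int codeForGlyph(int glyphIndex) {
        Integer existing = glyphToCode.get(glyphIndex);
        if (existing != null) {
            return existing;
        }
        if (nextPua > PUA_END) {
            throw new IllegalStateException("private use area exhausted");
        }
        int code = nextPua++;
        glyphToCode.put(glyphIndex, code);
        codeToGlyph.put(code, glyphIndex);
        return code;
    }

    /** Maps a code point back to its glyph index. */
    public int glyphForCode(int codePoint) {
        Integer glyph = codeToGlyph.get(codePoint);
        if (glyph == null) {
            throw new IllegalArgumentException("no glyph for code point " + codePoint);
        }
        return glyph;
    }

    public static void main(String[] args) {
        PuaRoundTrip font = new PuaRoundTrip();
        font.addCmapEntry(0x0628, 57);      // assumed regular CMAP entry
        int[] shaped = {57, 139, 481};      // assumed glyph indices after substitution
        for (int gid : shaped) {
            int code = font.codeForGlyph(gid);
            System.out.printf("gid %d -> U+%04X -> gid %d%n", gid, code, font.glyphForCode(code));
        }
    }
}
```

In this sketch the second lookup always returns the glyph index the caller started with, which is the redundancy the message points at for the direct FO to PDF route.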


> On 25 Apr 2013, at 01:52, Glenn Adams <gl...@skynav.com> wrote:
> 
>> I see no option but to modify IF. We modified IF for 1.1 in the first place.  We have recently made quite a number of backward incompatible changes to the FOP public APIs. I expect the next release will need to bump the major version to 2 for FOP due to these changes, so there is little risk in making a change in IF. If there are other, useful changes to IF that have been postponed, then perhaps they should be reconsidered now as well.
>>
>>
>> On Wed, Apr 24, 2013 at 3:26 PM, Luis Bernardo <lm...@gmail.com> wrote:
>>
>> These are good suggestions. I am fully aware of the shortcomings that you pointed out, but the only other option seemed to be to codify the mappings in IF, similar to your first suggestion. However that would mean changing IF which is not something we are keen to do since that impacts applications that rely on the current format.
>>
>> Are you saying that with your second approach there is no need to change IF?
>>
>>
>> On 4/24/13 7:38 PM, Glenn Adams wrote:
>>> Sure. One way to do this would be to add child elements to the <font/> element in IF output as follows:
>>>
>>> <font family="Lateef" style="normal" ...>
>>>   <pua code="0xE000" gid="139"/>
>>>   <pua code="0xE001" gid="481"/>
>>>   <pua code="0xE002" gid="219"/>
>>> </font>
>>>
>>> where these PUA mappings are collected by iterating over the characters of TextAreas governed by the <font/> element. These characters might be iterated upon invoking TextArea.add{Word,Space}, and collecting this info in text areas.
>>>
>>> Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1) determine which glyph codes were referenced by the document, (2) given these used codes, iterate over the CMAP mappings to find which PUA codes were generated for those glyph codes, then (3) output the <pua/> elements (above) as required.
>>>
>>> Finally, when reading an IF file, these <pua/> elements would be used to augment the font's CMAP (keeping in mind that when reading the font, MultiByteFont.createPrivateUseMappings() may have already been called, and thus the mappings in <pua/> elements may need to be replaced or merged).
>>>
>>> I can imagine various other optimizations on the above theme to make this readily workable.
>>>
>>>
>>>
>>> On Wed, Apr 24, 2013 at 3:18 AM, Chris Bowditch <bo...@hotmail.com> wrote:
>>> Hi Glenn,
>>>
>>> Can you suggest an alternative approach please?
>>>
>>> Thanks,
>>>
>>> Chris
>>>
>>>
>>> On 24/04/2013 02:41, Glenn Adams wrote:
>>> I don't like this. It negates any additional processing that may have occurred, such as letter spacing. It requires the IF to repeat part of the layout process. Bad idea.
>>>
>>>
>>> On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo <lmpmbernardo@gmail.com <ma...@gmail.com>> wrote:
>>>
>>>
>>>     With the approach implemented by Simon what gets written to the IF
>>>     file is the original sequence, not the mapped sequence. Then when
>>>     generating PDF from IF the same code that would generate the
>>>     synthesized mappings when generating PDF straight from FO is
>>>     called to recreate the mappings. So I don't think we can say there
>>>     is information about the mappings in the text nodes.
>>>
>>>
>>>     On 4/23/13 5:50 AM, Glenn Adams wrote:
>>>     Ah, I reread your earlier (private) message. I see the problem
>>>     has to do with the use of synthesized PUA mappings. Here, the
>>>     problem really is that the font should always have a CMAP entry
>>>     that maps to every glyph that can be produced by the GSUB
>>>     process. However, not all fonts do this, so in the case in point,
>>>     we have to synthesize some mapping, from which we have to turn to
>>>     PUA assignments. This works when we generate PDF since we
>>>     generate a subset font that contains the synthesized mappings.
>>>     However, I can see that if this is going to IF instead of PDF/PS,
>>>     then we need to find a way to recreate those synthesized mappings.
>>>
>>>     I think this information is really font-specific, and should not
>>>     be tied to specific text nodes though. So if Simon's fix uses
>>>     text nodes, then that is probably not the best approach.
>>>
>>>
>>>     On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <glenn@skynav.com
>>>     <ma...@skynav.com>> wrote:
>>>
>>>         I'm presently at W3C WG meetings this week, but I'll try to
>>>         get on my schedule. I'm not sure what the IF->PS/PDF problem
>>>         is, since the IF->PDF path is clearly working from my tests.
>>>
>>>
>>>         On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo
>>>         <lmpmbernardo@gmail.com <ma...@gmail.com>> wrote:
>>>
>>>
>>>             Glenn,
>>>
>>>             Can you give your opinion about the approach used by
>>>             Simon? As I mentioned before (in a private message), the
>>>             IF -> PS/PDF route does not work in your original CS
>>>             patch (for the languages that CS targets) due to the
>>>             mapped sequences. Simon's approach works but requires
>>>             keeping the original sequences alongside the mapped ones.
>>>             I think it is a good approach but I would like to know if
>>>             you have a better suggestion before we apply the patch.
>>>
>>>             Thanks,
>>>             Luis
>>>
>>>
>
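
The second variant quoted above (take the used glyphs, look up which PUA codes map to them, emit <pua/> elements) could look roughly like the following Java sketch. Everything here is an assumption made for illustration: the class PuaElementWriter, the plain Map standing in for the font's CMAP, and the Set standing in for whatever MultiByteFont.getUsedGlyphs() returns. Only the element and attribute names come from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch, not FOP code: builds <pua/> elements for used glyphs. */
public class PuaElementWriter {

    /** True for code points in the BMP private use area. */
    public static boolean isPrivateUse(int codePoint) {
        return codePoint >= 0xE000 && codePoint <= 0xF8FF;
    }

    /**
     * @param cmap       code point -> glyph index (stand-in for the font's CMAP)
     * @param usedGlyphs glyph indices referenced by the document
     * @return one <pua/> element per PUA code point whose glyph was used
     */
    public static List<String> puaElements(Map<Integer, Integer> cmap, Set<Integer> usedGlyphs) {
        List<String> elements = new ArrayList<>();
        // Copy into a TreeMap so the output is ordered by code point.
        for (Map.Entry<Integer, Integer> entry : new TreeMap<>(cmap).entrySet()) {
            int code = entry.getKey();
            int gid = entry.getValue();
            if (isPrivateUse(code) && usedGlyphs.contains(gid)) {
                elements.add(String.format(Locale.ROOT,
                        "<pua code=\"0x%04X\" gid=\"%d\"/>", code, gid));
            }
        }
        return elements;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cmap = new TreeMap<>();
        cmap.put(0x0628, 57);    // assumed regular entry, must not be emitted
        cmap.put(0xE000, 139);
        cmap.put(0xE001, 481);   // synthesized but unused in this document
        cmap.put(0xE002, 219);
        Set<Integer> used = Set.of(57, 139, 219);
        puaElements(cmap, used).forEach(System.out::println);
    }
}
```

Filtering on the used glyphs keeps the IF file from carrying mappings the document never references.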

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Alexios Giotis <al...@gmail.com>.

For our use cases, it would be much better to add new child elements to IF or do other similar extensions, than having to repeat part of the costly layout process. Besides repeating, the FO -> IF is easily executed by multiple threads, while the IF->PDF cannot be parallelised (without big changes).
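
A rough Java sketch of that two-stage shape, under stated assumptions: layoutToIf and renderSequentially are hypothetical placeholders that only manipulate file names, not calls into FOP, and the thread count is arbitrary. The point is only the structure, with the costly first stage on a thread pool and the second stage on a single thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch, not FOP code: parallel FO -> IF, sequential IF -> PDF. */
public class TwoStagePipeline {

    /** Placeholder for the costly FO -> IF layout step. */
    public static String layoutToIf(String foName) {
        return foName.replace(".fo", ".if.xml");
    }

    /** Runs the layout step on a thread pool; results keep the input order. */
    public static List<String> layoutAll(List<String> foFiles, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String fo : foFiles) {
                Callable<String> task = () -> layoutToIf(fo);
                pending.add(pool.submit(task));
            }
            List<String> ifFiles = new ArrayList<>();
            for (Future<String> future : pending) {
                ifFiles.add(future.get());   // blocks until that document is laid out
            }
            return ifFiles;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("layout stage failed", e);
        } finally {
            pool.shutdown();
        }
    }

    /** Placeholder for the IF -> PDF step, run by the calling thread only. */
    public static String renderSequentially(List<String> ifFiles) {
        return String.join(" + ", ifFiles) + " -> out.pdf";
    }

    public static void main(String[] args) {
        List<String> ifFiles = layoutAll(List.of("a.fo", "b.fo", "c.fo"), 3);
        System.out.println(renderSequentially(ifFiles));
    }
}
```

If rendering from IF had to repeat part of the layout work, that work would move from the parallel stage into the sequential one, which is the cost being objected to.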


On 25 Apr 2013, at 01:52, Glenn Adams <gl...@skynav.com> wrote:

> I see no option but to modify IF. We modified IF for 1.1 in the first place.  We have recently made quite a number of backward incompatible changes to the FOP public APIs. I expect the next release will need to bump the major version to 2 for FOP due to these changes, so there is little risk in making a change in IF. If there are other, useful changes to IF that have been postponed, then perhaps they should be reconsidered now as well.
> 
> 
> On Wed, Apr 24, 2013 at 3:26 PM, Luis Bernardo <lm...@gmail.com> wrote:
> 
> These are good suggestions. I am fully aware of the shortcomings that you pointed out, but the only other option seemed to be to codify the mappings in IF, similar to your first suggestion. However that would mean changing IF which is not something we are keen to do since that impacts applications that rely on the current format.
> 
> Are you saying that with your second approach there is no need to change IF?
> 
> 
> On 4/24/13 7:38 PM, Glenn Adams wrote:
>> Sure. One way to do this would be to add child elements to the <font/> element in IF output as follows:
>> 
>> <font family="Lateef" style="normal" ...>
>>   <pua code="0xE000" gid="139"/>
>>   <pua code="0xE001" gid="481"/>
>>   <pua code="0xE002" gid="219"/>
>> </font>
>> 
>> where these PUA mappings are collected by iterating over the characters of TextAreas governed by the <font/> element. These characters might be iterated upon invoking TextArea.add{Word,Space}, and collecting this info in text areas.
>> 
>> Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1) determine which glyph codes were referenced by the document, (2) given these used codes, iterate over the CMAP mappings to find which PUA codes were generated for those glyph codes, then (3) output the <pua/> elements (above) as required.
>>
>> Finally, when reading an IF file, these <pua/> elements would be used to augment the font's CMAP (keeping in mind that when reading the font, MultiByteFont.createPrivateUseMappings() may have already been called, and thus the mappings in <pua/> elements may need to be replaced or merged).
>> 
>> I can imagine various other optimizations on the above theme to make this readily workable.
>> 
>> 
>> 
>> On Wed, Apr 24, 2013 at 3:18 AM, Chris Bowditch <bo...@hotmail.com> wrote:
>> Hi Glenn,
>> 
>> Can you suggest an alternative approach please?
>> 
>> Thanks,
>> 
>> Chris
>> 
>> 
>> On 24/04/2013 02:41, Glenn Adams wrote:
>> I don't like this. It negates any additional processing that may have occurred, such as letter spacing. It requires the IF to repeat part of the layout process. Bad idea.
>> 
>> 
>> On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo <lmpmbernardo@gmail.com <ma...@gmail.com>> wrote:
>> 
>> 
>>     With the approach implemented by Simon what gets written to the IF
>>     file is the original sequence, not the mapped sequence. Then when
>>     generating PDF from IF the same code that would generate the
>>     synthesized mappings when generating PDF straight from FO is
>>     called to recreate the mappings. So I don't think we can say there
>>     is information about the mappings in the text nodes.
>> 
>> 
>>     On 4/23/13 5:50 AM, Glenn Adams wrote:
>>     Ah, I reread your earlier (private) message. I see the problem
>>     has to do with the use of synthesized PUA mappings. Here, the
>>     problem really is that the font should always have a CMAP entry
>>     that maps to every glyph that can be produced by the GSUB
>>     process. However, not all fonts do this, so in the case in point,
>>     we have to synthesize some mapping, from which we have to turn to
>>     PUA assignments. This works when we generate PDF since we
>>     generate a subset font that contains the synthesized mappings.
>>     However, I can see that if this is going to IF instead of PDF/PS,
>>     then we need to find a way to recreate those synthesized mappings.
>> 
>>     I think this information is really font-specific, and should not
>>     be tied to specific text nodes though. So if Simon's fix uses
>>     text nodes, then that is probably not the best approach.
>> 
>> 
>>     On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <glenn@skynav.com
>>     <ma...@skynav.com>> wrote:
>> 
>>         I'm presently at W3C WG meetings this week, but I'll try to
>>         get on my schedule. I'm not sure what the IF->PS/PDF problem
>>         is, since the IF->PDF path is clearly working from my tests.
>> 
>> 
>>         On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo
>>         <lmpmbernardo@gmail.com <ma...@gmail.com>> wrote:
>> 
>> 
>>             Glenn,
>> 
>>             Can you give your opinion about the approach used by
>>             Simon? As I mentioned before (in a private message), the
>>             IF -> PS/PDF route does not work in your original CS
>>             patch (for the languages that CS targets) due to the
>>             mapped sequences. Simon's approach works but requires
>>             keeping the original sequences alongside the mapped ones.
>>             I think it is a good approach but I would like to know if
>>             you have a better suggestion before we apply the patch.
>> 
>>             Thanks,
>>             Luis
>> 
>> 
>
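
For the reading side of the <pua/> proposal quoted above, a minimal Java sketch might look like this. It is an illustration under assumptions, not FOP code: PuaElementReader is a hypothetical class, the font fragment is reduced to the attributes shown in the thread, XML namespaces are ignored, and "replace" is chosen as the merge policy so that mappings read from the IF file override any that were synthesized when the font was loaded.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch, not FOP code: reads <pua/> elements and merges them. */
public class PuaElementReader {

    /** Parses a <font> fragment and returns code point -> glyph index. */
    public static Map<Integer, Integer> readPua(String fontXml) {
        try {
            Element font = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(fontXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NodeList nodes = font.getElementsByTagName("pua");
            Map<Integer, Integer> mappings = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element pua = (Element) nodes.item(i);
                int code = Integer.decode(pua.getAttribute("code"));   // accepts "0xE000"
                int gid = Integer.parseInt(pua.getAttribute("gid"));
                mappings.put(code, gid);
            }
            return mappings;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read font fragment", e);
        }
    }

    /** Merge policy assumed here: entries from the IF file replace existing ones. */
    public static void merge(Map<Integer, Integer> cmap, Map<Integer, Integer> fromIf) {
        cmap.putAll(fromIf);
    }

    public static void main(String[] args) {
        // Stand-in for mappings already synthesized when the font was loaded.
        Map<Integer, Integer> cmap = new HashMap<>();
        cmap.put(0xE000, 219);

        String fragment = "<font family=\"Lateef\" style=\"normal\">"
                + "<pua code=\"0xE000\" gid=\"139\"/>"
                + "<pua code=\"0xE001\" gid=\"481\"/>"
                + "</font>";
        merge(cmap, readPua(fragment));
        cmap.forEach((code, gid) -> System.out.printf("U+%04X -> gid %d%n", code, gid));
    }
}
```

Letting the IF entries win keeps the text already written to the IF file pointing at the glyphs it was laid out with; a merge that kept the font's own synthesized entries would need to detect and resolve such conflicts instead.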

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Chris Bowditch <bo...@hotmail.com>.

Hi Glenn, Luis,

It's true that modifying IF can cause problems for older programs
designed to modify it. However, since none of those older applications
works with CS, I'm confident that the additional elements proposed
by Glenn won't be a problem.

Thanks,

Chris

On 24/04/2013 23:52, Glenn Adams wrote:
> I see no option but to modify IF. We modified IF for 1.1 in the first 
> place.  We have recently made quite a number of backward incompatible 
> changes to the FOP public APIs. I expect the next release will need to 
> bump the major version to 2 for FOP due to these changes, so there is 
> little risk in making a change in IF. If there are other, useful 
> changes to IF that have been postponed, then perhaps they should be 
> reconsidered now as well.
>
>
> On Wed, Apr 24, 2013 at 3:26 PM, Luis Bernardo <lmpmbernardo@gmail.com 
> <ma...@gmail.com>> wrote:
>
>
>     These are good suggestions. I am fully aware of the shortcomings
>     that you pointed out, but the only other option seemed to be to
>     codify the mappings in IF, similar to your first suggestion.
>     However that would mean changing IF which is not something we are
>     keen to do since that impacts applications that rely on the
>     current format.
>
>     Are you saying that with your second approach there is no need to
>     change IF?
>
>
>     On 4/24/13 7:38 PM, Glenn Adams wrote:
>>     Sure. One way to do this would be to add child elements to the
>>     <font/> element in IF output as follows:
>>
>>     <font family="Lateef" style="normal" ...>
>>       <pua code="0xE000" gid="139"/>
>>       <pua code="0xE001" gid="481"/>
>>       <pua code="0xE002" gid="219"/>
>>     </font>
>>
>>     where these PUA mappings are collected by iterating over the
>>     characters of TextAreas governed by the <font/> element. These
>>     characters might be iterated upon invoking
>>     TextArea.add{Word,Space}, and collecting this info in text areas.
>>
>>     Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1)
>>     determine which glyph codes were referenced by the document, (2)
>>     given these used codes, iterate over the CMAP mappings to find
>>     which PUA codes were generated for those glyph codes, then (3)
>>     output the <pua/> elements (above) as required.
>>
>>     Finally, when reading an IF file, these <pua/> elements would be
>>     used to augment the font's CMAP (keeping in mind that when
>>     reading the font, MultiByteFont.createPrivateUseMappings() may
>>     have already been called, and thus the mappings in <pua/>
>>     elements may need to be replaced or merged).
>>
>>     I can imagine various other optimizations on the above theme to
>>     make this readily workable.
>>
>>
>>
>>     On Wed, Apr 24, 2013 at 3:18 AM, Chris Bowditch
>>     <bowditch_chris@hotmail.com <ma...@hotmail.com>>
>>     wrote:
>>
>>         Hi Glenn,
>>
>>         Can you suggest an alternative approach please?
>>
>>         Thanks,
>>
>>         Chris
>>
>>
>>         On 24/04/2013 02:41, Glenn Adams wrote:
>>
>>             I don't like this. It negates any additional processing
>>             that may have occurred, such as letter spacing. It
>>             requires the IF to repeat part of the layout process. Bad
>>             idea.
>>
>>
>>             On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo
>>             <lmpmbernardo@gmail.com <ma...@gmail.com>> wrote:
>>
>>
>>                 With the approach implemented by Simon what gets written
>>                 to the IF file is the original sequence, not the mapped
>>                 sequence. Then when generating PDF from IF the same code
>>                 that would generate the synthesized mappings when
>>                 generating PDF straight from FO is called to recreate the
>>                 mappings. So I don't think we can say there is information
>>                 about the mappings in the text nodes.
>>
>>
>>                 On 4/23/13 5:50 AM, Glenn Adams wrote:
>>
>>                     Ah, I reread your earlier (private) message. I see
>>                     the problem has to do with the use of synthesized PUA
>>                     mappings. Here, the problem really is that the font
>>                     should always have a CMAP entry that maps to every
>>                     glyph that can be produced by the GSUB process.
>>                     However, not all fonts do this, so in the case in
>>                     point, we have to synthesize some mapping, from which
>>                     we have to turn to PUA assignments. This works when we
>>                     generate PDF since we generate a subset font that
>>                     contains the synthesized mappings. However, I can see
>>                     that if this is going to IF instead of PDF/PS, then we
>>                     need to find a way to recreate those synthesized
>>                     mappings.
>>
>>                     I think this information is really font-specific, and
>>                     should not be tied to specific text nodes though. So
>>                     if Simon's fix uses text nodes, then that is probably
>>                     not the best approach.
>>
>>
>>                     On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams
>>                     <glenn@skynav.com <ma...@skynav.com>> wrote:
>>
>>                         I'm presently at W3C WG meetings this week, but
>>                         I'll try to get on my schedule. I'm not sure what
>>                         the IF->PS/PDF problem is, since the IF->PDF path
>>                         is clearly working from my tests.
>>
>>
>>                         On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo
>>                         <lmpmbernardo@gmail.com <ma...@gmail.com>> wrote:
>>
>>
>>                             Glenn,
>>
>>                             Can you give your opinion about the approach
>>                             used by Simon? As I mentioned before (in a
>>                             private message), the IF -> PS/PDF route does
>>                             not work in your original CS patch (for the
>>                             languages that CS targets) due to the mapped
>>                             sequences. Simon's approach works but requires
>>                             keeping the original sequences alongside the
>>                             mapped ones. I think it is a good approach but
>>                             I would like to know if you have a better
>>                             suggestion before we apply the patch.
>>
>>                             Thanks,
>>                             Luis
>>
>
>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Glenn Adams <gl...@skynav.com>.

I see no option but to modify IF. We modified IF for 1.1 in the first
place.  We have recently made quite a number of backward incompatible
changes to the FOP public APIs. I expect the next release will need to bump
the major version to 2 for FOP due to these changes, so there is little
risk in making a change in IF. If there are other, useful changes to IF
that have been postponed, then perhaps they should be reconsidered now as
well.


On Wed, Apr 24, 2013 at 3:26 PM, Luis Bernardo <lm...@gmail.com> wrote:

>
> These are good suggestions. I am fully aware of the shortcomings that you
> pointed out, but the only other option seemed to be to codify the mappings
> in IF, similar to your first suggestion. However that would mean changing
> IF which is not something we are keen to do since that impacts applications
> that rely on the current format.
>
> Are you saying that with your second approach there is no need to change
> IF?
>
>
> On 4/24/13 7:38 PM, Glenn Adams wrote:
>
> Sure. One way to do this would be to add child elements to the <font/>
> element in IF output as follows:
>
>  <font family="Lateef" style="normal" ...>
>    <pua code="0xE000" gid="139"/>
>    <pua code="0xE001" gid="481"/>
>    <pua code="0xE002" gid="219"/>
>  </font>
>
>  where these PUA mappings are collected by iterating over the characters
> of TextAreas governed by the <font/> element. These characters might be
> iterated upon invoking TextArea.add{Word,Space}, and collecting this info
> in text areas.
>
>  Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1)
> determine which glyph codes were referenced by the document, (2) given
> these used codes, iterate over the CMAP mappings to find which PUA codes
> were generated for those glyph codes, then (3) output the <pua/> elements
> (above) as required.
>
>  Finally, when reading an IF file, these <pua/> elements would be used to
> augment the font's CMAP (keeping in mind that when reading the font,
> MultiByteFont.createPrivateUseMappings() may have already been called, and
> thus the mappings in <pua/> elements may need to be replaced or merged).
>
>  I can imagine various other optimizations on the above theme to make
> this readily workable.
>
>
>
>  On Wed, Apr 24, 2013 at 3:18 AM, Chris Bowditch <
> bowditch_chris@hotmail.com> wrote:
>
>> Hi Glenn,
>>
>> Can you suggest an alternative approach please?
>>
>> Thanks,
>>
>> Chris
>>
>>
>> On 24/04/2013 02:41, Glenn Adams wrote:
>>
>>>  I don't like this. It negates any additional processing that may have
>>> occurred, such as letter spacing. It requires the IF to repeat part of the
>>> layout process. Bad idea.
>>>
>>>
>>>  On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo <lmpmbernardo@gmail.com> wrote:
>>>
>>>
>>>     With the approach implemented by Simon what gets written to the IF
>>>     file is the original sequence, not the mapped sequence. Then when
>>>     generating PDF from IF the same code that would generate the
>>>     synthesized mappings when generating PDF straight from FO is
>>>     called to recreate the mappings. So I don't think we can say there
>>>     is information about the mappings in the text nodes.
>>>
>>>
>>>     On 4/23/13 5:50 AM, Glenn Adams wrote:
>>>
>>>>      Ah, I reread your earlier (private) message. I see the problem
>>>>     has to do with the use of synthesized PUA mappings. Here, the
>>>>     problem really is that the font should always have a CMAP entry
>>>>     that maps to every glyph that can be produced by the GSUB
>>>>     process. However, not all fonts do this, so in the case in point,
>>>>     we have to synthesize some mapping, from which we have to turn to
>>>>     PUA assignments. This works when we generate PDF since we
>>>>     generate a subset font that contains the synthesized mappings.
>>>>     However, I can see that if this is going to IF instead of PDF/PS,
>>>>     then we need to find a way to recreate those synthesized mappings.
>>>>
>>>>     I think this information is really font-specific, and should not
>>>>     be tied to specific text nodes though. So if Simon's fix uses
>>>>     text nodes, then that is probably not the best approach.
>>>>
>>>>
>>>>     On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <glenn@skynav.com
>>>>      <ma...@skynav.com>> wrote:
>>>>
>>>>         I'm presently at W3C WG meetings this week, but I'll try to
>>>>         get on my schedule. I'm not sure what the IF->PS/PDF problem
>>>>         is, since the IF->PDF path is clearly working from my tests.
>>>>
>>>>
>>>>         On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo
>>>>           <lmpmbernardo@gmail.com <ma...@gmail.com>>
>>>> wrote:
>>>>
>>>>
>>>>             Glenn,
>>>>
>>>>             Can you give your opinion about the approach used by
>>>>             Simon? As I mentioned before (in a private message), the
>>>>             IF -> PS/PDF route does not work in your original CS
>>>>             patch (for the languages that CS targets) due to the
>>>>             mapped sequences. Simon's approach works but requires
>>>>             keeping the original sequences alongside the mapped ones.
>>>>             I think it is a good approach but I would like to know if
>>>>             you have a better suggestion before we apply the patch.
>>>>
>>>>             Thanks,
>>>>             Luis
>>>>
>>>>
>>>>             On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:
>>>>
>>>>                 [
>>>>
>>>> https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>>>                 ]
>>>>
>>>>                 Chris Bowditch reassigned FOP-2210:
>>>>                 -----------------------------------
>>>>
>>>>                      Assignee: Chris Bowditch
>>>>
>>>>                     [PATCH] Complex script IF to output missing glyphs
>>>>                     --------------------------------------------------
>>>>
>>>>                                      Key: FOP-2210
>>>>                                      URL:
>>>>                     https://issues.apache.org/jira/browse/FOP-2210
>>>>                                  Project: Fop
>>>>                               Issue Type: Bug
>>>>                                 Reporter: simon steiner
>>>>                                 Assignee: Chris Bowditch
>>>>                              Attachments: csspeedtrunk.patch,
>>>>                      fop.xconf, test.fo
>>>>
>>>>
>>>>                     fop test.fo -c fop.xconf -if application/pdf expected.if.xml
>>>>                     fop -c fop.xconf -ifin expected.if.xml out.pdf
>>>>
>>>>                 --
>>>>                 This message is automatically generated by JIRA.
>>>>                 If you think it was sent incorrectly, please contact
>>>>                 your JIRA administrators
>>>>                 For more information on JIRA, see:
>>>>                 http://www.atlassian.com/software/jira
>>>>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Luis Bernardo <lm...@gmail.com>.

These are good suggestions. I am fully aware of the shortcomings that 
you pointed out, but the only other option seemed to be to codify the 
mappings in IF, similar to your first suggestion. However, that would
mean changing IF, which is not something we are keen to do since it
impacts applications that rely on the current format.

Are you saying that with your second approach there is no need to change IF?

On 4/24/13 7:38 PM, Glenn Adams wrote:
> Sure. One way to do this would be to add child elements to the <font/> 
> element in IF output as follows:
>
> <font family="Lateef" style="normal" ...>
>   <pua code="0xE000" gid="139"/>
>   <pua code="0xE001" gid="481"/>
>   <pua code="0xE002" gid="219"/>
> </font>
>
> where these PUA mappings are collected by iterating over the 
> characters of TextAreas governed by the <font/> element. These 
> characters might be iterated upon invoking TextArea.add{Word,Space}, 
> and collecting this info in text areas.
>
> Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1) 
> determine which glyph codes were referenced by the document, (2) given 
> these used codes, iterate over the CMAP mappings to find which PUA
> codes were generated for those glyph codes, then (3) output the <pua/> 
> elements (above) as required.
>
> Finally, when reading an IF file, these <pua/> elements would be used 
> to augment the font's CMAP (keeping in mind that when reading the 
> font, MultiByteFont.createPrivateUseMappings() may have already been 
> called, and thus the mappings in <pua/> elements may need to be 
> replaced or merged).
>
> I can imagine various other optimizations on the above theme to make 
> this readily workable.
>
>
>
> On Wed, Apr 24, 2013 at 3:18 AM, Chris Bowditch 
> <bowditch_chris@hotmail.com> wrote:
>
>     Hi Glenn,
>
>     Can you suggest an alternative approach please?
>
>     Thanks,
>
>     Chris
>
>
>     On 24/04/2013 02:41, Glenn Adams wrote:
>
>         I don't like this. It negates any additional processing that
>         may have occurred, such as letter spacing. It requires the IF
>         to repeat part of the layout process. Bad idea.
>
>
>         On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo
>         <lmpmbernardo@gmail.com> wrote:
>
>
>             With the approach implemented by Simon what gets written to
>             the IF file is the original sequence, not the mapped sequence.
>             Then when generating PDF from IF the same code that would
>             generate the synthesized mappings when generating PDF straight
>             from FO is called to recreate the mappings. So I don't think we
>             can say there is information about the mappings in the text
>             nodes.
>
>
>             On 4/23/13 5:50 AM, Glenn Adams wrote:
>
>                 Ah, I reread your earlier (private) message. I see the
>                 problem has to do with the use of synthesized PUA mappings.
>                 Here, the problem really is that the font should always have
>                 a CMAP entry that maps to every glyph that can be produced by
>                 the GSUB process. However, not all fonts do this, so in the
>                 case in point, we have to synthesize some mapping, from which
>                 we have to turn to PUA assignments. This works when we
>                 generate PDF since we generate a subset font that contains
>                 the synthesized mappings. However, I can see that if this is
>                 going to IF instead of PDF/PS, then we need to find a way to
>                 recreate those synthesized mappings.
>
>                 I think this information is really font-specific, and should
>                 not be tied to specific text nodes though. So if Simon's fix
>                 uses text nodes, then that is probably not the best approach.
>
>
>                 On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams
>                 <glenn@skynav.com> wrote:
>
>                     I'm presently at W3C WG meetings this week, but I'll try
>                     to get on my schedule. I'm not sure what the IF->PS/PDF
>                     problem is, since the IF->PDF path is clearly working
>                     from my tests.
>
>
>                     On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo
>                     <lmpmbernardo@gmail.com> wrote:
>
>
>                         Glenn,
>
>                         Can you give your opinion about the approach used by
>                         Simon? As I mentioned before (in a private message),
>                         the IF -> PS/PDF route does not work in your original
>                         CS patch (for the languages that CS targets) due to
>                         the mapped sequences. Simon's approach works but
>                         requires keeping the original sequences alongside the
>                         mapped ones. I think it is a good approach but I would
>                         like to know if you have a better suggestion before we
>                         apply the patch.
>
>                         Thanks,
>                         Luis
>
>
>                         On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:
>
>                             [
>                             https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>                             ]
>
>                             Chris Bowditch reassigned FOP-2210:
>                             -----------------------------------
>
>                                  Assignee: Chris Bowditch
>
>                                 [PATCH] Complex script IF to output missing glyphs
>                                 --------------------------------------------------
>
>                                                  Key: FOP-2210
>                                                  URL: https://issues.apache.org/jira/browse/FOP-2210
>                                              Project: Fop
>                                           Issue Type: Bug
>                                             Reporter: simon steiner
>                                             Assignee: Chris Bowditch
>                                          Attachments: csspeedtrunk.patch, fop.xconf, test.fo
>
>
>                                 fop test.fo -c fop.xconf -if application/pdf expected.if.xml
>                                 fop -c fop.xconf -ifin expected.if.xml out.pdf
>
>                             --
>                             This message is automatically generated by JIRA.
>                             If you think it was sent incorrectly, please contact
>                             your JIRA administrators
>                             For more information on JIRA, see:
>                             http://www.atlassian.com/software/jira
>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Glenn Adams <gl...@skynav.com>.

Sure. One way to do this would be to add child elements to the <font/>
element in IF output as follows:

<font family="Lateef" style="normal" ...>
  <pua code="0xE000" gid="139"/>
  <pua code="0xE001" gid="481"/>
  <pua code="0xE002" gid="219"/>
</font>

where these PUA mappings are collected by iterating over the characters of
TextAreas governed by the <font/> element. These characters might be
iterated upon invoking TextArea.add{Word,Space}, and collecting this info
in text areas.

Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1) determine
which glyph codes were referenced by the document, (2) given these used
codes, iterate over the CMAP mappings to find which PUA codes were
generated for those glyph codes, then (3) output the <pua/> elements
(above) as required.

Finally, when reading an IF file, these <pua/> elements would be used to
augment the font's CMAP (keeping in mind that when reading the font,
MultiByteFont.createPrivateUseMappings() may have already been called, and
thus the mappings in <pua/> elements may need to be replaced or merged).

I can imagine various other optimizations on the above theme to make this
readily workable.
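
To make the second alternative and the reading step concrete, here is a
minimal Java sketch. It is not FOP code: the class PuaElementSketch and its
methods are hypothetical, the cmap is modelled as a plain map from code point
to glyph index, and the <pua/> element is only the format proposed above, not
part of the current IF schema. The merge policy shown (mappings from the IF
file win) is one possible reading of "replaced or merged".

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative only: none of these names exist in FOP. */
final class PuaElementSketch {

    static boolean isPua(int codePoint) {
        return codePoint >= 0xE000 && codePoint <= 0xF8FF; // BMP Private Use Area
    }

    /** Writing side: emit one <pua/> element per PUA mapping whose glyph was used. */
    static String toPuaElements(Map<Integer, Integer> cmap, Set<Integer> usedGlyphs) {
        StringBuilder out = new StringBuilder();
        // TreeMap gives a stable, ascending code point order in the output.
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(cmap).entrySet()) {
            int code = e.getKey();
            int gid = e.getValue();
            if (isPua(code) && usedGlyphs.contains(gid)) {
                out.append("<pua code=\"0x")
                   .append(Integer.toHexString(code).toUpperCase())
                   .append("\" gid=\"").append(gid).append("\"/>\n");
            }
        }
        return out.toString();
    }

    /** Reading side: mappings from the IF file win over those synthesized at font load. */
    static void mergeFromIf(Map<Integer, Integer> cmap, Map<Integer, Integer> fromIf) {
        // Drop synthesized PUA entries for glyphs that the IF file maps itself...
        cmap.entrySet().removeIf(e -> isPua(e.getKey()) && fromIf.containsValue(e.getValue()));
        // ...then install the IF mappings, replacing any entry with the same code.
        cmap.putAll(fromIf);
    }
}
```

Keeping the mappings per font, rather than per text node, means the text in
the IF file can stay as the mapped sequence and no layout work has to be
repeated on the IF -> PDF path.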



On Wed, Apr 24, 2013 at 3:18 AM, Chris Bowditch
<bo...@hotmail.com> wrote:

> Hi Glenn,
>
> Can you suggest an alternative approach please?
>
> Thanks,
>
> Chris
>
>
> On 24/04/2013 02:41, Glenn Adams wrote:
>
>> I don't like this. It negates any additional processing that may have
>> occurred, such as letter spacing. It requires the IF to repeat part of the
>> layout process. Bad idea.
>>
>>
>> On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo <lmpmbernardo@gmail.com> wrote:
>>
>>
>>     With the approach implemented by Simon what gets written to the IF
>>     file is the original sequence, not the mapped sequence. Then when
>>     generating PDF from IF the same code that would generate the
>>     synthesized mappings when generating PDF straight from FO is
>>     called to recreate the mappings. So I don't think we can say there
>>     is information about the mappings in the text nodes.
>>
>>
>>     On 4/23/13 5:50 AM, Glenn Adams wrote:
>>
>>>     Ah, I reread your earlier (private) message. I see the problem
>>>     has to do with the use of synthesized PUA mappings. Here, the
>>>     problem really is that the font should always have a CMAP entry
>>>     that maps to every glyph that can be produced by the GSUB
>>>     process. However, not all fonts do this, so in the case in point,
>>>     we have to synthesize some mapping, from which we have to turn to
>>>     PUA assignments. This works when we generate PDF since we
>>>     generate a subset font that contains the synthesized mappings.
>>>     However, I can see that if this is going to IF instead of PDF/PS,
>>>     then we need to find a way to recreate those synthesized mappings.
>>>
>>>     I think this information is really font-specific, and should not
>>>     be tied to specific text nodes though. So if Simon's fix uses
>>>     text nodes, then that is probably not the best approach.
>>>
>>>
>>>     On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <glenn@skynav.com> wrote:
>>>
>>>         I'm presently at W3C WG meetings this week, but I'll try to
>>>         get on my schedule. I'm not sure what the IF->PS/PDF problem
>>>         is, since the IF->PDF path is clearly working from my tests.
>>>
>>>
>>>         On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo
>>>         <lmpmbernardo@gmail.com> wrote:
>>>
>>>
>>>             Glenn,
>>>
>>>             Can you give your opinion about the approach used by
>>>             Simon? As I mentioned before (in a private message), the
>>>             IF -> PS/PDF route does not work in your original CS
>>>             patch (for the languages that CS targets) due to the
>>>             mapped sequences. Simon's approach works but requires
>>>             keeping the original sequences alongside the mapped ones.
>>>             I think it is a good approach but I would like to know if
>>>             you have a better suggestion before we apply the patch.
>>>
>>>             Thanks,
>>>             Luis
>>>
>>>
>>>             On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:
>>>
>>>                 [
>>>                 https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>>                 ]
>>>
>>>                 Chris Bowditch reassigned FOP-2210:
>>>                 -----------------------------------
>>>
>>>                      Assignee: Chris Bowditch
>>>
>>>                     [PATCH] Complex script IF to output missing glyphs
>>>                     --------------------------------------------------
>>>
>>>                                      Key: FOP-2210
>>>                                      URL:
>>>                     https://issues.apache.org/jira/browse/FOP-2210
>>>                                  Project: Fop
>>>                               Issue Type: Bug
>>>                                 Reporter: simon steiner
>>>                                 Assignee: Chris Bowditch
>>>                              Attachments: csspeedtrunk.patch,
>>>                     fop.xconf, test.fo
>>>
>>>
>>>                     fop test.fo -c fop.xconf -if application/pdf expected.if.xml
>>>                     fop -c fop.xconf -ifin expected.if.xml out.pdf
>>>
>>>                 --
>>>                 This message is automatically generated by JIRA.
>>>                 If you think it was sent incorrectly, please contact
>>>                 your JIRA administrators
>>>                 For more information on JIRA, see:
>>>                 http://www.atlassian.com/software/jira
>>>
>>>
>>>
>>>
>>>
>>
>>
>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Chris Bowditch <bo...@hotmail.com>.

Hi Glenn,

Can you suggest an alternative approach please?

Thanks,

Chris

On 24/04/2013 02:41, Glenn Adams wrote:
> I don't like this. It negates any additional processing that may have 
> occurred, such as letter spacing. It requires the IF to repeat part of 
> the layout process. Bad idea.
>
>
> On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo <lmpmbernardo@gmail.com> wrote:
>
>
>     With the approach implemented by Simon what gets written to the IF
>     file is the original sequence, not the mapped sequence. Then when
>     generating PDF from IF the same code that would generate the
>     synthesized mappings when generating PDF straight from FO is
>     called to recreate the mappings. So I don't think we can say there
>     is information about the mappings in the text nodes.
>
>
>     On 4/23/13 5:50 AM, Glenn Adams wrote:
>>     Ah, I reread your earlier (private) message. I see the problem
>>     has to do with the use of synthesized PUA mappings. Here, the
>>     problem really is that the font should always have a CMAP entry
>>     that maps to every glyph that can be produced by the GSUB
>>     process. However, not all fonts do this, so in the case in point,
>>     we have to synthesize some mapping, from which we have to turn to
>>     PUA assignments. This works when we generate PDF since we
>>     generate a subset font that contains the synthesized mappings.
>>     However, I can see that if this is going to IF instead of PDF/PS,
>>     then we need to find a way to recreate those synthesized mappings.
>>
>>     I think this information is really font-specific, and should not
>>     be tied to specific text nodes though. So if Simon's fix uses
>>     text nodes, then that is probably not the best approach.
>>
>>
>>     On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <glenn@skynav.com> wrote:
>>
>>         I'm presently at W3C WG meetings this week, but I'll try to
>>         get on my schedule. I'm not sure what the IF->PS/PDF problem
>>         is, since the IF->PDF path is clearly working from my tests.
>>
>>
>>         On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo
>>         <lmpmbernardo@gmail.com> wrote:
>>
>>
>>             Glenn,
>>
>>             Can you give your opinion about the approach used by
>>             Simon? As I mentioned before (in a private message), the
>>             IF -> PS/PDF route does not work in your original CS
>>             patch (for the languages that CS targets) due to the
>>             mapped sequences. Simon's approach works but requires
>>             keeping the original sequences alongside the mapped ones.
>>             I think it is a good approach but I would like to know if
>>             you have a better suggestion before we apply the patch.
>>
>>             Thanks,
>>             Luis
>>
>>
>>             On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:
>>
>>                 [
>>                 https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>                 ]
>>
>>                 Chris Bowditch reassigned FOP-2210:
>>                 -----------------------------------
>>
>>                      Assignee: Chris Bowditch
>>
>>                     [PATCH] Complex script IF to output missing glyphs
>>                     --------------------------------------------------
>>
>>                                      Key: FOP-2210
>>                                      URL:
>>                     https://issues.apache.org/jira/browse/FOP-2210
>>                                  Project: Fop
>>                               Issue Type: Bug
>>                                 Reporter: simon steiner
>>                                 Assignee: Chris Bowditch
>>                              Attachments: csspeedtrunk.patch,
>>                     fop.xconf, test.fo
>>
>>
>>                     fop test.fo -c fop.xconf -if application/pdf expected.if.xml
>>                     fop -c fop.xconf -ifin expected.if.xml out.pdf
>>
>>                 --
>>                 This message is automatically generated by JIRA.
>>                 If you think it was sent incorrectly, please contact
>>                 your JIRA administrators
>>                 For more information on JIRA, see:
>>                 http://www.atlassian.com/software/jira
>>
>>
>>
>>
>
>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Glenn Adams <gl...@skynav.com>.

I don't like this. It negates any additional processing that may have
occurred, such as letter spacing. It requires the IF to repeat part of the
layout process. Bad idea.


On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo <lm...@gmail.com> wrote:

>
> With the approach implemented by Simon what gets written to the IF file is
> the original sequence, not the mapped sequence. Then when generating PDF
> from IF the same code that would generate the synthesized mappings when
> generating PDF straight from FO is called to recreate the mappings. So I
> don't think we can say there is information about the mappings in the text
> nodes.
>
>
> On 4/23/13 5:50 AM, Glenn Adams wrote:
>
> Ah, I reread your earlier (private) message. I see the problem has to do
> with the use of synthesized PUA mappings. Here, the problem really is that
> the font should always have a CMAP entry that maps to every glyph that can
> be produced by the GSUB process. However, not all fonts do this, so in the
> case in point, we have to synthesize some mapping, from which we have to
> turn to PUA assignments. This works when we generate PDF since we generate
> a subset font that contains the synthesized mappings. However, I can see
> that if this is going to IF instead of PDF/PS, then we need to find a way
> to recreate those synthesized mappings.
>
>  I think this information is really font-specific, and should not be tied
> to specific text nodes though. So if Simon's fix uses text nodes, then that
> is probably not the best approach.
>
>
> On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <gl...@skynav.com> wrote:
>
>> I'm presently at W3C WG meetings this week, but I'll try to get on my
>> schedule. I'm not sure what the IF->PS/PDF problem is, since the IF->PDF
>> path is clearly working from my tests.
>>
>>
>> On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo <lm...@gmail.com> wrote:
>>
>>>
>>> Glenn,
>>>
>>> Can you give your opinion about the approach used by Simon? As I
>>> mentioned before (in a private message), the IF -> PS/PDF route does not
>>> work in your original CS patch (for the languages that CS targets) due to
>>> the mapped sequences. Simon's approach works but requires keeping the
>>> original sequences alongside the mapped ones. I think it is a good approach
>>> but I would like to know if you have a better suggestion before we apply
>>> the patch.
>>>
>>> Thanks,
>>> Luis
>>>
>>>
>>> On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:
>>>
>>>>       [
>>>> https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>>>>
>>>> Chris Bowditch reassigned FOP-2210:
>>>> -----------------------------------
>>>>
>>>>      Assignee: Chris Bowditch
>>>>
>>>>
>>>>> [PATCH] Complex script IF to output missing glyphs
>>>>> --------------------------------------------------
>>>>>
>>>>>                  Key: FOP-2210
>>>>>                  URL: https://issues.apache.org/jira/browse/FOP-2210
>>>>>              Project: Fop
>>>>>           Issue Type: Bug
>>>>>             Reporter: simon steiner
>>>>>             Assignee: Chris Bowditch
>>>>>          Attachments: csspeedtrunk.patch, fop.xconf, test.fo
>>>>>
>>>>>
>>>>> fop test.fo -c fop.xconf -if application/pdf expected.if.xml
>>>>> fop -c fop.xconf -ifin expected.if.xml out.pdf
>>>>>
>>>> --
>>>> This message is automatically generated by JIRA.
>>>> If you think it was sent incorrectly, please contact your JIRA
>>>> administrators
>>>> For more information on JIRA, see:
>>>> http://www.atlassian.com/software/jira
>>>>
>>>
>>>
>>
>
>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Luis Bernardo <lm...@gmail.com>.

With the approach implemented by Simon what gets written to the IF file 
is the original sequence, not the mapped sequence. Then when generating 
PDF from IF the same code that would generate the synthesized mappings 
when generating PDF straight from FO is called to recreate the mappings. 
So I don't think we can say there is information about the mappings in 
the text nodes.

On 4/23/13 5:50 AM, Glenn Adams wrote:
> Ah, I reread your earlier (private) message. I see the problem has to 
> do with the use of synthesized PUA mappings. Here, the problem really 
> is that the font should always have a CMAP entry that maps to every 
> glyph that can be produced by the GSUB process. However, not all fonts 
> do this, so in the case in point, we have to synthesize some mapping, 
> from which we have to turn to PUA assignments. This works when we 
> generate PDF since we generate a subset font that contains the 
> synthesized mappings. However, I can see that if this is going to IF 
> instead of PDF/PS, then we need to find a way to recreate those 
> synthesized mappings.
>
> I think this information is really font-specific, and should not be 
> tied to specific text nodes though. So if Simon's fix uses text nodes, 
> then that is probably not the best approach.
>
>
> On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <glenn@skynav.com> wrote:
>
>     I'm presently at W3C WG meetings this week, but I'll try to get on
>     my schedule. I'm not sure what the IF->PS/PDF problem is, since
>     the IF->PDF path is clearly working from my tests.
>
>
>     On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo
>     <lmpmbernardo@gmail.com> wrote:
>
>
>         Glenn,
>
>         Can you give your opinion about the approach used by Simon? As
>         I mentioned before (in a private message), the IF -> PS/PDF
>         route does not work in your original CS patch (for the
>         languages that CS targets) due to the mapped sequences.
>         Simon's approach works but requires keeping the original
>         sequences alongside the mapped ones. I think it is a good
>         approach but I would like to know if you have a better
>         suggestion before we apply the patch.
>
>         Thanks,
>         Luis
>
>
>         On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:
>
>                   [
>             https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>             ]
>
>             Chris Bowditch reassigned FOP-2210:
>             -----------------------------------
>
>                  Assignee: Chris Bowditch
>
>                 [PATCH] Complex script IF to output missing glyphs
>                 --------------------------------------------------
>
>                                  Key: FOP-2210
>                                  URL:
>                 https://issues.apache.org/jira/browse/FOP-2210
>                              Project: Fop
>                           Issue Type: Bug
>                             Reporter: simon steiner
>                             Assignee: Chris Bowditch
>                          Attachments: csspeedtrunk.patch, fop.xconf,
>                 test.fo
>
>
>                 fop test.fo -c fop.xconf -if application/pdf expected.if.xml
>                 fop -c fop.xconf -ifin expected.if.xml out.pdf
>
>             --
>             This message is automatically generated by JIRA.
>             If you think it was sent incorrectly, please contact your
>             JIRA administrators
>             For more information on JIRA, see:
>             http://www.atlassian.com/software/jira
>
>
>
>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Glenn Adams <gl...@skynav.com>.

Ah, I reread your earlier (private) message. I see the problem has to do
with the use of synthesized PUA mappings. Here, the problem really is that
the font should always have a CMAP entry that maps to every glyph that can
be produced by the GSUB process. However, not all fonts do this, so in the
case in point, we have to synthesize some mapping, from which we have to
turn to PUA assignments. This works when we generate PDF since we generate
a subset font that contains the synthesized mappings. However, I can see
that if this is going to IF instead of PDF/PS, then we need to find a way
to recreate those synthesized mappings.

I think this information is really font-specific, and should not be tied to
specific text nodes though. So if Simon's fix uses text nodes, then that is
probably not the best approach.
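
For readers unfamiliar with the synthesized mappings, the idea can be
sketched in Java as below. This is an illustration under assumptions, not
the code in MultiByteFont.createPrivateUseMappings(): the class and method
names are made up, the cmap is modelled as a map from code point to glyph
index, and only the BMP Private Use Area (U+E000..U+F8FF) is used.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not FOP's MultiByteFont implementation. */
final class PuaSynthesisSketch {
    static final int PUA_FIRST = 0xE000;
    static final int PUA_LAST = 0xF8FF;

    /**
     * Assigns a PUA code point to every glyph index in [1, glyphCount) that no
     * cmap entry maps to, so glyphs reachable only through GSUB get a code.
     * Glyph 0 (.notdef) is skipped. Returns only the new code -> glyph mappings.
     */
    static Map<Integer, Integer> synthesize(Map<Integer, Integer> cmap, int glyphCount) {
        Set<Integer> mappedGlyphs = new HashSet<>(cmap.values());
        Map<Integer, Integer> synthesized = new LinkedHashMap<>();
        int next = PUA_FIRST;
        for (int gid = 1; gid < glyphCount; gid++) {
            if (mappedGlyphs.contains(gid)) {
                continue; // glyph already reachable through the font's own cmap
            }
            // Skip PUA code points the font itself already assigns.
            while (next <= PUA_LAST && cmap.containsKey(next)) {
                next++;
            }
            if (next > PUA_LAST) {
                throw new IllegalStateException("Private Use Area exhausted");
            }
            synthesized.put(next++, gid);
        }
        return synthesized;
    }
}
```

In a sketch like this the assignment is deterministic for a given font, but
the result lives only in memory. That is why the FO -> PDF path works (the
subset font embedded in the PDF carries the mappings) while the IF file has
no record of them.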


On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <gl...@skynav.com> wrote:

> I'm presently at W3C WG meetings this week, but I'll try to get on my
> schedule. I'm not sure what the IF->PS/PDF problem is, since the IF->PDF
> path is clearly working from my tests.
>
>
> On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo <lm...@gmail.com> wrote:
>
>>
>> Glenn,
>>
>> Can you give your opinion about the approach used by Simon? As I
>> mentioned before (in a private message), the IF -> PS/PDF route does not
>> work in your original CS patch (for the languages that CS targets) due to
>> the mapped sequences. Simon's approach works but requires keeping the
>> original sequences alongside the mapped ones. I think it is a good approach
>> but I would like to know if you have a better suggestion before we apply
>> the patch.
>>
>> Thanks,
>> Luis
>>
>>
>> On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:
>>
>>>       [ https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>
>>> Chris Bowditch reassigned FOP-2210:
>>> -----------------------------------
>>>
>>>      Assignee: Chris Bowditch
>>>
>>>
>>>> [PATCH] Complex script IF to output missing glyphs
>>>> --------------------------------------------------
>>>>
>>>>                  Key: FOP-2210
>>>>                  URL: https://issues.apache.org/jira/browse/FOP-2210
>>>>              Project: Fop
>>>>           Issue Type: Bug
>>>>             Reporter: simon steiner
>>>>             Assignee: Chris Bowditch
>>>>          Attachments: csspeedtrunk.patch, fop.xconf, test.fo
>>>>
>>>>
>>>> fop test.fo -c fop.xconf -if application/pdf expected.if.xml
>>>> fop -c fop.xconf -ifin expected.if.xml out.pdf
>>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> If you think it was sent incorrectly, please contact your JIRA
>>> administrators
>>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>>
>>
>>
>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Glenn Adams <gl...@skynav.com>.

I'm presently at W3C WG meetings this week, but I'll try to get this on my
schedule. I'm not sure what the IF->PS/PDF problem is, since the IF->PDF
path is clearly working from my tests.


On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo <lm...@gmail.com> wrote:

>
> Glenn,
>
> Can you give your opinion about the approach used by Simon? As I mentioned
> before (in a private message), the IF -> PS/PDF route does not work in your
> original CS patch (for the languages that CS targets) due to the mapped
> sequences. Simon's approach works but requires keeping the original
> sequences alongside the mapped ones. I think it is a good approach but I
> would like to know if you have a better suggestion before we apply the
> patch.
>
> Thanks,
> Luis
>
>
> On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:
>
>>       [ https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Chris Bowditch reassigned FOP-2210:
>> -----------------------------------
>>
>>      Assignee: Chris Bowditch
>>
>>
>>> [PATCH] Complex script IF to output missing glyphs
>>> --------------------------------------------------
>>>
>>>                  Key: FOP-2210
>>>                  URL: https://issues.apache.org/jira/browse/FOP-2210
>>>              Project: Fop
>>>           Issue Type: Bug
>>>             Reporter: simon steiner
>>>             Assignee: Chris Bowditch
>>>          Attachments: csspeedtrunk.patch, fop.xconf, test.fo
>>>
>>>
>>> fop test.fo -c fop.xconf -if application/pdf expected.if.xml
>>> fop -c fop.xconf -ifin expected.if.xml out.pdf
>>>
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA
>> administrators
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>
>
>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Posted by Luis Bernardo <lm...@gmail.com>.

Glenn,

Can you give your opinion about the approach used by Simon? As I 
mentioned before (in a private message), the IF -> PS/PDF route does not 
work in your original CS patch (for the languages that CS targets) due 
to the mapped sequences. Simon's approach works but requires keeping the 
original sequences alongside the mapped ones. I think it is a good 
approach but I would like to know if you have a better suggestion before 
we apply the patch.

Thanks,
Luis

On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:
>       [ https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Chris Bowditch reassigned FOP-2210:
> -----------------------------------
>
>      Assignee: Chris Bowditch
>      
>> [PATCH] Complex script IF to output missing glyphs
>> --------------------------------------------------
>>
>>                  Key: FOP-2210
>>                  URL: https://issues.apache.org/jira/browse/FOP-2210
>>              Project: Fop
>>           Issue Type: Bug
>>             Reporter: simon steiner
>>             Assignee: Chris Bowditch
>>          Attachments: csspeedtrunk.patch, fop.xconf, test.fo
>>
>>
>> fop test.fo -c fop.xconf -if application/pdf expected.if.xml
>> fop -c fop.xconf -ifin expected.if.xml out.pdf
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira