Posted to dev@pdfbox.apache.org by "Aaron Madlon-Kay (JIRA)" <ji...@apache.org> on 2018/09/05 01:12:00 UTC

[jira] [Comment Edited] (PDFBOX-4304) Glyph Substitution Table lookup Cache doesn't clear by disabling a feature.

    [ https://issues.apache.org/jira/browse/PDFBOX-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16603801#comment-16603801 ] 

Aaron Madlon-Kay edited comment on PDFBOX-4304 at 9/5/18 1:11 AM:
------------------------------------------------------------------

As I recall, when I wrote the GSUB code I put the cache in only partly for performance; there was also a correctness component to it.

(This is based on the last time I looked at the code, so I apologize if it's not quite accurate.)

When choosing a substitution, there is some dependence on context:

# The incoming Unicode character's script influences the choice of LangSys, which determines the available features
# Many Unicode characters have an ambiguous script ("Default"), which means we have to consider the surrounding text (it would be better to be able to set a default script/language for the whole document, but such a setting doesn't exist at the moment)
# The place where the script is determined ({{GlyphSubstitutionTable.selectScriptTag}}) can't see the actual surrounding text, so it can only guess based on the last known-valid script used
# This means that without a cache to ensure consistency, an ambiguous-script character may be substituted differently in different places in the document
# (I'm fuzzy on this point) I thought that the way the Unicode map is created ({{PDCIDFontType2Embedder.buildToUnicodeCMap}}) required a one-to-one correspondence
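
To make the consistency point concrete, here is a toy model of the behaviour described in the list above. It is only a sketch and not the PDFBox code: the class {{ScriptGuessDemo}}, its methods, the script strings and the "gid + 1000" substitution rule are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these names or rules come from PDFBox.
class ScriptGuessDemo {
    // Stand-in for "the last known-valid script used".
    private String lastScript = "DFLT";
    // gid -> substituted gid, filled in on the first lookup of each gid.
    private final Map<Integer, Integer> cache = new HashMap<>();
    private final boolean useCache;

    ScriptGuessDemo(boolean useCache) {
        this.useCache = useCache;
    }

    // An ambiguous ("Default") script falls back to the previous script.
    String selectScript(String detectedScript) {
        if (!"Default".equals(detectedScript)) {
            lastScript = detectedScript;
        }
        return lastScript;
    }

    // Hypothetical substitution rule: the result depends on the guessed script.
    int substitute(int gid, String detectedScript) {
        if (useCache && cache.containsKey(gid)) {
            return cache.get(gid);
        }
        String script = selectScript(detectedScript);
        int result = "arab".equals(script) ? gid + 1000 : gid;
        if (useCache) {
            cache.put(gid, result);
        }
        return result;
    }
}
```

Without the cache, the same ambiguous-script glyph gets a different result depending on which script happened to precede it. With the cache, the first answer is reused for the rest of the document.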

{quote}I think I've now understood the first part. You asking that the cache be reset when features are disabled or enabled, that makes sense, as long as getUnsubstitution() isn't used "too late".{quote}

I agree that resetting the cache makes sense, but I am also wary that the font needs to be stateless per the issues [~jahewson] found with my initial implementation. Unfortunately I haven't had time to see what changes were made to remove the statefulness.
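
As a possible alternative to an explicit reset, the cache could be keyed on the enabled features as well as the gid, so that toggling a feature can never return a stale entry. The following is a minimal sketch under that assumption. {{FeatureAwareCache}}, {{getSubstitution}} as written here and the {{resolve}} rule are hypothetical and are not the PDFBox API, and this does not by itself address the statelessness concern.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; these names do not exist in PDFBox.
class FeatureAwareCache {
    // enabled feature list -> (gid -> substituted gid)
    private final Map<List<String>, Map<Integer, Integer>> cache = new HashMap<>();
    // Counts how often the (expensive) resolution actually runs.
    int computations = 0;

    int getSubstitution(int gid, List<String> enabledFeatures) {
        // Copy the list so later changes by the caller cannot corrupt the key.
        List<String> key = new ArrayList<>(enabledFeatures);
        return cache
                .computeIfAbsent(key, k -> new HashMap<>())
                .computeIfAbsent(gid, g -> resolve(g, key));
    }

    // Hypothetical stand-in for the real GSUB lookup; the last matching
    // feature in the list wins.
    private int resolve(int gid, List<String> features) {
        computations++;
        int result = gid;
        for (String feature : features) {
            if ("init".equals(feature)) {
                result = gid + 1;
            } else if ("medi".equals(feature)) {
                result = gid + 2;
            } else if ("fina".equals(feature)) {
                result = gid + 3;
            }
        }
        return result;
    }
}
```

Note that the key is an ordered list, so the same features enabled in a different order are cached separately. Whether that is desirable depends on how feature order is meant to affect substitution.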



> Glyph Substitution Table lookup Cache doesn't clear by disabling a feature.
> ---------------------------------------------------------------------------
>
>                 Key: PDFBOX-4304
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4304
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.11
>            Reporter: Ali Safe
>            Priority: Major
>         Attachments: FDK_aban.ttf
>
>
> When I use GlyphSubstitutionTable to find the substituted gid for a specific glyph that has 3 substitution forms, I get the same gid for all three forms.
> The font is a Persian font that has 3 substituted forms for some of its glyphs. I enabled the 'init', 'medi' and 'fina' features one by one, disabling each afterwards, but all of them give me the same result.
> When I looked at the GlyphSubstitutionTable class and its getSubstitution(gid, scriptTags, enabledFeatures) method, I saw a lookupCache that is checked first using the gid only: if the gid is present, the cached result is returned; otherwise the method does the remaining parsing and calculation. I think this cache must be cleared every time features are disabled or enabled. The cache key should also cover all three input arguments of the method, because they all affect the result. At the very least, lookupCache must be keyed on both gid and enabledFeatures.
> Also, when more than one feature is enabled, the lookup cache maps each gid to only one substituted glyph, but in many languages some glyphs have more than one substitution form. When I enable more than one feature, only the last enabled feature takes effect.
> I used this code and attached the font file mentioned above:
>     // Persian letter Beh, code point 1576 (U+0628) in the font
>     // Enable the init feature
>     ttf.enableGsubFeature("init");
>     CmapLookup cMapLookupInit = ttf.getUnicodeCmapLookup();
>     int glyphIdInit = cMapLookupInit.getGlyphId(1576);
>     ttf.disableGsubFeature("init");
>     // Enable the medi feature
>     ttf.enableGsubFeature("medi");
>     CmapLookup cMapLookupMedi = ttf.getUnicodeCmapLookup();
>     int glyphIdMedi = cMapLookupMedi.getGlyphId(1576);
>     ttf.disableGsubFeature("medi");
>     // Now glyphIdMedi and glyphIdInit have the same value...
