Posted to fop-dev@xmlgraphics.apache.org by bu...@apache.org on 2010/02/15 13:10:09 UTC

DO NOT REPLY [Bug 48745] New: Hyphenation results don't always equal OpenOffice result even with the same patterns

https://issues.apache.org/bugzilla/show_bug.cgi?id=48745

           Summary: Hyphenation results don't always equal OpenOffice
                    result even with the same patterns
           Product: Fop
           Version: 0.95
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: general
        AssignedTo: fop-dev@xmlgraphics.apache.org
        ReportedBy: onkelpax-forum@yahoo.de


Created an attachment (id=24988)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=24988)
German hyphenation file

As is already known, the hyphenation library has some problems with patterns that
contain the digits 7 or 8. I found that HyphenationTree.unpackValues(int)
extracts the characters ' (for 7) and ( (for 8) instead, which are exactly 16
positions below '7' and '8' in the ASCII table. The following code change
shifts these characters back to the correct ones:

    protected String unpackValues(int k) {
        StringBuilder buf = new StringBuilder();
        byte v = this.vspace.get(k++);
        while (v != 0) {
            char c = (char)((v >>> 4) - 1 + '0');
            if (!Character.isDigit(c)) {
                /* #21219: Workaround for a bug that sometimes occurs:
                 * shift the ASCII position by a correction offset. */
                c += 16;
            }
            buf.append(c);
            c = (char)(v & 0x0f);
            if (c == 0) {
                break;
            }
            c = (char)(c - 1 + '0');
            if (!Character.isDigit(c)) {
                /* #21219: Workaround for a bug that sometimes occurs:
                 * shift the ASCII position by a correction offset. */
                c += 16;
            }
            buf.append(c);
            v = this.vspace.get(k++);
        }
        return buf.toString();
    }
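A minimal sketch of why the offset is exactly 16, assuming the packing scheme discussed in this report: each digit is stored as digit + 1 in a 4-bit nibble, the first digit in the high nibble of a signed Java byte. The class and method names are hypothetical, not FOP's API. Because byte is signed, v >>> 4 first sign-extends v to int, so a high nibble of 8 or more (digits 7 to 9) yields 0x0FFFFFF8 and up instead of 8; the cast to char keeps only the low 16 bits, which land 16 positions below the intended character.

```java
/**
 * Hypothetical demo (not FOP code): decoding the high nibble of a signed
 * byte with and without masking after the unsigned shift.
 */
public class NibbleSignDemo {

    // Stores one digit as (digit + 1) in the high nibble of a byte.
    static byte packHigh(int digit) {
        return (byte) ((digit + 1) << 4);
    }

    // Decoding as in the unpatched unpackValues: v is sign-extended to int
    // before ">>> 4", so nibbles >= 8 carry extra high bits into the result.
    static char decodeUnmasked(byte v) {
        return (char) ((v >>> 4) - 1 + '0');
    }

    // Decoding with the shifted value masked down to its lower 4 bits.
    static char decodeMasked(byte v) {
        return (char) (((v >>> 4) & 0x0f) - 1 + '0');
    }

    public static void main(String[] args) {
        for (int d = 0; d <= 9; d++) {
            byte v = packHigh(d);
            System.out.println(d + ": unmasked='" + decodeUnmasked(v)
                    + "' masked='" + decodeMasked(v) + "'");
        }
    }
}
```

Running main shows ' for 7, ( for 8 and ) for 9 in the unmasked column and the correct digits in the masked one. This suggests the +16 workaround also covers 9, and that only the high nibble needs it, since v & 0x0f is not affected by sign extension.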

But there is another problem, which is likely to show up in languages where these
two digits occur often in patterns. Please compare FOP's hyphenation of the
German word "Flickenteppich" (pattern: .fli7ck8en7tep7pic8h) with OpenOffice's
result. OpenOffice does not produce the hyphenation "Flick-enteppich", but FOP
does, even with the quick fix above. The pattern explicitly prohibits a break at
that position: the value 8 between "ck" and "en" is even. Other implementations
of Liang's algorithm respect this rule (see http://www.davidashen.net/texhyphj.html,
or LibHnj, which OpenOffice uses).
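To make the prohibition concrete, here is a minimal sketch of the rule Liang's algorithm applies (it is not FOP's HyphenationTree): at each inter-letter position the maximum value over all matching patterns wins, and only an odd maximum permits a break. The second pattern, k1e, is hypothetical and stands in for any other pattern that would allow a break between "ck" and "en". The sketch also ignores the minimum fragment lengths that real hyphenators enforce.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of Liang's max-and-parity rule (not FOP code). */
public class LiangDemo {

    // Letters of a pattern with its digits removed, e.g. ".flickenteppich".
    static String letters(String pattern) {
        return pattern.replaceAll("[0-9]", "");
    }

    // vals[i] is the value in front of letter i (vals[n] follows the last
    // letter); positions without a digit count as 0.
    static int[] values(String pattern) {
        int[] vals = new int[letters(pattern).length() + 1];
        int pos = 0;
        for (char ch : pattern.toCharArray()) {
            if (ch >= '0' && ch <= '9') {
                vals[pos] = ch - '0';
            } else {
                pos++;
            }
        }
        return vals;
    }

    static String hyphenate(String word, List<String> patterns) {
        String w = "." + word.toLowerCase(Locale.ROOT) + ".";
        int[] best = new int[w.length() + 1]; // best[k]: value in front of w.charAt(k)
        for (String p : patterns) {
            String lp = letters(p);
            int[] pv = values(p);
            // Apply the pattern at every offset where its letters match.
            for (int start = w.indexOf(lp); start >= 0; start = w.indexOf(lp, start + 1)) {
                for (int i = 0; i < pv.length; i++) {
                    best[start + i] = Math.max(best[start + i], pv[i]);
                }
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            // word.charAt(i) is w.charAt(i + 1); an odd maximum allows a break before it.
            if (i > 0 && best[i + 1] % 2 == 1) {
                out.append('-');
            }
            out.append(word.charAt(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String pattern = ".fli7ck8en7tep7pic8h";
        System.out.println(hyphenate("Flickenteppich", Arrays.asList(pattern)));
        System.out.println(hyphenate("Flickenteppich", Arrays.asList(pattern, "k1e")));
        System.out.println(hyphenate("Flickenteppich", Arrays.asList("k1e")));
    }
}
```

With the report's pattern alone, or together with k1e, the result is "Fli-cken-tep-pich", because the even 8 beats the 1. With k1e alone the word splits as "Flick-enteppich", the unwanted result described above.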

My questions are: Is this issue known? If so, is there an existing tracker entry
for this bug? When will it be fixed?

Best regards


PAX

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

DO NOT REPLY [Bug 48745] Hyphenation results don't always equal OpenOffice result even with the same patterns

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=48745

Glenn Adams <gl...@skynav.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P2                          |P3

DO NOT REPLY [Bug 48745] Hyphenation results don't always equal OpenOffice result even with the same patterns

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=48745

--- Comment #2 from Glenn Adams <gl...@skynav.com> 2012-04-07 01:42:36 UTC ---
resetting P2 open bugs to P3 pending further review

DO NOT REPLY [Bug 48745] Hyphenation results don't always equal OpenOffice result even with the same patterns

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=48745

--- Comment #1 from Carlos Villegas <cv...@apache.org> 2010-02-15 15:17:32 UTC ---
Created an attachment (id=24989)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=24989)
fix unpacking of hyphenation pattern values

Thanks to PAX for pointing out the problem area. Not only unpackValues but also
getValues needed a similar fix. The proper fix is to mask the packed value down
to its lower 4 bits after shifting.
The example mentioned in the report now works, I think.
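The masking fix Carlos describes can be sketched for a decoder that, like getValues, returns numeric values rather than characters. The nibble layout (digit + 1 per nibble, high nibble first, a zero nibble or byte ends the data), the helper names and the narrowing of each value to a byte are assumptions for illustration, not FOP's code. Without the mask, a 7, 8 or 9 in the high nibble decodes to -9, -8 or -7. Since Liang's algorithm keeps the maximum value per position, a competing odd value (here a hypothetical 1 from another pattern) then wins, and the break is no longer forbidden.

```java
import java.util.Arrays;

/** Hypothetical sketch of nibble decoding with and without the 0x0f mask. */
public class MaskFixDemo {

    // Packs digits into nibbles (digit + 1, high nibble first); a zero
    // nibble or a zero byte marks the end of the data.
    static byte[] pack(String digits) {
        byte[] out = new byte[digits.length() / 2 + 1];
        for (int i = 0; i < digits.length(); i++) {
            int nibble = (digits.charAt(i) - '0' + 1) & 0x0f;
            if (i % 2 == 0) {
                out[i / 2] = (byte) (nibble << 4);
            } else {
                out[i / 2] |= (byte) nibble;
            }
        }
        return out;
    }

    // Decodes the nibbles back into values; masked == true applies
    // "& 0x0f" to the shifted high nibble.
    static int[] decode(byte[] packed, boolean masked) {
        int[] tmp = new int[packed.length * 2];
        int n = 0;
        int k = 0;
        byte v = packed[k++];
        while (v != 0) {
            // v is sign-extended to int, so without the mask a high nibble >= 8 leaks extra bits.
            int high = masked ? ((v >>> 4) & 0x0f) : (v >>> 4);
            tmp[n++] = (byte) (high - 1); // assumption: values end up narrowed to byte
            int low = v & 0x0f;
            if (low == 0) {
                break;
            }
            tmp[n++] = low - 1;
            v = packed[k++];
        }
        return Arrays.copyOf(tmp, n);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(decode(pack("78"), false)));
        System.out.println(Arrays.toString(decode(pack("78"), true)));
        // Hypothetical competing value 1 from another pattern at the same position.
        int buggy = Math.max(decode(pack("8"), false)[0], 1);
        int fixed = Math.max(decode(pack("8"), true)[0], 1);
        System.out.println("unmasked: max " + buggy + (buggy % 2 != 0 ? ", break allowed" : ", break forbidden"));
        System.out.println("masked:   max " + fixed + (fixed % 2 != 0 ? ", break allowed" : ", break forbidden"));
    }
}
```

Running main prints [-9, 8] and then [7, 8]: only the high nibble is affected, because v & 0x0f is not changed by sign extension. With the mask, the 8 beats the 1 and the break stays forbidden.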