Posted to commits@pdfbox.apache.org by ca...@apache.org on 2009/02/02 23:25:09 UTC

svn commit: r740130 - /incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java

Author: carrier
Date: Mon Feb  2 22:25:09 2009
New Revision: 740130

URL: http://svn.apache.org/viewvc?rev=740130&view=rev
Log:
Fix associated with ligature decomposition in bug PDFBOX-415

Modified:
    incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java

Modified: incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java
URL: http://svn.apache.org/viewvc/incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java?rev=740130&r1=740129&r2=740130&view=diff
==============================================================================
--- incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java (original)
+++ incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java Mon Feb  2 22:25:09 2009
@@ -71,10 +71,44 @@
              * it converts the micro symbol in extended latin to the value in the greek
              * script. We normalize the Unicode Alphabetic and Arabic A&B Presentation forms.
              */
-            if (((a_str.charAt(i) >= 0xFB00) && (a_str.charAt(i) <= 0xFDFF)) ||
-                    ((a_str.charAt(i) >= 0xFE70) && (a_str.charAt(i) <= 0xFEFF)))	{
-                retStr += Normalizer.normalize(a_str.charAt(i), Normalizer.NFKC);
-            }
+            char c = a_str.charAt(i);
+            if (((c >= 0xFB00) && (c <= 0xFDFF)) ||
+                    ((c >= 0xFE70) && (c <= 0xFEFF)))	{
+                /* The following ligatures have a space in them
+                 * when they are decomposed. PDF files adjust for
+                 * this using TJ spacing, but currently a space
+                 * gets created in the word by the normalization.
+                 * The current fix is to hard code the decomposition
+                 * without the space.
+                 */
+                if(c == 0xFC5E){
+                    retStr += "\u064c\u0651";
+                }
+                else if(c == 0xFC5F){
+                    retStr += "\u064d\u0651";
+                }
+                else if(c == 0xFC60){
+                    retStr += "\u064e\u0651";
+                }
+                else if(c == 0xFC61){
+                    retStr += "\u064f\u0651";
+                }
+                else if(c == 0xFC62){
+                    retStr += "\u0650\u0651";
+                }
+                else if(c == 0xFC63){
+                    retStr += "\u0651\u0670";
+                }
+                /* Some fonts map U+FDF2 differently than the Unicode spec.
+                 * They add an extra U+0627 character to compensate.  
+                 * This removes the extra character for those fonts. */ 
+                else if((c == 0xFDF2) && (i > 0) && ((a_str.charAt(i-1) == 0x0627) || (a_str.charAt(i-1) == 0xFE8D))) {
+                    retStr += "\u0644\u0644\u0647";
+                }
+                else{
+                    retStr += Normalizer.normalize(c, Normalizer.NFKC); 
+                }
+            }      
             else {
                 retStr += a_str.charAt(i);
             }
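
The space problem described in the comment above can be reproduced outside PDFBox. The following is only a minimal sketch, not the committed code: it assumes the JDK's java.text.Normalizer in place of the ICU4J Normalizer used in ICU4JImpl.java, and the class name LigatureDecompositionSketch and the method decompose are hypothetical names chosen for illustration. It covers the six shadda ligatures (U+FC5E to U+FC63) and leaves out the U+FDF2 special case.

```java
import java.text.Normalizer;

// Hypothetical stand-alone sketch; not part of PDFBox.
public class LigatureDecompositionSketch {

    /**
     * Decomposes a single character. For U+FC5E..U+FC63 the compatibility
     * decomposition begins with U+0020, so the mappings are hard coded
     * without that space, mirroring the approach in the commit.
     * Everything else falls back to plain NFKC.
     */
    static String decompose(char c) {
        switch (c) {
            case 0xFC5E: return "\u064c\u0651"; // shadda with dammatan
            case 0xFC5F: return "\u064d\u0651"; // shadda with kasratan
            case 0xFC60: return "\u064e\u0651"; // shadda with fatha
            case 0xFC61: return "\u064f\u0651"; // shadda with damma
            case 0xFC62: return "\u0650\u0651"; // shadda with kasra
            case 0xFC63: return "\u0651\u0670"; // shadda with superscript alef
            default:
                // Assumption: java.text.Normalizer stands in for ICU4J here.
                return Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFKC);
        }
    }

    public static void main(String[] args) {
        String viaNfkc = Normalizer.normalize("\uFC5E", Normalizer.Form.NFKC);
        String hardCoded = decompose('\uFC5E');
        // Compare lengths and check for the leading space that NFKC introduces.
        System.out.println("NFKC length:       " + viaNfkc.length());
        System.out.println("NFKC leading space: " + viaNfkc.startsWith(" "));
        System.out.println("hard-coded length: " + hardCoded.length());
    }
}
```

A switch keeps the six special cases in one place; characters outside that set, such as the Latin ligature U+FB01, still go through ordinary NFKC normalization.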



Re: svn commit: r740130 - /incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Brian,

Please always mention the contributor's name in the commit message when
processing a patch. I usually add a separate line "Submitted by: name
<obfuscated mail>". Not everyone does it exactly like that, but it
allows querying for contributors. Thanks.

See also:
http://apache.org/dev/committers.html#applying-patches

On 02.02.2009 23:25:09 carrier wrote:
> Author: carrier
> Date: Mon Feb  2 22:25:09 2009
> New Revision: 740130
> 
> URL: http://svn.apache.org/viewvc?rev=740130&view=rev
> Log:
> Fix associated with ligature decomposition in bug PDFBOX-415
> 
> Modified:
>     incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java
> 



Jeremias Maerki