Posted to commits@pdfbox.apache.org by ca...@apache.org on 2009/02/02 23:25:09 UTC
svn commit: r740130 - /incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java
Author: carrier
Date: Mon Feb 2 22:25:09 2009
New Revision: 740130
URL: http://svn.apache.org/viewvc?rev=740130&view=rev
Log:
Fix associated with ligature decomposition in bug PDFBOX-415
Modified:
incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java
Modified: incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java
URL: http://svn.apache.org/viewvc/incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java?rev=740130&r1=740129&r2=740130&view=diff
==============================================================================
--- incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java (original)
+++ incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java Mon Feb 2 22:25:09 2009
@@ -71,10 +71,44 @@
* it converts the micro symbol in extended latin to the value in the greek
* script. We normalize the Unicode Alphabetic and Arabic A&B Presentation forms.
*/
- if (((a_str.charAt(i) >= 0xFB00) && (a_str.charAt(i) <= 0xFDFF)) ||
- ((a_str.charAt(i) >= 0xFE70) && (a_str.charAt(i) <= 0xFEFF))) {
- retStr += Normalizer.normalize(a_str.charAt(i), Normalizer.NFKC);
- }
+ char c = a_str.charAt(i);
+ if (((c >= 0xFB00) && (c <= 0xFDFF)) ||
+ ((c >= 0xFE70) && (c <= 0xFEFF))) {
+ /* The following ligatures have a space in them
+ * when they are decomposed. PDF files adjust for
+ * this using TJ spacing, but currently a space
+ * gets created in the word by the normalization.
+ * The current fix is to hard code the decomposition
+ * without the space.
+ */
+ if(c == 0xFC5E){
+ retStr += "\u064c\u0651";
+ }
+ else if(c == 0xFC5F){
+ retStr += "\u064d\u0651";
+ }
+ else if(c == 0xFC60){
+ retStr += "\u064e\u0651";
+ }
+ else if(c == 0xFC61){
+ retStr += "\u064f\u0651";
+ }
+ else if(c == 0xFC62){
+ retStr += "\u0650\u0651";
+ }
+ else if(c == 0xFC63){
+ retStr += "\u0651\u0670";
+ }
+ /* Some fonts map U+FDF2 differently than the Unicode spec.
+ * They add an extra U+0627 character to compensate.
+ * This removes the extra character for those fonts. */
+ else if((c == 0xFDF2) && (i > 0) && ((a_str.charAt(i-1) == 0x0627) || (a_str.charAt(i-1) == 0xFE8D))) {
+ retStr += "\u0644\u0644\u0647";
+ }
+ else{
+ retStr += Normalizer.normalize(c, Normalizer.NFKC);
+ }
+ }
else {
retStr += a_str.charAt(i);
}
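The space described in the diff comment comes from the Unicode compatibility decompositions of U+FC5E through U+FC63, each of which begins with U+0020. The following is a minimal sketch of that behaviour and of a generic alternative to the hard-coded strings. It assumes the JDK's java.text.Normalizer stands in for the ICU4J Normalizer used in the commit, and the class and method names are hypothetical, not part of PDFBox:

```java
import java.text.Normalizer;

// Hypothetical demo class; not part of PDFBox.
public class LigatureDecompositionSketch {

    // NFKC-normalizes a single character. For the shadda ligatures
    // U+FC5E..U+FC63 the compatibility decomposition starts with a
    // space (U+0020), which this helper drops.
    static String decompose(char c) {
        String normalized =
            Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFKC);
        if (c >= 0xFC5E && c <= 0xFC63 && normalized.startsWith(" ")) {
            return normalized.substring(1);
        }
        return normalized;
    }

    // Formats a string as a list of code units, e.g. "U+0020 U+064C".
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (char c = 0xFC5E; c <= 0xFC63; c++) {
            String raw =
                Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFKC);
            System.out.println(hex(String.valueOf(c))
                + " raw: " + hex(raw)
                + " | without space: " + hex(decompose(c)));
        }
    }
}
```

Stripping the leading space generically avoids six separate branches, but the commit's explicit strings make the intended output visible at a glance. The U+FDF2 special case in the commit depends on the preceding character and is not covered by this sketch.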
Re: svn commit: r740130 - /incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java
Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Brian,
please always mention the contributor's name in the commit message when
processing a patch. I usually add a separate line "Submitted by: name
<obfuscated mail>". Not everyone does it exactly like that but it would
allow querying for contributors. Thanks.
See also:
http://apache.org/dev/committers.html#applying-patches
On 02.02.2009 23:25:09 carrier wrote:
> Author: carrier
> Date: Mon Feb 2 22:25:09 2009
> New Revision: 740130
>
> URL: http://svn.apache.org/viewvc?rev=740130&view=rev
> Log:
> Fix associated with ligature decomposition in bug PDFBOX-415
>
> Modified:
> incubator/pdfbox/trunk/src/main/java/org/apache/pdfbox/util/ICU4JImpl.java
>
Jeremias Maerki