Posted to fop-dev@xmlgraphics.apache.org by Andreas L Delmelle <a_...@pandora.be> on 2006/02/28 19:39:26 UTC

RTF and table/column widths (moved from fop-users)

(was getting a bit too much OT for fop-users...)

On Feb 28, 2006, at 19:07, Jeremias Maerki wrote:

> On 28.02.2006 18:25:42 Andreas L Delmelle wrote:
>>> <snip />
>>> I think I found what's causing the question-marks to appear in the
>>> RTF output...
>>>
>>> See org.apache.fop.render.rtf.RTFHandler, line 150. An
>>> OutputStreamWriter is instantiated, which uses the default platform
>>> encoding. Should be enough to force this Writer to use UTF-8, I  
>>> think.
>
> Nope, according to the RTF spec, the output should be in "US-ASCII"
> (7-bit) for portability. UTF-8 is definitely not supported by RTF

Oops! Sorry, my mistake. Anyway, relying on the default platform  
encoding is just as definitely wrong. :-)
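For reference, here is a minimal sketch of why the writer's charset matters. The class and method names are made up for illustration and are not FOP code. `new OutputStreamWriter(os)` picks up the platform default charset. When the chosen charset cannot represent a character, the JDK writer substitutes that charset's replacement bytes instead of failing, and for US-ASCII that is a single `?`. Depending on the platform default, this is one plausible source of the stray question marks.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterCharsetDemo {

    /** Encodes s through an OutputStreamWriter with an explicit charset. */
    static byte[] encode(String s, Charset cs) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        // new OutputStreamWriter(os) would use Charset.defaultCharset()
        try (Writer w = new OutputStreamWriter(os, cs)) {
            w.write(s);
        } catch (IOException e) {
            // not expected for an in-memory stream
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("platform default: " + Charset.defaultCharset());
        // U+2022 (BULLET) has no US-ASCII mapping; the writer substitutes
        // the charset's replacement byte '?' instead of throwing
        byte[] ascii = encode("\u2022", StandardCharsets.US_ASCII);
        System.out.println(ascii.length + " byte(s): " + (char) ascii[0]);
    }
}
```

The second line printed is `1 byte(s): ?`. Forcing US-ASCII alone therefore does not fix anything. It only makes the substitution predictable, so non-ASCII text still has to be escaped before it reaches the writer.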

> but I think it's possible to use various 8-bit character sets and
> Unicode escapes if the proper commands are generated. The Microsoft
> RTF spec lists what is possible.

I already took a closer look at this, and it seems to be handled in
RTFStringConverter already.
So I wondered and wandered, and found (see
RTFListItem.RTFListItemLabel) that RTFListStyleBullet is currently
unused. Only RTFListStyleNumber and RTFListStyleText are used.


Cheers,

Andreas


Re: RTF and table/column widths (moved from fop-users)

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Absolutely!

On 28.02.2006 19:39:26 Andreas L Delmelle wrote:
<snip/>
> Anyway, relying on the default platform  
> encoding is just as definitely wrong. :-)
<snip/>


Jeremias Maerki


Re: RTF - list-item-label encoding (was: RTF and table/column widths (moved from fop-users))

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Feb 28, 2006, at 23:17, Andreas L Delmelle wrote:

> <snip />
> Is it allowed to use unicode escapes in control words? If so, the  
> solution could be as simple as using RTFStringConverter to escape  
> any 'text' if necessary.
>
> In practice, my proposal would come down to

On that note, while browsing through the related sources, I found some
room for a minor optimization in RtfStringConverter itself, which could
matter if it has to deal with many and/or large portions of text. See
the patch below.

If no one objects...

Cheers,

Andreas

Index: src/java/org/apache/fop/render/rtf/rtflib/rtfdoc/RtfStringConverter.java
===================================================================
--- src/java/org/apache/fop/render/rtf/rtflib/rtfdoc/RtfStringConverter.java    (revision 381394)
+++ src/java/org/apache/fop/render/rtf/rtflib/rtfdoc/RtfStringConverter.java    (working copy)
@@ -1,5 +1,5 @@
/*
- * Copyright 1999-2004 The Apache Software Foundation.
+ * Copyright 1999-2006 The Apache Software Foundation.
   *
   * Licensed under the Apache License, Version 2.0 (the "License");
   * you may not use this file except in compliance with the License.
@@ -26,8 +26,6 @@
package org.apache.fop.render.rtf.rtflib.rtfdoc;
-import java.util.Map;
-import java.util.HashMap;
import java.io.IOException;
import java.io.Writer;
@@ -38,23 +36,6 @@
public class RtfStringConverter {
      private static final RtfStringConverter INSTANCE = new RtfStringConverter();
-    private static final Map SPECIAL_CHARS;
-    private static final Character DBLQUOTE = new Character('\"');
-    private static final Character QUOTE = new Character('\'');
-    private static final Character SPACE = new Character(' ');
-
-    /** List of characters to escape with corresponding replacement strings */
-    static {
-        SPECIAL_CHARS = new HashMap();
-        SPECIAL_CHARS.put(new Character('\t'), "tab");
-        SPECIAL_CHARS.put(new Character('\n'), "line");
-        SPECIAL_CHARS.put(new Character('\''), "rquote");
-        SPECIAL_CHARS.put(new Character('\"'), "rdblquote");
-        SPECIAL_CHARS.put(new Character('\\'), "\\");
-        SPECIAL_CHARS.put(new Character('{'), "{");
-        SPECIAL_CHARS.put(new Character('}'), "}");
-    }
-
      /** singleton pattern */
      private RtfStringConverter() {
      }
@@ -79,43 +60,50 @@
              return;
          }
+        StringBuffer sb = new StringBuffer();
+        String replacement;
+        char c, d;
          // TODO: could be made more efficient (binary lookup, etc.)
-        for (int i = 0; i < str.length(); i++) {
-            final Character c = new Character(str.charAt(i));
-            Character d;
-            String replacement;
-            if (i != 0) {
-                d = new Character(str.charAt(i - 1));
-            } else {
-                d = new Character(str.charAt(i));
+        for (int i = -1; ++i < str.length();) {
+            replacement = null;
+            c = str.charAt(i);
+            switch (c) {
+            case '\"':
+            case '\'':
+                d = str.charAt((i == 0) ? i : i - 1);
+                if (d == ' ') {
+                    replacement = (c == '\"') ? "ldblquote" : "lquote";
+                } else {
+                    replacement = (c == '\"') ? "rdblquote" : "rquote";
+                }
+                break;
+            case '\t':
+                replacement = "tab";
+                break;
+            case '\n':
+                replacement = "line";
+                break;
+            case '\\':
+            case '{':
+            case '}':
+                replacement = "" + c;
+                break;
+            default:
+                //nop
              }
-            //This section modified by Chris Scott
-            //add "smart" quote recognition
-            if (c.equals((Object)DBLQUOTE) && d.equals((Object)SPACE)) {
-                replacement = "ldblquote";
-            } else if (c.equals((Object)QUOTE) && d.equals((Object)SPACE)) {
-                replacement = "lquote";
-            } else {
-                replacement = (String)SPECIAL_CHARS.get(c);
-            }
-
              if (replacement != null) {
                  // RTF-escaped char
-                w.write('\\');
-                w.write(replacement);
-                w.write(' ');
-            } else if (c.charValue() > 127) {
+                sb.append('\\').append(replacement).append(' ');
+            } else if (c > 127) {
                  // write unicode representation - contributed by Michel Jacobson
                  // <ja...@idf.ext.jussieu.fr>
-                w.write("\\u");
-                w.write(Integer.toString((int)c.charValue()));
-                w.write("\\\'3f");
+                sb.append("\\u").append((int) c).append("\\\'3f");
              } else {
                  // plain char that is understood by RTF natively
-                w.write(c.charValue());
+                sb.append(c);
              }
          }
+        w.write(sb.toString());
      }
-
}
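To try the patched escaping rules outside FOP, here is a minimal self-contained sketch that returns a `String` instead of writing to a `Writer` (`RtfEscapeSketch` and `escape` are hypothetical names). It keeps the patch's switch-based dispatch and single buffered write, and uses `StringBuilder` since no synchronization is needed. It deviates from the patch in two places. Both deviations are assumptions based on the RTF spec, not on anything discussed in this thread. First, control symbols such as `\{` take no delimiter, so no space is appended after them; the patch appends one, which would presumably show up as a literal space. Second, the `\uN` value is written as a signed 16-bit number, so characters above U+7FFF come out negative.

```java
public class RtfEscapeSketch {

    /**
     * Escapes str for RTF following the rules of the patched
     * RtfStringConverter, with the two deviations noted inline.
     */
    static String escape(String str) {
        // StringBuilder: same idea as the patch's StringBuffer, minus locking
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            String replacement = null;
            switch (c) {
            case '"':
            case '\'': {
                // "smart" quotes: opening form only when preceded by a space
                boolean opening = i > 0 && str.charAt(i - 1) == ' ';
                if (c == '"') {
                    replacement = opening ? "ldblquote" : "rdblquote";
                } else {
                    replacement = opening ? "lquote" : "rquote";
                }
                break;
            }
            case '\t':
                replacement = "tab";
                break;
            case '\n':
                replacement = "line";
                break;
            default:
                break;
            }
            if (replacement != null) {
                // control word: the trailing space is its delimiter
                sb.append('\\').append(replacement).append(' ');
            } else if (c == '\\' || c == '{' || c == '}') {
                // deviation 1: control symbols take no delimiter, so no space
                sb.append('\\').append(c);
            } else if (c > 127) {
                // deviation 2: RTF's Unicode keyword expects a signed 16-bit
                // value; \'3f ('?') is the fallback for older readers
                sb.append("\\u").append((short) c).append("\\'3f");
            } else {
                // plain 7-bit character
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a {b} \"c\""));
        System.out.println(escape("\u2022 item"));
    }
}
```

The second line printed is `\u8226\'3f item`, since U+2022 is 8226 in decimal.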


RTF - list-item-label encoding (was: RTF and table/column widths (moved from fop-users))

Posted by Andreas L Delmelle <a_...@pandora.be>.
On Feb 28, 2006, at 19:39, Andreas L Delmelle wrote:

> <snip />
> Already took a closer look at this, and this seemed already handled  
> in RTFStringConverter.
> So, I wondered and wandered, and found that --see  
> RTFListItem.RTFListItemLabel-- currently RTFListStyleBullet is  
> unused. We know only RTFListStyleNumber and RTFListStyleText.

OK, so the encoding problem is two-fold here:

- RTFHandler should force the OutputStreamWriter to use "US-ASCII"
  encoding.
- RTFListStyleText writes the bullet as raw 'text' (part of a control
  word) to the OutputStream, without any escaping.

Are Unicode escapes allowed in control words? If so, the solution could
be as simple as using RTFStringConverter to escape any 'text' where
necessary.

In practice, my proposal would come down to:

Index: src/java/org/apache/fop/render/rtf/RTFHandler.java
===================================================================
--- src/java/org/apache/fop/render/rtf/RTFHandler.java  (revision 381394)
+++ src/java/org/apache/fop/render/rtf/RTFHandler.java  (working copy)
@@ -147,7 +147,7 @@
      public void startDocument() throws SAXException {
          // TODO sections should be created
          try {
-            rtfFile = new RtfFile(new OutputStreamWriter(os));
+            rtfFile = new RtfFile(new OutputStreamWriter(os, "US-ASCII"));
              docArea = rtfFile.startDocumentArea();
          } catch (IOException ioe) {
              // TODO could we throw Exception in all FOEventHandler events?
Index: src/java/org/apache/fop/render/rtf/rtflib/rtfdoc/RtfListStyleText.java
===================================================================
--- src/java/org/apache/fop/render/rtf/rtflib/rtfdoc/RtfListStyleText.java      (revision 381394)
+++ src/java/org/apache/fop/render/rtf/rtflib/rtfdoc/RtfListStyleText.java      (working copy)
@@ -63,7 +63,8 @@
          item.writeGroupMark(true);
          //item.writeControlWord("pndec");
          item.writeOneAttribute(RtfListTable.LIST_FONT_TYPE, "2");
-        item.writeControlWord("pntxtb " + text);
+        item.writeControlWord("pntxtb ");
+        RtfStringConverter.getInstance().writeRtfString(item.writer, text);
          item.writeGroupMark(false);
      }

Any objections? Suggestions for a better approach? (This seems more
like a quick fix, and it is still untested.)
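The following self-contained sketch shows what the two changes together are meant to achieve. The list text is RTF-escaped before it reaches a writer forced to US-ASCII, so the output stays 7-bit and nothing gets silently replaced. `writeListText` and `render` are hypothetical stand-ins, not the actual `RtfListStyleText`/`RtfStringConverter` API. The escaping is cut down to the characters that matter here. The sketch also assumes that Unicode escapes are accepted inside the `\pntxtb` group, which is exactly the open question above. The signed 16-bit value and the missing space after `\{` follow the RTF spec as far as I understand it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ListTextSketch {

    /** Hypothetical helper: writes a pntxtb group with its text escaped. */
    static void writeListText(Writer w, String text) throws IOException {
        w.write("{\\pntxtb ");
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' || c == '{' || c == '}') {
                // control symbol: backslash plus the character, no delimiter
                w.write('\\');
                w.write(c);
            } else if (c > 127) {
                // RTF's Unicode keyword with a signed 16-bit value,
                // followed by \'3f ('?') for readers that skip it
                w.write("\\u" + (short) c + "\\'3f");
            } else {
                w.write(c);
            }
        }
        w.write('}');
    }

    /** Writes the group through a writer forced to US-ASCII. */
    static byte[] render(String text) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(os, StandardCharsets.US_ASCII)) {
            writeListText(w, text);
        } catch (IOException e) {
            // not expected for an in-memory stream
            throw new UncheckedIOException(e);
        }
        return os.toByteArray();
    }

    public static void main(String[] args) {
        String rtf = new String(render("\u2022"), StandardCharsets.US_ASCII);
        System.out.println(rtf);
    }
}
```

`main` prints `{\pntxtb \u8226\'3f}`. The `\'3f` fallback is ordinary ASCII text, so the US-ASCII writer never has to substitute anything.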


Cheers,

Andreas