Posted to user@poi.apache.org by Matthias Gerth <ma...@avelon.ch> on 2018/01/12 08:03:20 UTC

XSSFCell character encoding

Hi,

XSSFCell seems to convert certain character sequences into Unicode characters. How can I prevent this? Do I need to apply some kind of character escaping?

For example:
cell.setCellValue("LUS_BO_WP_x24B8_AI"); // the cell value is now "LUS_BO_WPⒸAI"

In Unicode, Ⓒ is U+24B8.

Regards
Matthias Gerth


Re: XSSFCell character encoding

Posted by Matthias Gerth <ma...@avelon.ch>.
This character conversion is done in XSSFRichTextString.utfDecode().
I have now written a function that basically does the same thing in reverse:

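import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class CellValueEscaper { // class name is arbitrary

    // An underscore that starts an "_xHHHH_" sequence. The lookahead does not consume
    // "xHHHH_", so in adjacent sequences such as "_x0041_x0042_" every underscore is escaped.
    private static final Pattern utfPtrn = Pattern.compile("_(?=x[0-9A-Fa-f]{4}_)");

    // "_x005F_" is the encoded LOW LINE (U+005F), which utfDecode() turns back into "_"
    private static final String UNICODE_CHARACTER_LOW_LINE = "_x005F_";

    public static String escape(final String value) {
        if (value == null) return null;

        StringBuilder buf = new StringBuilder();
        Matcher m = utfPtrn.matcher(value);
        int idx = 0;
        while (m.find()) {
            // copy the text before the underscore unchanged
            buf.append(value, idx, m.start());
            // replace the underscore itself; the "xHHHH_" after it is copied as plain text
            buf.append(UNICODE_CHARACTER_LOW_LINE);
            idx = m.end();
        }
        buf.append(value.substring(idx));
        return buf.toString();
    }
}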
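To check an escaping function against the decoding step, here is a stdlib-only round-trip sketch. `XstringRoundTrip` is a hypothetical class, and its `decode` is a hypothetical stand-in for what XSSFRichTextString.utfDecode() does to `_xHHHH_` sequences, written from the behaviour described in this thread rather than from POI's source. Its `escape` uses a lookahead so that in adjacent sequences such as `_x0041_x0042_` each underscore is escaped; a pattern that consumes the whole `_xHHHH_` escapes only the first one, and the second sequence is then decoded again.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XstringRoundTrip {
    // "_xHHHH_" as it is assumed to be decoded on the way into the cell
    private static final Pattern ENCODED = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    // An underscore that starts such a sequence; the lookahead does not consume
    // "xHHHH_", so every underscore in "_x0041_x0042_" is found
    private static final Pattern UNDERSCORE = Pattern.compile("_(?=x[0-9A-Fa-f]{4}_)");

    // Hypothetical stand-in for the decoding step: each "_xHHHH_" becomes char 0xHHHH
    static String decode(String value) {
        StringBuilder sb = new StringBuilder();
        Matcher m = ENCODED.matcher(value);
        int idx = 0;
        while (m.find()) {
            sb.append(value, idx, m.start());
            sb.append((char) Integer.parseInt(m.group(1), 16));
            idx = m.end();
        }
        return sb.append(value, idx, value.length()).toString();
    }

    // Replace each such underscore with "_x005F_", which decodes back to "_"
    static String escape(String value) {
        return UNDERSCORE.matcher(value).replaceAll("_x005F_");
    }

    public static void main(String[] args) {
        String[] samples = {"LUS_BO_WP_x24B8_AI", "_x0041_x0042_", "plain_text"};
        for (String s : samples) {
            String escaped = escape(s);
            // decode(escape(s)) gives back s unchanged for these samples
            System.out.println(s + " -> " + escaped + " -> " + decode(escaped));
        }
    }
}
```

Applied to the example from the first message, `escape` turns `LUS_BO_WP_x24B8_AI` into `LUS_BO_WP_x005F_x24B8_AI`, and `decode` restores the original text.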
Thank you
Matthias

> On 12 Jan 2018, at 13:34, Dominik Stadler <do...@gmx.at> wrote:
> 
> This is how Excel encodes Unicode characters here, so POI performs the
> same conversion; see
> https://bz.apache.org/bugzilla/show_bug.cgi?id=57008#c8 for the related
> issue and a longer explanation.
> 
> Dominik


Re: XSSFCell character encoding

Posted by Dominik Stadler <do...@gmx.at>.
This is how Excel encodes Unicode characters here, so POI performs the
same conversion; see
https://bz.apache.org/bugzilla/show_bug.cgi?id=57008#c8 for the related
issue and a longer explanation.
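Dominik's point is that `_xHHHH_` belongs to the file format itself: in Office Open XML (ECMA-376, the `ST_Xstring` type), characters that XML cannot carry are written as `_xHHHH_`, and a literal underscore can be protected as `_x005F_`. The sketch below shows the writing direction using only the standard library. `XstringEncoder` and `encodeXstring` are hypothetical names, and the set of escaped characters is an assumption (C0 control characters other than tab, LF and CR, which XML 1.0 does not allow), not what Excel or POI is confirmed to do. It escapes every underscore, which is more than strictly needed but can never be misread when the text is decoded.

```java
public class XstringEncoder {
    // Hypothetical encoder for the _xHHHH_ convention discussed in this thread.
    // Assumptions: C0 controls other than tab, LF and CR are escaped (XML 1.0
    // cannot carry them), and every underscore is escaped as _x005F_ so that no
    // literal text can later be mistaken for an escape sequence.
    static String encodeXstring(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean control = c < 0x20 && c != '\t' && c != '\n' && c != '\r';
            if (control || c == '_') {
                // four uppercase hex digits, e.g. '_' (0x5F) -> "_x005F_"
                sb.append(String.format("_x%04X_", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeXstring("a\u0001b")); // a_x0001_b
        System.out.println(encodeXstring("_x24B8_"));  // _x005F_x24B8_x005F_
    }
}
```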

Dominik
