Posted to user@poi.apache.org by rajneesh sharma <ra...@gmail.com> on 2012/06/20 11:12:05 UTC

Control characters are not handled properly

Hi all,
I have two problems related to control character handling in POI.
1. Control characters are replaced by '?'; ideally I want to show the control
characters.
2. A string matching the pattern _x00HH_ gets converted to a control character
(here H is 0-9 or A-F).

I investigated the problem and found that the following facts cause it:
1. Excel internally uses OOXML, and XML does not allow most control characters.
2. To overcome this XML limitation, Excel escapes control characters.
3. If the escaped form of a control character appears literally in a string,
it also has to be escaped so that it does not get converted to a control
character.

Excel does this using the following logic:

 # Excel escapes control characters with _xHHHH_ and also escapes any
 # literal strings of that type by encoding the leading underscore. So
 # "\0" -> _x0000_ and "_x0000_" -> _x005F_x0000_.
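
The rule quoted above could be sketched in Java roughly as follows. This is only an illustration of the escaping scheme, not POI's or Excel's actual code. The class name ExcelEscapeSketch and the methods encode/decode are made up for this example, and the choice to treat everything below 0x20 except tab, LF and CR as a control character is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: illustrates the _xHHHH_ scheme only.
public class ExcelEscapeSketch {
    // Matches text that looks like an escape, e.g. _x0000_ or _x005F_.
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '_' && ESCAPE.matcher(s).region(i, s.length()).lookingAt()) {
                // A literal _xHHHH_ starts here: encode its leading underscore.
                out.append("_x005F_");
            } else if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                // Assumption: these are the characters that need escaping.
                out.append(String.format("_x%04X_", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    static String decode(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (m.find()) {
            // Copy the text before the escape, then the decoded character.
            out.append(s, pos, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            pos = m.end();
        }
        out.append(s, pos, s.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("\0"));       // _x0000_
        System.out.println(encode("_x0000_"));  // _x005F_x0000_
    }
}
```

With this scheme, decoding "_x005F_x0000_" gives back the literal text "_x0000_" rather than a NUL character, because the scan resumes after the decoded underscore.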




Is this a bug, or is it intended? If it is a bug, will it be fixed in a
future release?


Regards
Rajneesh

Re: Control characters are not handled properly

Posted by Nick Burch <ni...@alfresco.com>.
On 20/06/12 10:12, rajneesh sharma wrote:
>           I have two problems related to control characters handling in POI.
> 1. control characters replaced by '?' , ideally I want to show control
> characters.

Are you sure they're being turned into question marks? That's often what
you get if you try to display a character that your current terminal
encoding or font doesn't support. Try checking in a debugger, or turn
the string into a char array and cast each char to an int to see which
code point it is.
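
A minimal sketch of that check, assuming the value has already been read into a String (the class name CodePointDump, the describe method and the sample value are made up for illustration):

```java
// Hypothetical helper: prints the numeric value of every char in a string.
public class CodePointDump {
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // Cast the char to an int and show it as U+HHHH.
            out.append(String.format("U+%04X", (int) c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a value read from a cell: 'A', a control char, a real '?'.
        String cellValue = "A\u0001?";
        System.out.println(describe(cellValue)); // U+0041 U+0001 U+003F
    }
}
```

A genuine question mark shows up as U+003F, while a control character that merely displays as '?' keeps its own value (below U+0020). Note that this lists UTF-16 code units, so a character outside the Basic Multilingual Plane would appear as two values.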

> 1. Excel internally uses OOXML and XML does not support control characters.
> 2. To overcome this XML limitation Excel escapes the control character.

This doesn't sound Excel-specific; using entities and escapes is pretty
standard in XML processing for characters that aren't supported.

XMLBeans should be decoding any characters that get XML escaped/encoded,
so that you get a real character back. Are you sure you're not seeing this?

Nick
