Posted to dev@logging.apache.org by Volkan Yazıcı <vo...@gmail.com> on 2020/04/08 15:12:58 UTC

Magic char(s) breaking SocketAppender behavior

Hello,

While trying to understand how SocketAppender behaves with
uncommon Unicode characters, I came across an interesting
case: \uD800 gets transmitted as \u003F ('?'). One can easily verify
this by appending \uD800 to the end of the "This is a test message"
literals in SocketAppenderTest, lines 152 and 163. Would anybody mind
explaining why \uD800 gets transmitted as \u003F rather than as
\uD800, please?

Best.

Re: Magic char(s) breaking SocketAppender behavior

Posted by Remko Popma <re...@gmail.com>.
When a character or character sequence cannot be converted to bytes by the
character encoding, I think Java emits '?' (0x3F) by default. \uD800 is a
lone high surrogate, so it has no valid encoding on its own.
You don't need SocketAppender to reproduce this:

@Test
public void test() {
    String txt = "?String" + '\uD800';
    System.out.println(txt); // prints ?String?
    for (byte b : txt.getBytes()) {
        System.out.print(" 0x" + Integer.toHexString(b));
    } // gives  0x3f 0x53 0x74 0x72 0x69 0x6e 0x67 0x3f
    System.out.println();
}
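
To make the substitution visible rather than silent, here is a minimal
sketch using only the JDK. It assumes UTF-8 as the charset, while the
snippet above uses the platform default. The class and method names
(LoneSurrogateDemo, hexOfLenientEncoding, isEncodable) are made up for
illustration and are not part of Log4j. It contrasts String.getBytes,
which replaces input it cannot encode, with a CharsetEncoder configured
to report such input instead.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not taken from the Log4j code base.
public class LoneSurrogateDemo {

    // Encodes with String.getBytes, which silently substitutes the
    // charset's replacement bytes for input it cannot encode.
    static String hexOfLenientEncoding(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    // Returns true only if a strict UTF-8 encoder accepts the whole text.
    static boolean isEncodable(String text) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String lone = "A\uD800";        // high surrogate with no partner
        String pair = "\uD83D\uDE00";   // valid surrogate pair (U+1F600)

        System.out.println(hexOfLenientEncoding(lone)); // 413f
        System.out.println(isEncodable(lone));          // false
        System.out.println(hexOfLenientEncoding(pair)); // f09f9880
        System.out.println(isEncodable(pair));          // true
    }
}
```

If this matches what happens on the wire, the '?' would come from the
String-to-bytes step and not from the socket itself. A strict encoder like
the one above is one way to detect such input before it is replaced.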


