Posted to dev@logging.apache.org by Christopher <ct...@apache.org> on 2020/03/10 20:57:07 UTC

Emoji in PatternLayout?

I tried to put a kitty (🐈) in my PatternLayout, but it didn't
work. It replaced it with some weird character. Is this a known
bug?
Does PatternLayout not support wide characters?

Re: Emoji in PatternLayout?

Posted by Gary Gregory <ga...@gmail.com>.
I think only newer Java versions parse .properties files as UTF-8.


Gary

On Tue, Mar 10, 2020, 20:17 Carter Kozak <ck...@ckozak.net> wrote:

> I wonder if the log4j2.properties file is being parsed as ISO-8859-1
> rather than UTF-8, so we're not reading the cat properly?
>
> -ck
>

Re: Emoji in PatternLayout?

Posted by Christopher <ct...@apache.org>.
According to https://en.wikipedia.org/wiki/.properties , .properties files
are ISO-8859-1, except Java reads them as UTF-8 since Java 9, and falls
back to ISO-8859-1. I'd have to dig further to see how log4j is reading the
config files. It might be a library or custom code that assumes ISO-8859-1,
regardless. If the code isn't smart enough to read the UTF-8 BOM, it might
be nice to be able to set the config file's charset explicitly with
something like a `-Dlog4j.configurationFile.charset=` flag. I'd prefer not to
set my text editor to use UTF-8 for my *.java files and ISO-8859-1 for my
*.properties files. That's just craziness :)
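
To make the idea of an explicit config charset concrete, here is a rough Java sketch of what such a loader could look like. Everything in it is an assumption for illustration: the class name, the helper methods, and the `example.config.charset` system property are made up and are not part of log4j. It decodes the stream with the chosen charset and drops a leading BOM (U+FEFF) before handing the text to `Properties.load(Reader)`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class ExplicitCharsetLoader {
    // Hypothetical property name, made up for this sketch.
    static final String CHARSET_PROPERTY = "example.config.charset";

    /** Charset named by the hypothetical system property, defaulting to ISO-8859-1. */
    static Charset configuredCharset() {
        return Charset.forName(System.getProperty(CHARSET_PROPERTY, "ISO-8859-1"));
    }

    /** Loads properties using the given charset, skipping a leading U+FEFF if present. */
    static Properties load(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            reader.mark(1);
            if (reader.read() != 0xFEFF) {
                reader.reset(); // first char was not a BOM, so put it back
            }
            Properties props = new Properties();
            props.load(reader);
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Test helper: the UTF-8 bytes of the text, prefixed with a UTF-8 BOM. */
    static byte[] withBom(String text) {
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(bom, 0, bom.length);
        out.write(body, 0, body.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] file = withBom("pattern = \uD83D\uDC08%m%n\n");
        Properties props = load(new ByteArrayInputStream(file), configuredCharset());
        // Run with -Dexample.config.charset=UTF-8 to get the cat back intact.
        System.out.println(props.getProperty("pattern"));
    }
}
```

The BOM check only helps when the charset is actually UTF-8 (or another Unicode encoding); decoded as ISO-8859-1, the three BOM bytes turn into three ordinary characters glued onto the first key.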

On Tue, Mar 10, 2020 at 8:21 PM Christopher <ct...@apache.org> wrote:

> That was my guess, but I don't see how this could happen since my JVM
> default encoding, my terminal, System.getProperty("file.encoding"),
> System.getProperty("input.encoding") and the BOM in the config file are all
> UTF-8. I'm using Java 11.
>

Re: Emoji in PatternLayout?

Posted by Christopher <ct...@apache.org>.
The problem is that it *is* UTF-8. It appears as though log4j2 isn't
reading it correctly as UTF-8.

On Tue, Mar 10, 2020, 21:00 Matt Sicker <bo...@gmail.com> wrote:

> You can encode it with \u codes. Emoji require multiple code points
> anyways, so file formats can get weird whenever they're not using UTF-8.
>
> --
> Matt Sicker <bo...@gmail.com>
>

Re: Emoji in PatternLayout?

Posted by Matt Sicker <bo...@gmail.com>.
You can encode it with \u codes. Emoji require multiple code points
anyways, so file formats can get weird whenever they're not using UTF-8.
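
As a concrete sketch of the escape approach (class and method names here are made up for illustration): the cat is the single code point U+1F408, which lies outside the Basic Multilingual Plane, so in a .properties file it has to be written as two backslash-u escapes, one per UTF-16 code unit. The helper below escapes every non-ASCII code unit and then loads the result through the ISO-8859-1 stream path to show that the emoji survives. It is not a complete properties escaper; backslashes and key separators are left alone.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class EscapedEmojiDemo {
    /** Replaces each non-printable-ASCII UTF-16 code unit with a backslash-u escape. */
    static String toPropertiesEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char unit : text.toCharArray()) {
            if (unit < 0x20 || unit > 0x7E) {
                sb.append(String.format("\\u%04X", (int) unit));
            } else {
                sb.append(unit);
            }
        }
        return sb.toString();
    }

    /** Loads a pure-ASCII properties snippet via the ISO-8859-1 stream path. */
    static String loadPattern(String asciiFile) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(asciiFile.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("pattern");
    }

    public static void main(String[] args) {
        String escaped = toPropertiesEscapes("\uD83D\uDC08");
        System.out.println(escaped); // prints the two escapes for D83D and DC08
        String value = loadPattern("pattern = " + escaped + "%m%n\n");
        System.out.println(Integer.toHexString(value.codePointAt(0))); // 1f408
    }
}
```

Because the escaped file is plain ASCII, it reads the same whether the parser assumes ISO-8859-1 or UTF-8.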

On Tue, 10 Mar 2020 at 19:33, Gary Gregory <ga...@gmail.com> wrote:

> I do not think your file.encoding sys prop matters, see the Javadoc for the
> Properties class.
>
> Gary


-- 
Matt Sicker <bo...@gmail.com>

Re: Emoji in PatternLayout?

Posted by Gary Gregory <ga...@gmail.com>.
I do not think your file.encoding system property matters; see the Javadoc
for the Properties class.

Gary
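
A small Java sketch of the behavior the Javadoc describes: `Properties.load(InputStream)` is documented to assume ISO-8859-1 regardless of file.encoding, while `Properties.load(Reader)` uses whatever charset the reader was built with. The class and method names below are made up for the example, and whether log4j2 goes through the stream path when reading log4j2.properties is an assumption here, not something confirmed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class PropertiesCharsetDemo {
    static final String CAT = "\uD83D\uDC08"; // U+1F408 as a surrogate pair

    /** A one-line properties file, encoded as UTF-8 bytes. */
    static byte[] sampleFile() {
        return ("pattern = " + CAT + "%m%n\n").getBytes(StandardCharsets.UTF_8);
    }

    /** Stream path: bytes are always decoded as ISO-8859-1. */
    static String loadFromStream(byte[] file) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("pattern");
    }

    /** Reader path: the caller picks the charset, UTF-8 here. */
    static String loadFromReader(byte[] file) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(file), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("pattern");
    }

    public static void main(String[] args) {
        byte[] file = sampleFile();
        System.out.println(loadFromStream(file).startsWith(CAT)); // false
        System.out.println(loadFromReader(file).startsWith(CAT)); // true
    }
}
```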

On Tue, Mar 10, 2020, 20:22 Christopher <ct...@apache.org> wrote:

> That was my guess, but I don't see how this could happen since my JVM
> default encoding, my terminal, System.getProperty("file.encoding"),
> System.getProperty("input.encoding") and the BOM in the config file are all
> UTF-8. I'm using Java 11.

Re: Emoji in PatternLayout?

Posted by Christopher <ct...@apache.org>.
That was my guess, but I don't see how this could happen since my JVM
default encoding, my terminal, System.getProperty("file.encoding"),
System.getProperty("input.encoding") and the BOM in the config file are all
UTF-8. I'm using Java 11.

On Tue, Mar 10, 2020 at 8:17 PM Carter Kozak <ck...@ckozak.net> wrote:

> I wonder if the log4j2.properties file is being parsed as ISO-8859-1
> rather than UTF-8, so we're not reading the cat properly?
>
> -ck
>

Re: Emoji in PatternLayout?

Posted by Carter Kozak <ck...@ckozak.net>.
I wonder if the log4j2.properties file is being parsed as ISO-8859-1 rather than UTF-8, so we're not reading the cat properly?

On Tue, Mar 10, 2020, at 20:04, Christopher wrote:
> In my log4j2.properties file, I used:
> 
> appender.console.type = Console
> appender.console.name = STDERR
> appender.console.target = SYSTEM_ERR
> appender.console.layout.type = PatternLayout
> appender.console.layout.pattern = 🐈%style{%d{ISO8601}}{dim,cyan}
> %style{[}{red}%style{%-8c{2}}{dim,blue}%style{]}{red}
> %highlight{%-5p}%style{:}{red} %m%n
> 
> I did not try to specify a charset at first, but my understanding is that
> the default is to use UTF-8, which should work, but it prints 'ð' instead
> of '🐈'.
> Strangely, even though my terminal is using UTF-8, log4j prints correctly
> when I add:
> 
> appender.console.layout.charset = ISO-8859-1
> 
> Setting this to 'UTF-8' explicitly does not work. I don't know if this is a
> bug, or some charset confusion on my part (I try to stick to UTF-8
> everywhere, but perhaps I missed something). Perhaps the config file itself
> is being read as ISO-8859-1, even though it contains UTF-8 characters and I
> made sure to explicitly save it with a UTF-8 BOM.
> 

-ck

Re: Emoji in PatternLayout?

Posted by Christopher <ct...@apache.org>.
In my log4j2.properties file, I used:

appender.console.type = Console
appender.console.name = STDERR
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = 🐈%style{%d{ISO8601}}{dim,cyan} %style{[}{red}%style{%-8c{2}}{dim,blue}%style{]}{red} %highlight{%-5p}%style{:}{red} %m%n

I did not try to specify a charset at first, but my understanding is that
the default is to use UTF-8, which should work, but it prints 'ð' instead
of '🐈'.
Strangely, even though my terminal is using UTF-8, log4j prints correctly
when I add:

appender.console.layout.charset = ISO-8859-1

Setting this to 'UTF-8' explicitly does not work. I don't know if this is a
bug, or some charset confusion on my part (I try to stick to UTF-8
everywhere, but perhaps I missed something). Perhaps the config file itself
is being read as ISO-8859-1, even though it contains UTF-8 characters and I
made sure to explicitly save it with a UTF-8 BOM.
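
Here is a short Java sketch that reproduces these symptoms, on the assumption that the file really is being decoded as ISO-8859-1 (the class and helper names are made up for the example). The cat's four UTF-8 bytes become four Latin-1 characters, the first of which is 'ð'. Writing those characters back out as ISO-8859-1 restores the original bytes, which a UTF-8 terminal renders as the cat; writing them out as UTF-8 encodes each one again, which gives the garbled output.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    /** Decodes the UTF-8 bytes of the text as if they were ISO-8859-1. */
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Formats bytes as space-separated uppercase hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String cat = "\uD83D\uDC08"; // U+1F408
        String garbled = misread(cat); // U+00F0 followed by three C1 control characters

        // Bytes as saved in the config file.
        System.out.println(hex(cat.getBytes(StandardCharsets.UTF_8)));            // F0 9F 90 88
        // Layout charset ISO-8859-1: the original bytes come back out.
        System.out.println(hex(garbled.getBytes(StandardCharsets.ISO_8859_1)));   // F0 9F 90 88
        // Layout charset UTF-8: every misread character is encoded a second time.
        System.out.println(hex(garbled.getBytes(StandardCharsets.UTF_8)));        // C3 B0 C2 9F C2 90 C2 88
    }
}
```

So the ISO-8859-1 layout charset "works" only because it undoes the earlier misread; the pattern held in memory is still wrong.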

On Tue, Mar 10, 2020 at 5:58 PM Ralph Goers <ra...@dslextreme.com>
wrote:

> Did you specify a charset on the layout that supports that character?
>
> Ralph

Re: Emoji in PatternLayout?

Posted by Ralph Goers <ra...@dslextreme.com>.
Did you specify a charset on the layout that supports that character?

Ralph

> On Mar 10, 2020, at 1:57 PM, Christopher <ct...@apache.org> wrote:
> 
> I tried to put a kitty (🐈) in my PatternLayout, but it didn't
> work. It replaced it with some weird character. Is this a known
> bug?
> Does PatternLayout not support wide characters?
>