Posted to dev@ignite.apache.org by Ilya Kasnacheev <il...@gmail.com> on 2018/12/26 16:20:31 UTC

Let's force -Dfile.encoding=UTF-8

Hello!

It was recently discovered that there's an issue in H2 which leads to
strings arriving at the reducer in differing encodings when they come from
nodes with different system encodings. Even if we are able to fix this, I
suspect that user code may also be bitten by a mismatch of this setting.

I propose to force UTF-8 as the system encoding whenever we control
how the JVM is launched.
This includes ignite.sh, ignite.bat, Apache.Ignite.exe and C++'s ./ignite.

This will mainly affect Windows systems, as I expect that Linux will almost
always use a UTF-8 locale and Mac OS X should always be UTF-8.

file.encoding is a somewhat misleading name, since it specifies the default
string encoding, such as the one used for String.getBytes(). It is a common
convention to set it to UTF-8; for example, IDEA will do that.
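
To make the String.getBytes() point concrete, here is a minimal Java sketch
(not taken from the pull request). It assumes ISO-8859-1 as a stand-in for a
legacy single-byte system encoding such as a Windows code page; the class and
method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    /**
     * Encodes text on the "sender" with one charset and decodes it on the
     * "receiver" with another, as happens when two JVMs disagree on the default.
     */
    public static String transfer(String text, Charset senderCharset, Charset receiverCharset) {
        byte[] wire = text.getBytes(senderCharset);
        return new String(wire, receiverCharset);
    }

    public static void main(String[] args) {
        // The charset that String.getBytes() without arguments uses on this JVM.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Assumption: ISO-8859-1 stands in for a legacy single-byte code page.
        Charset legacy = StandardCharsets.ISO_8859_1;
        String text = "caf\u00e9";

        // Mismatched charsets: the non-ASCII character does not survive.
        System.out.println(text.equals(transfer(text, legacy, StandardCharsets.UTF_8)));

        // Same charset on both sides: the text round-trips unchanged.
        System.out.println(text.equals(transfer(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));
    }
}
```

Passing an explicit charset to getBytes() and to the String constructor removes
the dependency on file.encoding in the code that does so; forcing the flag covers
code that relies on the default.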

WDYT?

There's a pull request: https://github.com/apache/ignite/pull/5725
If somebody could contribute C++ and .NET tests, I would also be grateful.

Regards,
-- 
Ilya Kasnacheev

Re: Let's force -Dfile.encoding=UTF-8

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I'd force UTF-8 where possible. It should not be very hard to write a test
to ensure that an old MetaStorage still works (using multijvm).

Regards,
-- 
Ilya Kasnacheev


On Thu, 10 Jan 2019 at 17:21, Ivan Bessonov <be...@gmail.com> wrote:

> I believe that all current keys are just basic English strings and it's too
> early to say that the current change breaks anything, sorry.
>
> We just have to ensure that old cp-1251 MetaStorages work fine and
> maybe then force UTF-8 for MetaStorage in source code to avoid such
> problems in the future. What do you think?

Re: Let's force -Dfile.encoding=UTF-8

Posted by Ivan Bessonov <be...@gmail.com>.
I believe that all current keys are just basic English strings and it's too
early to say that the current change breaks anything, sorry.

We just have to ensure that old cp-1251 MetaStorages work fine and
maybe then force UTF-8 for MetaStorage in source code to avoid such
problems in the future. What do you think?
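
As a rough illustration of why ASCII-only keys should be unaffected, here is
a Java sketch. It is not MetaStorage code: the class, the method names and the
key "example-key" are hypothetical, and it assumes the windows-1251 charset is
available in the running JVM (falling back to ISO-8859-1 otherwise).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyEncodingCheck {
    /** Returns true if the key serializes to the same bytes under both charsets. */
    public static boolean sameBytes(String key, Charset legacy, Charset target) {
        return Arrays.equals(key.getBytes(legacy), key.getBytes(target));
    }

    /** Serializes a key with an explicit charset instead of the JVM default. */
    public static byte[] serializeKey(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumption: windows-1251 is present; otherwise use another single-byte charset.
        Charset legacy = Charset.isSupported("windows-1251")
            ? Charset.forName("windows-1251")
            : StandardCharsets.ISO_8859_1;

        // ASCII-only key: both charsets agree on the 7-bit range.
        System.out.println(sameBytes("example-key", legacy, StandardCharsets.UTF_8));

        // Non-ASCII key (Cyrillic): the byte sequences differ.
        System.out.println(sameBytes("\u043a\u043b\u044e\u0447", legacy, StandardCharsets.UTF_8));
    }
}
```

With an explicit charset as in serializeKey, the stored bytes no longer depend
on how the JVM was launched.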

On Thu, 10 Jan 2019 at 17:13, Ilya Kasnacheev <il...@gmail.com> wrote:

> Hello!
>
> I'm afraid not, but it is still possible to force the legacy encoding as a
> workaround.
>
> Is there an expectation that MetaStorage contains I18N keys?
>
> Regards,
> --
> Ilya Kasnacheev


-- 
Sincerely yours,
Ivan Bessonov

Re: Let's force -Dfile.encoding=UTF-8

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I'm afraid not, but it is still possible to force the legacy encoding as a
workaround.

Is there an expectation that MetaStorage contains I18N keys?

Regards,
-- 
Ilya Kasnacheev


On Thu, 10 Jan 2019 at 17:11, Ivan Bessonov <be...@gmail.com> wrote:

> Hello!
>
> I have a concern about MetaStorage: it uses the default system encoding to
> serialize keys. This means that after this change Windows nodes won't be
> able to read previous MetaStorage files.
>
> Is that OK? Is there a way to migrate old MetaStorages safely?

Re: Let's force -Dfile.encoding=UTF-8

Posted by Ivan Bessonov <be...@gmail.com>.
Hello!

I have a concern about MetaStorage: it uses the default system encoding to
serialize keys. This means that after this change Windows nodes won't be
able to read previous MetaStorage files.

Is that OK? Is there a way to migrate old MetaStorages safely?

On Thu, 10 Jan 2019 at 16:57, Ilya Kasnacheev <il...@gmail.com> wrote:

> Hello!
>
> I've merged this to master. Now we should try to run nodes in UTF-8 mode
> unless explicitly specified otherwise.
>
> I also added a warning on start-up if another encoding is used, as it might
> lead to data corruption.
>
> Regards,
> --
> Ilya Kasnacheev


-- 
Sincerely yours,
Ivan Bessonov

Re: Let's force -Dfile.encoding=UTF-8

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

I've merged this to master. Now we should try to run nodes in UTF-8 mode
unless explicitly specified otherwise.

I also added a warning on start-up if another encoding is used, as it might
lead to data corruption.
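
For illustration only, a start-up check of this kind could look like the Java
sketch below. This is not the code that was merged; the class name, method name
and message text are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingStartupCheck {
    /** Returns a warning message if the given charset is not UTF-8, or null otherwise. */
    public static String warningFor(Charset defaultCharset) {
        if (StandardCharsets.UTF_8.equals(defaultCharset))
            return null;

        // Hypothetical wording; the real warning text may differ.
        return "Default character encoding is " + defaultCharset.name()
            + ". Consider starting the JVM with -Dfile.encoding=UTF-8.";
    }

    public static void main(String[] args) {
        // Charset.defaultCharset() reflects the default charset of the running JVM.
        String warning = warningFor(Charset.defaultCharset());

        if (warning != null)
            System.err.println(warning);
    }
}
```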

Regards,
-- 
Ilya Kasnacheev


On Wed, 26 Dec 2018 at 19:25, Dmitriy Pavlov <dp...@apache.org> wrote:

> +1 from my side. I think it is reasonable

Re: Let's force -Dfile.encoding=UTF-8

Posted by Dmitriy Pavlov <dp...@apache.org>.
+1 from my side. I think it is reasonable

On Wed, 26 Dec 2018 at 19:20, Ilya Kasnacheev <il...@gmail.com> wrote:

> Hello!
>
> It was recently discovered that there's an issue in H2 which leads to
> strings arriving at the reducer in differing encodings when they come from
> nodes with different system encodings. Even if we are able to fix this, I
> suspect that user code may also be bitten by a mismatch of this setting.
>
> I propose to force UTF-8 as the system encoding whenever we control
> how the JVM is launched.
> This includes ignite.sh, ignite.bat, Apache.Ignite.exe and C++'s ./ignite.
>
> This will mainly affect Windows systems, as I expect that Linux will almost
> always use a UTF-8 locale and Mac OS X should always be UTF-8.
>
> file.encoding is a somewhat misleading name, since it specifies the default
> string encoding, such as the one used for String.getBytes(). It is a common
> convention to set it to UTF-8; for example, IDEA will do that.
>
> WDYT?
>
> There's a pull request: https://github.com/apache/ignite/pull/5725
> If somebody could contribute C++ and .NET tests, I would also be grateful.
>
> Regards,
> --
> Ilya Kasnacheev
>