Posted to commits@cassandra.apache.org by "Paulo Motta (JIRA)" <ji...@apache.org> on 2016/01/19 19:24:39 UTC
[jira] [Commented] (CASSANDRA-11030) non-ascii characters incorrectly displayed/inserted on cqlsh on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107129#comment-15107129 ]
Paulo Motta commented on CASSANDRA-11030:
-----------------------------------------
There are two issues at play here. The first is that the default Windows terminal encoding is not {{utf-8}}, so to display or input {{utf-8}} characters you must set the terminal encoding (the code page, in Windows nomenclature) to {{cp65001}} by running {{chcp 65001}} before starting cqlsh. The second is that Python versions before 3.3 have no codec for {{cp65001}} (this was fixed in issue [13216|https://bugs.python.org/issue13216] for Python [3.3+|https://docs.python.org/dev/whatsnew/3.3.html#codecs]). A known workaround is to register a copy of the {{utf-8}} codec to encode/decode {{cp65001}}.
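For illustration, the workaround can be sketched roughly as below. This is a minimal sketch, not the code in the patch: the helper name {{register_utf8_alias}} is hypothetical, and it only registers the alias when the interpreter does not already know the name.

```python
import codecs


def register_utf8_alias(alias="cp65001"):
    """Make `alias` resolve to the utf-8 codec if the interpreter lacks it.

    Hypothetical helper for illustration only. Returns True if a new
    search function was registered, False if the name already resolved.
    """
    try:
        codecs.lookup(alias)
        return False  # a codec with this name already exists
    except LookupError:
        pass

    utf8 = codecs.lookup("utf-8")

    def search(name):
        # Codec search functions receive the normalized (lowercased) name
        # and must return None for names they do not handle.
        return utf8 if name == alias else None

    codecs.register(search)
    return True


register_utf8_alias()
print(u"n\u00e3o\u00e7".encode("cp65001") == u"n\u00e3o\u00e7".encode("utf-8"))
```

On interpreters that already ship a {{cp65001}} codec the call is a no-op, so the same code can run unchanged on newer Python versions.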
So, if the platform is native Windows (the issue doesn't happen on Cygwin) and the encoding is set to {{utf-8}} but the terminal encoding is not {{cp65001}}, a warning is printed asking the user to change the console code page to {{cp65001}} to support {{utf-8}} encoding. Furthermore, if {{cp65001}} is the default encoding and the Python version is less than 3.3, the {{utf-8}} codec is registered as {{cp65001}}.
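The two conditions can be written down as a small pure function, which makes them easy to check in isolation. This is only a sketch of the decision logic as described in this comment; the function name {{codepage_actions}} and its parameters are hypothetical and do not correspond to identifiers in cqlsh. It assumes native Windows is identified by {{sys.platform == "win32"}} (Cygwin reports {{"cygwin"}}).

```python
import sys


def codepage_actions(platform, encoding, console_encoding,
                     version_info=sys.version_info):
    """Return (warn, register_alias) for the given environment.

    Hypothetical helper: `platform` is a sys.platform-style string,
    `encoding` is the encoding requested for cqlsh, and
    `console_encoding` is the console's current encoding name.
    """
    is_native_windows = platform == "win32"  # Cygwin reports "cygwin"
    wants_utf8 = (encoding or "").lower().replace("_", "-") in ("utf-8", "utf8")
    console_is_65001 = (console_encoding or "").lower() == "cp65001"

    # Warn when utf-8 was requested but the console code page cannot show it.
    warn = is_native_windows and wants_utf8 and not console_is_65001
    # Register utf-8 under the cp65001 name only where the codec is missing.
    register_alias = (is_native_windows and console_is_65001
                      and tuple(version_info[:2]) < (3, 3))
    return warn, register_alias


# Example: Python 2.7 on Windows with a cp850 console and --encoding utf-8.
print(codepage_actions("win32", "utf-8", "cp850", (2, 7)))
```

Keeping the check free of side effects means the warning text and the codec registration can be handled separately by the caller.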
||2.2||3.0||3.3||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:2.2-11030]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-11030]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.3...pauloricardomg:3.3-11030]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-11030]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-11030-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-11030-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.3-11030-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11030-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-2.2-11030-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-11030-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.3-11030-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-11030-dtest/lastCompletedBuild/testReport/]|
Below is a sample execution with different encoding variations (default vs utf-8/cp65001):
{noformat}
C:\Users\Paulo\Repositories\cassandra [cassandra-2.2 +8 ~1 -0 !]> bin\cqlsh.bat
Connected to test at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.4-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select * from bla.test;
bla
--------------
joπo ßlcides
bla
nπoτ
(3 rows)
cqlsh> select * from bla.test where bla = 'nãoç';
bla
-----
(0 rows)
cqlsh> exit;
C:\Users\Paulo\Repositories\cassandra [cassandra-2.2 +8 ~1 -0 !]> bin\cqlsh.bat --encoding utf-8
WARNING: console codepage must be set to cp65001 to support utf-8 encoding on Windows platforms.
If you experience encoding problems, change your console codepage with 'chcp 65001' before starting cqlsh.
Connected to test at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.4-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select * from bla.test;
bla
--------------
joão álcides
bla
nãoç
(3 rows)
cqlsh> select * from bla.test where bla = 'nãoç';
Traceback (most recent call last):
File "C:\Users\Paulo\Repositories\cassandra\bin\\cqlsh.py", line 1044, in get_input_line
self.lastcmd = raw_input(prompt).decode(self.encoding)
File "C:\tools\python2\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x87 in position 39: invalid start byte
WARNING: console codepage must be set to cp65001 to support utf-8 encoding on Windows platforms.
If you experience encoding problems, change your console codepage with 'chcp 65001' before starting cqlsh.
cqlsh> exit;
C:\Users\Paulo\Repositories\cassandra [cassandra-2.2 +8 ~1 -0 !]> chcp 65001
Active code page: 65001
C:\Users\Paulo\Repositories\cassandra [cassandra-2.2 +8 ~1 -0 !]> bin\cqlsh.bat --encoding utf-8
Connected to test at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.4-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> select * from bla.test;
bla
--------------
joão álcides
bla
nãoç
(3 rows)
cqlsh> select * from bla.test where bla = 'nãoç';
bla
------
nãoç
(1 rows)
cqlsh> insert into bla.test (bla ) VALUES ( 'ãnothér' );
cqlsh> select * from bla.test where bla = 'ãnothér';
bla
---------
ãnothér
(1 rows)
cqlsh> exit;
{noformat}
[~Stefania] would you mind reviewing? Would you have a Windows 10 box to test it on? I tested only on Windows 7, where it works correctly.
> non-ascii characters incorrectly displayed/inserted on cqlsh on Windows
> -----------------------------------------------------------------------
>
> Key: CASSANDRA-11030
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11030
> Project: Cassandra
> Issue Type: Bug
> Reporter: Paulo Motta
> Assignee: Paulo Motta
> Priority: Minor
> Labels: cqlsh, windows
>
> {noformat}
> C:\Users\Paulo\Repositories\cassandra [2.2-10948 +6 ~1 -0 !]> .\bin\cqlsh.bat --encoding utf-8
> Connected to test at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 2.2.4-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
> Use HELP for help.
> cqlsh> INSERT INTO bla.test (bla ) VALUES ('não') ;
> cqlsh> select * from bla.test;
> bla
> -----
> n?o
> (1 rows)
> {noformat}