Posted to dev@phoenix.apache.org by "wuyang (JIRA)" <ji...@apache.org> on 2014/10/09 05:38:33 UTC

[jira] [Updated] (PHOENIX-1334) Issue when LIKE expression contains Chinese characters on Key column

     [ https://issues.apache.org/jira/browse/PHOENIX-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

wuyang updated PHOENIX-1334:
----------------------------
    Description: 
When I use a LIKE expression in a SELECT query, it works well if the pattern contains *Chinese* characters and is applied to a non-PRIMARY KEY column. But when the same pattern is applied to the *PRIMARY KEY* column, an exception is thrown:

 select * from "test3" where PK like '中%';

||COLUMN_NAME||DATA_TYPE||TYPE_NAME||
|PK|12|VARCHAR|
|VAL|12|VARCHAR|

{quote}
org.apache.phoenix.schema.IllegalDataException: CHAR types may only contain single byte characters (中)
at org.apache.phoenix.schema.PDataType$2.toBytes(PDataType.java:216)
at org.apache.phoenix.compile.WhereOptimizer$KeyExpressionVisitor.visitLeave(WhereOptimizer.java:829)
at org.apache.phoenix.compile.WhereOptimizer$KeyExpressionVisitor.visitLeave(WhereOptimizer.java:349)
at org.apache.phoenix.expression.LikeExpression.accept(LikeExpression.java:269)
at 
....
{quote}

Both the PRIMARY KEY column and the non-PRIMARY KEY column are of type VARCHAR.

In the relevant source code:

{code}
byte[] b = VARCHAR.toBytes(object);
if (b.length != ((String) object).length()) {
    throw new IllegalDataException("CHAR types may only contain single byte characters (" + object + ")");
}
{code}

Chinese (or any other non-ASCII) characters can never satisfy the condition b.length == ((String) object).length(), because the default encoding is UTF-8 and each such character encodes to more than one byte.
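
To illustrate the length mismatch, here is a minimal standalone Java sketch. It is not Phoenix code: the class and method names (Utf8LengthDemo, utf8Length, isSingleByte) are made up for this example, and it assumes the bytes are produced with UTF-8, as noted above. It only compares the encoded byte count with String.length(), which is the comparison the snippet above performs.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {

    // Number of bytes the string occupies when encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Same comparison as in the snippet above: true only when the encoded
    // byte count equals String.length() (the number of UTF-16 code units).
    public static boolean isSingleByte(String s) {
        return utf8Length(s) == s.length();
    }

    public static void main(String[] args) {
        String ascii = "abc";
        String chinese = "\u4e2d"; // the character 中, written as an escape to avoid source-encoding issues

        System.out.println(utf8Length(ascii) + " bytes, " + ascii.length() + " chars");     // 3 bytes, 3 chars
        System.out.println(utf8Length(chinese) + " bytes, " + chinese.length() + " chars"); // 3 bytes, 1 chars
        System.out.println(isSingleByte(ascii));   // true
        System.out.println(isSingleByte(chinese)); // false
    }
}
```

With a check of this form, any pattern prefix containing 中 fails the comparison, which matches the exception seen in the stack trace.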

Use the following statements to reproduce:

create table "test_c" ( pk varchar primary key , val varchar);
upsert into "test_c" values ('中文','中文');
select * from "test_c" where VAL like '中%';
_// works as expected so far_
select * from "test_c" where PK like '中%';
_// throws the IllegalDataException shown above_


> Issue when LIKE expression contains Chinese characters on Key column
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-1334
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1334
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.1
>         Environment: jdk 1.8 linux
>            Reporter: wuyang


