Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2013/01/30 10:25:12 UTC

[jira] [Commented] (CASSANDRA-5198) token () function automatically coerces types leading to confusing output

    [ https://issues.apache.org/jira/browse/CASSANDRA-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566324#comment-13566324 ] 

Sylvain Lebresne commented on CASSANDRA-5198:
---------------------------------------------

So what happens is that {{token(0)}} is basically interpreted as the equivalent of {{token('0')}}.
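To see why the results in the report look unpredictable, here is a rough Python sketch of what that coercion amounts to. It is only an illustration: {{md5_token}} and {{coerced_token}} are hypothetical names, and the MD5-based hash is an assumption standing in for whatever partitioner the cluster actually uses. The point is just that the literal is hashed as its string form, so the resulting tokens have no relation to numeric order.

```python
import hashlib


def md5_token(key: str) -> int:
    """Illustrative token only: MD5 of the key's UTF-8 bytes, read as a
    signed big-endian integer, then made non-negative. This is an assumed
    stand-in for a hash-based partitioner, not the server's actual code."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def coerced_token(literal) -> int:
    # Mimics the loose validation described above: an unquoted 0 is
    # treated exactly like the string '0' before it is hashed.
    return md5_token(str(literal))


# The integer literals from the report, listed in token order rather
# than numeric order.
literals = [0, 1134, 1134314, 1134434, 113431431]
for lit in sorted(literals, key=coerced_token):
    print(lit, coerced_token(lit))
```

Running it lists the literals in whatever order their hashes happen to fall, which is the order a {{token(username) > token(N)}} comparison effectively sees.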

Now this is not specific to the token method at all. With the table used above, you can do:
{noformat}
INSERT INTO users(username, firstname, lastname) VALUES (12, 42, 0)
{noformat}
and that will have the same effect as
{noformat}
INSERT INTO users(username, firstname, lastname) VALUES ('12', '42', '0')
{noformat}
In the same spirit, you can insert the value {{'12'}} into an int column.

Now, is that a good idea? I'm not sure it is. This is not really intentional and is more of an oversight (more precisely, it's a holdover from CQL2 that has never been fixed).

I'm fine with fixing it (thus fixing the token special case), and in fact in favor of fixing it, but of course that will break anyone who relies on this loose validation (which may be no one). Though I guess "not validating types" is more of a bug than a feature.


bq. I feel like I should be able to say 'token(username) > 0 and token(username) < 10'

You can, with the caveat that currently the token needs to be quoted, so {{token(username) > '0' and token(username) < '10'}}. The vague rationale was that tokens are not always ints (i.e. they are not in the case of OPP), so we only accept a string and pass it to the partitioner's fromString method, oblivious to what the token type actually is. *But* this is definitely neither intuitive nor consistent with the behavior described above. So I suggest that if we make the change suggested above and actually do type validation, we also use the occasion to type tokens properly.

bq. because the token is not returned to the user

On that part I'll note that this is true with thrift too. If you want to page by token on the thrift side, you have to compute the token from the keys returned. That being said, I'm not opposed to allowing the token function in the select clause so you can do
{noformat}
SELECT username, token(username) FROM ...
{noformat}
to save the token computation client side.
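For reference, the client-side paging described above (recomputing the token from the last key returned) looks roughly like the following Python sketch. Everything in it is an assumption for illustration: {{md5_token}} is a stand-in hash rather than the real partitioner, and {{fetch_page}} simulates a {{token(username) > ?}} query with a limit against an in-memory list instead of a real cluster.

```python
import hashlib


def md5_token(key: str) -> int:
    # Assumed stand-in for the partitioner's hash; illustrative only.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def fetch_page(keys, after_token, limit):
    """Hypothetical stand-in for a query of the form
    'WHERE token(username) > <after_token> LIMIT <limit>'."""
    ordered = sorted(keys, key=md5_token)
    return [k for k in ordered if md5_token(k) > after_token][:limit]


def page_all(keys, page_size=2):
    last_token = -1  # below every token this sketch can produce
    seen = []
    while True:
        page = fetch_page(keys, last_token, page_size)
        if not page:
            return seen
        seen.extend(page)
        # The token is not returned with the row, so the client has to
        # recompute it from the last key of the page.
        last_token = md5_token(page[-1])


usernames = ["bsmith", "scapriolo", "ecapriolo"]
print(page_all(usernames))
```

If the token were returned in the result set, the recomputation step in {{page_all}} would simply read it from the last row instead.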

                
> token () function automatically coerces types leading to confusing output
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5198
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5198
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.1
>            Reporter: Edward Capriolo
>            Priority: Minor
>
> This works as it should.
> {noformat}
> cqlsh:movies> select * from users where token (username) > token('') ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>     bsmith |         null |  null |       bob |    smith |     null
>  scapriolo |         null |  null |    stacey | capriolo |     null
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token('bsmith') ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  scapriolo |         null |  null |    stacey | capriolo |     null
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token('scapriolo') ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  ecapriolo |         null |  null |    edward | capriolo |     null
> {noformat}
> But look what happens when you supply numbers into the token function.
> {noformat}
> cqlsh:movies> select * from users where token (username) > token(0) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token(1134314) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>     bsmith |         null |  null |       bob |    smith |     null
>  scapriolo |         null |  null |    stacey | capriolo |     null
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token(113431431) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  scapriolo |         null |  null |    stacey | capriolo |     null
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token(1134) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  ecapriolo |         null |  null |    edward | capriolo |     null
> cqlsh:movies> select * from users where token (username) > token(1134434) ;
>  username  | created_date | email | firstname | lastname | password
> -----------+--------------+-------+-----------+----------+----------
>  scapriolo |         null |  null |    stacey | capriolo |     null
> {noformat}
> This does not make sense to me. The token function is apparently converting integers to strings leading to seemingly unpredictable results. 
> However I find this syntax odd, I feel like I should be able to say 
> 'token(username) > 0 and token(username) < 10' because from a thrift side I can page tokens or I can page keys. In this case, I guess, I am only able to page keys because the token is not returned to the user.
> Is token 0 = ''? How do I arrive at the minimal token for an int column?
> Should the token() function at least be smart enough to reject integers for string columns?
