Posted to commits@cassandra.apache.org by "Todd Nine (JIRA)" <ji...@apache.org> on 2010/06/28 05:02:49 UTC

[jira] Created: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
-----------------------------------------------------------------------------------------

                 Key: CASSANDRA-1235
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.6.2
         Environment: Java 1.6 sun JDK 
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 

Ubuntu 10.04 64 bit
            Reporter: Todd Nine
            Priority: Blocker


When running the two tests, the individual column insert works with the generated values.  However, a batch insert with the same values causes an encoding failure on the key: bytes appear to be dropped from the end of the byte array that represents the key value.  See the attached unit test.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Reopened: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Jeremy Hanna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Hanna reopened CASSANDRA-1235:
-------------------------------------


> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Assignee: Folke Behrens
>            Priority: Critical
>             Fix For: 0.6.5
>
>         Attachments: rowmutation-key-trimming.patch, TestByteKeys.py, TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Folke Behrens (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Folke Behrens updated CASSANDRA-1235:
-------------------------------------

    Attachment: rowmutation-key-trimming.patch

This patch fixes the problem but I don't know if other problems will arise.

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Critical
>             Fix For: 0.6.5
>
>         Attachments: rowmutation-key-trimming.patch, TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1235:
--------------------------------------

        Fix Version/s: 0.6.4
    Affects Version/s: 0.6
                           (was: 0.6.2)
             Priority: Critical  (was: Blocker)

Please don't mess with the issue metadata.

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Critical
>             Fix For: 0.6.4
>
>         Attachments: TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Issue Comment Edited: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Todd Nine (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883261#action_12883261 ] 

Todd Nine edited comment on CASSANDRA-1235 at 6/28/10 3:42 PM:
---------------------------------------------------------------

No worries, and sorry about that; I just realized the affected version was incorrect.  Where can I look to begin fixing this? Unfortunately this issue has brought our development to a halt, since we depend on numeric range queries in Lucene/Lucandra.  Ideally I'd like to create a patch that applies to 0.6.2 so we can roll our own build with the patch and get running again.  I'm assuming it's an issue with the Thrift server, but I don't want to start tweaking things without a good idea of where I should be looking.

Here's an example in hex.  The left column is what I pass as UTF-8 bytes for the key; the right column is what I get back from get_range_slice.

http://pastebin.com/KM8Ze794


> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Critical
>             Fix For: 0.6.4
>
>         Attachments: TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Pierre Matri (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Matri updated CASSANDRA-1235:
------------------------------------

    Attachment: TestByteKeys.py

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Assignee: Folke Behrens
>            Priority: Critical
>             Fix For: 0.6.5
>
>         Attachments: rowmutation-key-trimming.patch, TestByteKeys.py, TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Commented: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898355#action_12898355 ] 

Uwe Schindler commented on CASSANDRA-1235:
------------------------------------------

For Lucandra/Lucene this is fine, too, at the moment, as all terms are strings here. Even binary numbers are correctly UTF-8 encoded terms.

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Assignee: Folke Behrens
>            Priority: Critical
>             Fix For: 0.6.5
>
>         Attachments: rowmutation-key-trimming.patch, TestByteKeys.py, TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1235:
--------------------------------------

    Fix Version/s: 0.6.5
                       (was: 0.6.4)

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Critical
>             Fix For: 0.6.5
>
>         Attachments: TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Commented: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Todd Nine (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886429#action_12886429 ] 

Todd Nine commented on CASSANDRA-1235:
--------------------------------------

While I'm in agreement with Uwe, my bigger concern is that two functionally equivalent tests return different results depending on the mutation operation.  Performing a batch mutate with the same insertion data as a single write should insert the same bytes.  Unfortunately, batch mutate appears to be randomly dropping bytes.  If it were a true UTF-8 issue, wouldn't it drop bytes on the single column writes as well as on batch mutate?

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Critical
>             Fix For: 0.6.4
>
>         Attachments: TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Commented: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Todd Nine (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883261#action_12883261 ] 

Todd Nine commented on CASSANDRA-1235:
--------------------------------------

No worries, and sorry about that; I just realized the affected version was incorrect.  Where can I look to begin fixing this? Unfortunately this issue has brought our development to a halt, since we depend on numeric range queries in Lucene/Lucandra.  Ideally I'd like to create a patch that applies to 0.6.2 so we can roll our own build with the patch and get running again.  I'm assuming it's an issue with the Thrift server, but I don't want to start tweaking things without a good idea of where I should be looking.

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Critical
>             Fix For: 0.6.4
>
>         Attachments: TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Commented: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886449#action_12886449 ] 

Jonathan Ellis commented on CASSANDRA-1235:
-------------------------------------------

That has to be a Thrift bug, then -- the insert and batch_mutate methods both end up calling StorageProxy.mutate or mutateBlocking after converting the Thrift objects into RowMutations.

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Critical
>             Fix For: 0.6.4
>
>         Attachments: TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Todd Nine (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Nine updated CASSANDRA-1235:
---------------------------------

    Attachment: TestEncodedKeys.java

This file demonstrates the broken input.  Notice that the first test passes with clean input.  The second one, which uses a batch write for the same input keys, fails.

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.2
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Blocker
>         Attachments: TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Resolved: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-1235.
---------------------------------------

    Resolution: Fixed

In 0.6, row keys are _strings_, which means they must be UTF-8 encoded, although your version of Thrift for Python doesn't enforce that (see THRIFT-395).
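
To illustrate the constraint above, here is a minimal Java sketch (not taken from the attached tests) that checks whether a byte array survives a decode/encode round trip through a UTF-8 String. The class and method names (Utf8RoundTrip, survivesUtf8) and the sample keys are made up for illustration. It does not reproduce the batch_mutate behaviour reported in this issue; it only shows why arbitrary bytes cannot be carried losslessly in a string key.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class Utf8RoundTrip {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Returns true if raw is unchanged after being decoded to a String
    // and encoded back to bytes, i.e. if raw is well-formed UTF-8.
    public static boolean survivesUtf8(byte[] raw) {
        String asString = new String(raw, UTF8);   // malformed input becomes U+FFFD
        byte[] back = asString.getBytes(UTF8);
        return Arrays.equals(raw, back);
    }

    public static void main(String[] args) {
        // Hypothetical keys, chosen only for this example.
        byte[] textKey = "key-1".getBytes(UTF8);
        byte[] binaryKey = { (byte) 0x6B, (byte) 0xFF, (byte) 0xFE }; // 0xFF never occurs in UTF-8

        System.out.println(survivesUtf8(textKey));   // prints true
        System.out.println(survivesUtf8(binaryKey)); // prints false
    }
}
```

A client could run a check like this on a key before sending it, assuming the key is meant to travel as a Thrift string.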

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Assignee: Folke Behrens
>            Priority: Critical
>             Fix For: 0.6.5
>
>         Attachments: rowmutation-key-trimming.patch, TestByteKeys.py, TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Commented: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Todd Nine (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892487#action_12892487 ] 

Todd Nine commented on CASSANDRA-1235:
--------------------------------------

I'm currently out of the office and will return on 2010-07-27.  If
this is an urgent request, please mail support@spidertracks.com.


> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Critical
>             Fix For: 0.6.5
>
>         Attachments: TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Todd Nine (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Nine updated CASSANDRA-1235:
---------------------------------

    Fix Version/s:     (was: 0.6.4)

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.2
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Critical
>         Attachments: TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Commented: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886357#action_12886357 ] 

Uwe Schindler commented on CASSANDRA-1235:
------------------------------------------

buffer is a char[], so there is no conversion at all; new String(char[]) only copies the char[] into the String's internal char[]. longToPrefixCoded is definitely correct; large parts of Lucene Java are based on it :-)

(from the Lucene Generics and Unicode Policeman)
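
A small sketch of the point about char[] buffers, under stated assumptions: the class name (CharBufferToString), the helper fromChars and the buffer contents are hypothetical and are not the output of longToPrefixCoded. It shows that building a String from a char[] copies the chars unchanged, and that a string whose chars are all below 0x80 encodes to one UTF-8 byte per char.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class CharBufferToString {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // new String(char[], int, int) copies UTF-16 code units; no charset is involved.
    public static String fromChars(char[] buffer, int len) {
        return new String(buffer, 0, len);
    }

    public static void main(String[] args) {
        // Hypothetical buffer: every value fits in 7 bits.
        char[] buffer = { 0x20, 0x01, 0x7F, 0x00, 0x41 };
        String term = fromChars(buffer, buffer.length);

        // The chars come back exactly as they went in.
        System.out.println(Arrays.equals(term.toCharArray(), buffer)); // prints true

        // Each char below 0x80 becomes exactly one UTF-8 byte.
        byte[] utf8 = term.getBytes(UTF8);
        System.out.println(utf8.length);                         // prints 5
        System.out.println(new String(utf8, UTF8).equals(term)); // prints true
    }
}
```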

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Critical
>             Fix For: 0.6.4
>
>         Attachments: TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Commented: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886346#action_12886346 ] 

Jonathan Ellis commented on CASSANDRA-1235:
-------------------------------------------

I believe that

                return new String(buffer, 0, len);

will treat buffer as UTF-16, not UTF-8.  You want

                return new String(buffer, 0, len, "UTF8");

I'm not at all sure that longToPrefixCoded is going to generate valid UTF-8, either.
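
For reference, the four-argument form only exists for byte[] input. The sketch below assumes a byte[] buffer, which may not match the attached test, and uses a hypothetical class name (ExplicitCharsetDecode) and helper (decode). It shows how the named charset changes the result when the same bytes are decoded.

```java
import java.io.UnsupportedEncodingException;

public class ExplicitCharsetDecode {
    // Decodes the first len bytes of buffer using the named charset.
    public static String decode(byte[] buffer, int len, String charsetName) {
        try {
            return new String(buffer, 0, len, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is the UTF-8 encoding of U+00E9.
        byte[] buffer = { (byte) 0xC3, (byte) 0xA9 };

        System.out.println(decode(buffer, buffer.length, "UTF-8").length());      // prints 1
        System.out.println(decode(buffer, buffer.length, "ISO-8859-1").length()); // prints 2
    }
}
```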

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Critical
>             Fix For: 0.6.4
>
>         Attachments: TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-1235:
--------------------------------------

        Fix Version/s: 0.6.4
    Affects Version/s: 0.6
                           (was: 0.6.2)
             Priority: Critical  (was: Blocker)

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Critical
>             Fix For: 0.6.4
>
>         Attachments: TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Pierre Matri (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897928#action_12897928 ] 

Pierre Matri commented on CASSANDRA-1235:
-----------------------------------------

Still experiencing some problems with byte keys. The file "TestByteKeys.py" demonstrates the problem.
Tested with revision 984926.

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Assignee: Folke Behrens
>            Priority: Critical
>             Fix For: 0.6.5
>
>         Attachments: rowmutation-key-trimming.patch, TestByteKeys.py, TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Todd Nine (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Nine updated CASSANDRA-1235:
---------------------------------

    Affects Version/s: 0.6.2
                           (was: 0.6)

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.2
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Blocker
>         Attachments: TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Updated: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Todd Nine (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Nine updated CASSANDRA-1235:
---------------------------------

    Priority: Blocker  (was: Critical)

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.2
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Priority: Blocker
>         Attachments: TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test



[jira] Resolved: (CASSANDRA-1235) BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-1235.
---------------------------------------

      Assignee: Folke Behrens
    Resolution: Fixed

Nice fix.  (It's possible that this would break someone relying on the old key-trimming behavior, but it's clearly broken the way it is.)  Committed.
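
The attachment name rowmutation-key-trimming.patch suggests that the batch path trimmed the row key. The sketch below only illustrates that assumption and is not the committed patch. It shows how Java's String.trim(), which strips leading and trailing characters with code points at or below U+0020, would drop non-printable bytes from the end of a binary key carried as a String. The class KeyTrimDemo, the helper roundTrip, and the choice of ISO-8859-1 as the byte-to-char mapping are all hypothetical and chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyTrimDemo {
    // Hypothetical helper: carries a binary key as a String, the way a
    // string-keyed API would, and optionally trims it on the way through.
    public static byte[] roundTrip(byte[] key, boolean trim) {
        // Assumption: ISO-8859-1 maps each byte to exactly one char,
        // so the encoding step by itself loses nothing.
        String asString = new String(key, StandardCharsets.ISO_8859_1);
        if (trim) {
            // String.trim() removes leading and trailing chars <= U+0020,
            // which includes control characters such as 0x00 and 0x0A.
            asString = asString.trim();
        }
        return asString.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // A key that ends in two non-printable bytes.
        byte[] key = {0x41, 0x42, 0x0A, 0x00};
        System.out.println(Arrays.toString(roundTrip(key, false))); // [65, 66, 10, 0]
        System.out.println(Arrays.toString(roundTrip(key, true)));  // [65, 66]
    }
}
```

Under these assumptions, the untrimmed path returns all four bytes while the trimmed path returns only the first two, which matches the reported symptom of bytes being dropped from the end of the key.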

> BytesType and batch mutate causes encoded bytes of non-printable characters to be dropped
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-1235
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1235
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6
>         Environment: Java 1.6 sun JDK 
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, 
> Ubuntu 10.04 64 bit
>            Reporter: Todd Nine
>            Assignee: Folke Behrens
>            Priority: Critical
>             Fix For: 0.6.5
>
>         Attachments: rowmutation-key-trimming.patch, TestEncodedKeys.java
>
>
> When running the two tests, individual column insert works with the values generated.  However, batch insert with the same values causes an encoding failure on the key.  It appears bytes are dropped from the end of the byte array that represents the key value.  See the attached unit test
