Posted to user@cassandra.apache.org by java8964 <ja...@hotmail.com> on 2015/09/24 17:33:39 UTC

RE: Deserialize the collection type data from the SSTable file

Hi, Daniel:
I couldn't find a branch for C* 2.1 in https://github.com/coursera/aegisthus. Is there one?
It looks like the collections API changed a lot in C* 2.1, so I'd like to know whether there is a CQLMapper for C* 2.1. In the meantime, I will try to understand the C* 2.1 changes better.
Thanks
Yong

From: danchia@coursera.org
Date: Wed, 10 Jun 2015 09:11:02 -0700
Subject: Re: Deserialize the collection type data from the SSTable file
To: java8964@hotmail.com
CC: user@cassandra.apache.org

Hi Yong,
Glad the code was helpful. I believe it serializes using List<Pair<ByteBuffer, Column>> for maps so that it can store the Key of the map as well.
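To picture this: each map entry sits in its own cell, with the map key in the last component of the cell name and the map value in the cell value, which matches the sstable2json output quoted further down. Below is a minimal sketch of rebuilding a map<text, text> from such pairs. It uses plain JDK types as stand-ins for Cassandra's Pair and Column, and MapCellSketch and rebuildMap are hypothetical names, not part of aegisthus or Cassandra.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: each entry is (map key bytes, cell value bytes),
// mirroring the List<Pair<ByteBuffer, Column>> shape described above.
class MapCellSketch {
    // Decode a buffer as UTF-8 without disturbing the caller's position.
    static String utf8(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    // Rebuild a text-to-text map from one (key, value) pair per cell.
    static Map<String, String> rebuildMap(List<Map.Entry<ByteBuffer, ByteBuffer>> cells) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<ByteBuffer, ByteBuffer> cell : cells) {
            out.put(utf8(cell.getKey()), utf8(cell.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<ByteBuffer, ByteBuffer>> cells = new ArrayList<>();
        cells.add(new SimpleEntry<>(
                ByteBuffer.wrap("test_key".getBytes(StandardCharsets.UTF_8)),
                ByteBuffer.wrap("test_value".getBytes(StandardCharsets.UTF_8))));
        System.out.println(rebuildMap(cells)); // prints {test_key=test_value}
    }
}
```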
Thanks for pointing out the edge case!
Thanks,
Daniel

On Wed, Jun 10, 2015 at 6:39 AM, java8964 <ja...@hotmail.com> wrote:



Thanks, Daniel.
I didn't realize that Cassandra serializes collection types another way, using List<Pair<ByteBuffer, Column>>. After reading your example code, I got it working.
Using the code from the link you gave me with my test data, I found one issue though. If the collection column is NULL in any row, I think the code will throw a NullPointerException on line 148:
Line 145    private void addValue(GenericRecord record, CFDefinition.Name name, ColumnGroupMap group){
Line 146        if (name.type.isCollection()) {
Line 147            List<Pair<ByteBuffer, Column>> collection = group.getCollection(name.name.key);
Line 148            ByteBuffer buffer = ((CollectionType)name.type).serialize(collection);
            addCqlCollectionToRecord(record, name, buffer);
If the collection column in that row is NULL, then line 147 returns null, which causes the following exception on line 148:
Exception in thread "main" java.lang.NullPointerException
	at org.apache.cassandra.db.marshal.CollectionType.enforceLimit(CollectionType.java:113)
	at org.apache.cassandra.db.marshal.ListType.serialize(ListType.java:120)
So I need to add a null check there, since any regular column in Cassandra can hold a NULL value.
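A minimal sketch of such a guard, assuming that skipping the serializer is the desired behavior for a NULL collection. It is written against plain JDK types so that it runs on its own. NullSafeCollection and serializeOrNull are hypothetical names, and the Function argument stands in for the ((CollectionType) name.type).serialize(collection) call.

```java
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of aegisthus or Cassandra.
class NullSafeCollection {
    // Returns null when the row has no cells for the collection column;
    // otherwise delegates to the supplied serializer.
    static <T> ByteBuffer serializeOrNull(List<T> cells, Function<List<T>, ByteBuffer> serializer) {
        if (cells == null) {
            return null; // NULL collection column: nothing to serialize
        }
        return serializer.apply(cells);
    }

    public static void main(String[] args) {
        // Fake serializer used only for this demo: one byte per cell.
        Function<List<String>, ByteBuffer> fake = cells -> ByteBuffer.allocate(cells.size());
        System.out.println(serializeOrNull(null, fake) == null);                             // prints true
        System.out.println(serializeOrNull(Collections.singletonList("x"), fake).capacity()); // prints 1
    }
}
```

In addValue, the caller would then skip addCqlCollectionToRecord, or write a null field, whenever the returned buffer is null.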
Thanks
Yong
From: danchia@coursera.org
Date: Mon, 8 Jun 2015 15:13:02 -0700
Subject: Re: Deserialize the collection type data from the SSTable file
To: user@cassandra.apache.org

I'm not sure why sstable2json doesn't work for collections, but if you want to read raw SSTables, we use the following code with good success:
https://github.com/coursera/aegisthus/blob/77c73f6259f2a30d3d8ca64578be5c13ecc4e6f4/aegisthus-hadoop/src/main/java/org/coursera/mapreducer/CQLMapper.java#L85
Thanks,
Daniel

On Mon, Jun 8, 2015 at 1:22 PM, java8964 <ja...@hotmail.com> wrote:



Hi, Cassandra users:
I have a question about how to deserialize the new collection type data in Cassandra 2.x. (The exact version is C* 2.0.10.)
I created the following example table in cqlsh:
CREATE TABLE coupon (
  account_id bigint,
  campaign_id uuid,
  ........................,
  discount_info map<text, text>,
  ........................,
  PRIMARY KEY (account_id, campaign_id))
The other columns can be ignored in this case. Then I inserted one test row like this:
insert into coupon (account_id, campaign_id, discount_info) values (111,uuid(), {'test_key':'test_value'});
After this, I got the SSTable files. I used the sstable2json tool to check the output:
$ ./resources/cassandra/bin/sstable2json /xxx/test-coupon-jb-1-Data.db
[{"key": "000000000000006f","columns": [
  ["0336e50d-21aa-4b3a-9f01-989a8c540e54:","",1433792922055000],
  ["0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info","0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:!",1433792922054999,"t",1433792922],
  ["0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:746573745f6b6579","746573745f76616c7565",1433792922055000]]}]
What I want is to get {"test_key" : "test_value"} back as the key/value pair that I put into the "discount_info" column. I followed the sstable2json code and tried to deserialize the data myself, but to my surprise I could not make it work. I tried several ways and kept getting exceptions.
From what I researched, I know that Cassandra puts "campaign_id" + "discount_info" + "another ByteBuffer" together as the composite column name in this case. When I deserialize this column name and dump it out as a String, I get the following:
"0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:746573745f6b6579".
It has 3 parts: the first part is the uuid for campaign_id. The second part is "discount_info", which is the static name I defined in the table. The third part is a byte array of length 46, and I am not sure what it is.
The corresponding value of this composite column is another byte array of length 10, which is "746573745f76616c7565" in hex if I dump it out.
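For illustration, both hex strings in the sstable2json output above can be decoded as UTF-8 text with nothing but the JDK. This is only a sketch for inspecting the dump. HexCellDecoder and decodeUtf8Hex are hypothetical names, and no Cassandra classes are involved.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting hex strings printed by sstable2json.
class HexCellDecoder {
    // Turn a hex string (two characters per byte) into UTF-8 text.
    static String decodeUtf8Hex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Third component of the composite column name
        System.out.println(decodeUtf8Hex("746573745f6b6579"));     // prints test_key
        // Value of the same cell
        System.out.println(decodeUtf8Hex("746573745f76616c7565")); // prints test_value
    }
}
```

So in this dump the last name component carries the map key, and the cell value carries the map value.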
Now, here is what I did, and I am not sure why it doesn't work. First, I assumed the value part stores the real value I put in the Map, so I did the following:
ByteBuffer value = ByteBufferUtil.clone(column.value());
MapType<String, String> result = MapType.getInstance(UTF8Type.instance, UTF8Type.instance);
Map<String, String> output = result.compose(value);
// it gave me the following exception: org.apache.cassandra.serializers.MarshalException: Not enough bytes to read a map
Then I thought that the real value must be stored as part of the column name (the 3rd part of 46 bytes), so I did this:
MapType<String, String> result = MapType.getInstance(UTF8Type.instance, UTF8Type.instance);
Map<String, String> output = result.compose(third_part.value);
// I got the following exception:
java.lang.IllegalArgumentException
	at java.nio.Buffer.limit(Buffer.java:267)
	at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:587)
	at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:596)
	at org.apache.cassandra.serializers.MapSerializer.deserialize(MapSerializer.java:63)
	at org.apache.cassandra.serializers.MapSerializer.deserialize(MapSerializer.java:28)
	at org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:142)
I can get the data for all other non-collection types, but I cannot get the data from the Map. My questions are:
1) How does Cassandra store collection data in the SSTable files? From the byte lengths, it is most likely part of the composite column. If so, why did I get the exceptions above?
2) sstable2json doesn't deserialize the real data out of the collection type, so I don't have an example to follow. Am I composing the Map type data the wrong way?
Thanks
Yong