Posted to user@cassandra.apache.org by Kevin Wiggen <kw...@xythos.com> on 2010/04/12 02:12:29 UTC

Newbie ? with get_range_slices

I have spent the last few days playing with Cassandra and I have attempted to create a simple "Java->Thrift->Cassandra" Discussion Group Server (because the world needs another one) to teach myself the data model and try everything out.

With all the great blog posts on Cassandra out there, I am now able to read/write/delete/modify a nested discussion server.  YEA!!!

I decided to have two simple ColumnFamilies.

One called Posts

Posts = {
    '7561a442-24e2-11df-8924-001ff3591711': {                    //UUID (row key)
        'id': '7561a442-24e2-11df-8924-001ff3591711',            //ID == UUID
        'parent_id': '89da3178-24e2-11df-8924-001ff3591711',     //Parent Post UUID
        'author': 'a4a70900-24e1-11df-8924-001ff3591711',        //Users UUID
        'subject': 'This is a forum post',                       //Subject
        'body': 'Forum post body. This is awesome!',             //Body
        '_ts': '89da3178-24e2-11df-8924-001ff3596713'            //TimeUUID
    }
}

The row key is a simple UUID and the columns hold the fields of a forum, post, or reply.  A forum has a hardcoded parent UUID which I store in Java, while posts and replies are tied to their parent post/forum by parent_id.  The columns are sorted by UTF8Type, but that really doesn't matter in this case, as I always look a row up by its key and always fetch all of its columns (6 of them).

All queries drive into the second ColumnFamily called Threads

Threads = {
    '7561a442-24e2-11df-8924-001ff3591711': {                    //Parent thread UUID (row key)
        //TimeUUID of post (column name) -> post UUID (value)
        '89da3178-24e2-11df-8924-001ff3596713': '7561a442-24e2-11df-8924-001ff3591711'
    }
}

With a Parent UUID I can drive into Threads which will give me the list of Posts/Replies at that level sorted by TimeUUID.  Column name is the post TimeUUID and the value is the Post UUID.  This ColumnFamily is sorted by TimeUUID.

Thus I can walk the tree (of any depth) of Forum/Post/Replies with the Thread table.
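
To make the walk concrete, here is a minimal in-memory sketch of the idea, using plain Java collections rather than the Thrift client. The class and method names are hypothetical, and a long timestamp stands in for the TimeUUID column name, so two posts with the same timestamp under one parent would collide in this simplified version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the Threads ColumnFamily:
// row key = parent UUID, column name = timestamp (in place of a TimeUUID),
// column value = child post UUID.
class ThreadWalk {
    final Map<String, TreeMap<Long, String>> threads = new HashMap<>();

    void addPost(String parentId, long timestamp, String postId) {
        threads.computeIfAbsent(parentId, k -> new TreeMap<>()).put(timestamp, postId);
    }

    // Depth-first walk: each child is visited in time order, followed by its replies.
    // Every entry is formatted as "<depth>:<postId>".
    List<String> walk(String parentId) {
        List<String> out = new ArrayList<>();
        collect(parentId, 0, out);
        return out;
    }

    private void collect(String parentId, int depth, List<String> out) {
        TreeMap<Long, String> children = threads.get(parentId);
        if (children == null) {
            return; // no row for this parent, so it has no replies
        }
        for (String postId : children.values()) {
            out.add(depth + ":" + postId);
            collect(postId, depth + 1, out);
        }
    }
}
```

Against a real cluster, each call to collect would correspond to one get_slice on the Threads row for that parent.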

I have this all working on a single Cassandra node and it works great.  Inserts go to both ColumnFamilies, while a delete uses the Threads ColumnFamily to recursively delete all child posts, then removes the post's Column from its parent's row in Threads and all associated data in Posts.
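
The insert and delete paths can be sketched the same way. This is an in-memory illustration only, under the assumption that both ColumnFamilies behave like nested maps; DiscussionStore and its methods are hypothetical names, not part of the Thrift API, and a long timestamp again stands in for the TimeUUID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class DiscussionStore {
    // Posts: post UUID -> (column name -> value)
    final Map<String, Map<String, String>> posts = new HashMap<>();
    // Threads: parent UUID -> (timestamp -> child post UUID), sorted by timestamp
    final Map<String, TreeMap<Long, String>> threads = new HashMap<>();

    // An insert writes to both ColumnFamilies.
    void insert(String postId, String parentId, long timestamp, String subject) {
        Map<String, String> columns = new HashMap<>();
        columns.put("id", postId);
        columns.put("parent_id", parentId);
        columns.put("subject", subject);
        posts.put(postId, columns);
        threads.computeIfAbsent(parentId, k -> new TreeMap<>()).put(timestamp, postId);
    }

    // A delete unlinks the post from its parent's Threads row,
    // then removes the post and everything beneath it.
    void delete(String postId) {
        Map<String, String> columns = posts.get(postId);
        if (columns == null) {
            return; // unknown post, nothing to do
        }
        TreeMap<Long, String> siblings = threads.get(columns.get("parent_id"));
        if (siblings != null) {
            siblings.values().remove(postId);
        }
        deleteSubtree(postId);
    }

    private void deleteSubtree(String postId) {
        TreeMap<Long, String> children = threads.remove(postId);
        if (children != null) {
            for (String childId : children.values()) {
                deleteSubtree(childId);
            }
        }
        posts.remove(postId);
    }
}
```

Note that in Cassandra the two writes are not atomic, so a failure between them can leave a Threads column pointing at a post that does not exist (or the reverse).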

Any comments on whether this is a good/terrible data model, etc so far are welcome.  :)

My question comes from the fact that during this process I have written/read/deleted many "key->Columns" to these ColumnFamilies (many of which failed half-way through) so I decided to write a "clean" script to remove all data from these ColumnFamilies (much like a truncate table command in SQL).

Using the following Java code

      // Fetch only the ID column for each key we find
      List<byte[]> l_columns = new ArrayList<byte[]>();
      l_columns.add(Transcoder.encode(ID));
      SlicePredicate l_slicePredicate = new SlicePredicate();
      l_slicePredicate.setColumn_names(l_columns);

      // Get 100 keys at a time; empty start and end keys ask for the whole range
      KeyRange keyRange = new KeyRange(100);
      keyRange.setStart_key("");
      keyRange.setEnd_key("");

      List<KeySlice> l_keySlices = p_context.getClient().get_range_slices(
          "Discussions", new ColumnParent("Posts"),
          l_slicePredicate, keyRange, ConsistencyLevel.ONE);

I get ALL of the KEYS I ever wrote to the server.  Most of them have no Columns associated with them.  In fact if I query the same key with

      SlicePredicate l_slicePredicate = new SlicePredicate();
      SliceRange l_sliceRange = new SliceRange();
      l_sliceRange.setStart(new byte[] {});
      l_sliceRange.setFinish(new byte[] {});
      l_slicePredicate.setSlice_range(l_sliceRange);
      List<ColumnOrSuperColumn> l_result = p_context.getClient().get_slice(
          "Discussions", <KEY FROM GET_RANGE_SLICES>, new ColumnParent("Posts"),
          l_slicePredicate, ConsistencyLevel.ONE);

it returns an empty array list (the same as if I give it a KEY it has never seen).

It is OK with me if get_range_slices returns keys with no columns, although it makes things a little harder to explain to others.  (Is there garbage collection that will clean these out in the future?)  However, I am stuck on how to simply truncate the table without looping through all the keys, looking for ones that still have a Column associated with them, and then deleting each such key->value.
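
One likely explanation, which the Cassandra FAQ discusses under "range ghosts", is that deleted rows keep showing up in range scans with an empty column list until their tombstones are eventually compacted away. If that is what is happening here, a clean script can simply skip any key that comes back without columns. The sketch below shows that filter in isolation; Row is a hypothetical stand-in for the Thrift KeySlice so the example does not depend on the Cassandra libraries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a KeySlice: a row key plus the columns returned for it.
class Row {
    final String key;
    final List<String> columns;

    Row(String key, List<String> columns) {
        this.key = key;
        this.columns = columns;
    }
}

class GhostFilter {
    // Returns the keys of rows that still have at least one column,
    // preserving the order in which the rows were returned.
    static List<String> liveKeys(List<Row> slices) {
        List<String> live = new ArrayList<>();
        for (Row row : slices) {
            if (row.columns != null && !row.columns.isEmpty()) {
                live.add(row.key);
            }
        }
        return live;
    }
}
```

Only the keys returned by liveKeys would then need a remove call; the empty ones can be ignored.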

It is possible I am not deleting correctly as well.  For that I simply do:

p_context.getClient().remove("Discussions", p_postUUID.toString(),
                             new ColumnPath("Posts"), l_rightNow,
                             ConsistencyLevel.ALL);

Just trying to understand what I am getting and compare it against what I expected.  I am also still trying to write a simple "clean" command.

If you read this far, thanks....  If you can add some clarity it would help me.  I have tried to find it in archives and blog posts, but I didn't see anything.

Thanks,
Kevin





Re: Newbie ? with get_range_slices

Posted by Benjamin Black <b...@b3k.us>.
http://wiki.apache.org/cassandra/FAQ#range_ghosts
