Posted to user@cassandra.apache.org by Eric Lee <er...@c11software.com> on 2010/10/16 05:03:32 UTC

Cassandra and Pig - how to get column values?

Hey guys,

I'm having a problem with Pig and Cassandra and was hoping someone could
point me in the right direction. I've set up Pig and Cassandra and I'm able
to run through the example shown in the README.txt - I can view a list of
top column names. That's all good stuff.

What I would like to do next is just dump out the column values. Suppose I
have a very simple Column Family called User. To that column family, I've
added 2 rows of data, each row just has 1 column 'userName'. I'm using a
GUID as my key.

When I load and dump my rows, I get some data like:

(6c7fef29-16dd-44ca-bde1-f53995b2e818,{(userName,someUserName1)})
(8be0b934-45aa-444f-90e2-ce7137a73b68,{(userName,someUserName2)})
(c51fc8ce-2a53-46bb-b872-0f644b972f62,{(userName,someUserName3)})

As I understand it, at this point, the GUID is $0 and $1 is the bag that
contains my columns.

So, like in the README, I run:

cols = FOREACH rows GENERATE flatten($1);

As I understand it, when I flatten a bag, I get a set of tuples. When I dump
cols, I get the following:

(userName,someUserName1)
(userName,someUserName2)
(userName,someUserName3)
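
To make my mental model concrete, here is a rough sketch in plain Python
(not Pig) of what I expect flatten($1) to do with the rows above. The
function name flatten_bag and the list-of-tuples representation are my own
stand-ins for illustration; they are not part of Pig or Cassandra, and this
says nothing about how the Cassandra loader actually types the data.

```python
# Plain-Python model of flattening a bag (illustration only, not Pig code).
# Each row is (key, bag), where the bag is a list of (name, value) tuples.
rows = [
    ("6c7fef29-16dd-44ca-bde1-f53995b2e818", [("userName", "someUserName1")]),
    ("8be0b934-45aa-444f-90e2-ce7137a73b68", [("userName", "someUserName2")]),
    ("c51fc8ce-2a53-46bb-b872-0f644b972f62", [("userName", "someUserName3")]),
]


def flatten_bag(rows):
    """Emit one output record per tuple found in each row's bag (field $1)."""
    out = []
    for _key, bag in rows:
        for column in bag:
            # The fields of the inner tuple become the fields of the record.
            out.append(tuple(column))
    return out


cols = flatten_bag(rows)
names = [record[0] for record in cols]   # what I'd expect $0 to be
values = [record[1] for record in cols]  # what I'd expect $1 to be

print(cols)
print(values)
```

If this model is right, every flattened record has two fields, and the
value I'm after is the second one.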

If I continue with the README, I would run colnames = FOREACH cols GENERATE
$0 to give me the column names.

I'm a little confused why I only get column names - when I do a describe on
cols, I get the following:

cols: {bytearray}

It seems like $0 should be the entire line (userName,someUserName1), not
just the column name.

Anyway, what I really want is the column value, not the name. Is there a
way to do that? I've listed my failed attempts below.

   - colnames = FOREACH cols GENERATE $1; but I was told $1 was out of
   bounds
   - casted = FOREACH cols GENERATE (tuple(chararray, chararray))$0; but all
   I got back were empty tuples
   - values = FOREACH cols GENERATE $0.$1; but I got an error telling me
   a data byte array can't be cast to a tuple

So I'm stuck - any help would be greatly appreciated.

Thanks!

Eric.