Posted to user@pig.apache.org by Fabio Souto <fs...@gmail.com> on 2011/08/17 17:09:58 UTC

How to access subcolumns in cassandra

Hi, 

I have some metrics stored in a Cassandra supercolumn, and the subcolumns are the timestamps of each metric. I'm loading the metrics into Pig with this line:

all_metrics = LOAD 'cassandra://keyspace/metrics' USING CassandraStorage() AS (metric_key, metrics_bag: bag {metric: tuple(timestamp, columns: bag {record: tuple(name:chararray, value:chararray)})});

I just want to access the timestamp subcolumn to get the most recent value (using MAX). I tried:

metric_status = FOREACH all_metrics GENERATE metric_key, metrics_bag.timestamp;
dump metric_status;

but I'm getting empty values, like this:

(key1,{(),(),()})
(key2,{(),()})
...
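For reference, the result this query is after can be modeled in plain Python. This is not Pig code and does not work around the CassandraStorage or Pig issue discussed in this thread. It only shows what a working `metrics_bag.timestamp` projection should return (a bag of one-field tuples rather than empty ones) and which metric a MAX-based "most recent" lookup should pick. The function names, the integer timestamps, and the sample rows are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-ins for Pig's data model: a tuple is a Python tuple,
# a bag is a list of tuples. Timestamps are assumed to be integers here.
Record = Tuple[str, str]            # record: tuple(name, value)
Metric = Tuple[int, List[Record]]   # metric: tuple(timestamp, columns bag)
Row = Tuple[str, List[Metric]]      # (metric_key, metrics_bag)


def project_timestamp(metrics_bag: List[Metric]) -> List[Tuple[int]]:
    """Model of `metrics_bag.timestamp`: one single-field tuple per metric."""
    return [(timestamp,) for timestamp, _columns in metrics_bag]


def latest_metric(metrics_bag: List[Metric]) -> Optional[Metric]:
    """Metric tuple with the greatest timestamp, or None for an empty bag."""
    if not metrics_bag:
        return None
    return max(metrics_bag, key=lambda metric: metric[0])


def metric_status(rows: List[Row]) -> Dict[str, Optional[Metric]]:
    """Most recent metric per key, i.e. what the MAX-based query is after."""
    return {key: latest_metric(bag) for key, bag in rows}


if __name__ == "__main__":
    # Hypothetical sample rows shaped like the LOAD schema in the question.
    rows: List[Row] = [
        ("key1", [(100, [("value", "3")]), (160, [("value", "5")]), (130, [("value", "4")])]),
        ("key2", [(100, [("value", "1")]), (120, [("value", "2")])]),
    ]
    print(project_timestamp(rows[0][1]))  # [(100,), (160,), (130,)]
    print(metric_status(rows))  # {'key1': (160, [('value', '5')]), 'key2': (120, [('value', '2')])}
```

Comparing a real DUMP against this model makes it easy to tell whether the problem is in the projection itself (empty tuples) or later in the aggregation.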




Re: How to access subcolumns in cassandra

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
DataStax ships a Pig 0.8.3? What are they going to do if there is an Apache Pig
0.8.3?

D

On Wed, Aug 17, 2011 at 1:44 PM, Jeremy Hanna <je...@gmail.com>wrote:

> If you're already using Pig 0.9, I would go on IRC (irc.freenode.net),
> join #datastax-brisk, and ask driftx (Brandon) whether he has any
> suggestions. Pig 0.9 should be the version that has the nested subcolumns
> fix on the Pig side of things.

Re: How to access subcolumns in cassandra

Posted by Jeremy Hanna <je...@gmail.com>.
If you're already using Pig 0.9, I would go on IRC (irc.freenode.net), join #datastax-brisk, and ask driftx (Brandon) whether he has any suggestions. Pig 0.9 should be the version that has the nested subcolumns fix on the Pig side of things.

On Aug 17, 2011, at 3:25 PM, Fabio Souto wrote:

> Hi Jeremy,
> 
> Well, I don't think the version is the problem: I'm using Cassandra 0.8.3 and Pig 0.9, and I also tried Brisk beta 2 (Pig 0.8.3). I commented on the ticket because it wasn't clear whether it had been solved, so maybe we can close it.
> 
> Thanks for the response
> 
> Fabio


Re: How to access subcolumns in cassandra

Posted by Fabio Souto <fs...@gmail.com>.
Hi Jeremy,

Well, I don't think the version is the problem: I'm using Cassandra 0.8.3 and Pig 0.9, and I also tried Brisk beta 2 (Pig 0.8.3). I commented on the ticket because it wasn't clear whether it had been solved, so maybe we can close it.

Thanks for the response

Fabio

On 17/08/2011, at 22:14, Jeremy Hanna wrote:

> Hi Fabio,
> 
> I'm not sure whether super columns are fully supported in CassandraStorage right now. Brandon (who I CCed) would know for sure. Also, I thought the Pig bug that made it impossible to get at nested data structures had been resolved; I think the ticket you commented on today was a duplicate of another bug that has been resolved.
> 
> What versions of Pig and Cassandra are you using?
> 
> Jeremy


Re: How to access subcolumns in cassandra

Posted by Jeremy Hanna <je...@gmail.com>.
Hi Fabio,

I'm not sure whether super columns are fully supported in CassandraStorage right now. Brandon (who I CCed) would know for sure. Also, I thought the Pig bug that made it impossible to get at nested data structures had been resolved; I think the ticket you commented on today was a duplicate of another bug that has been resolved.

What versions of Pig and Cassandra are you using?

Jeremy

On Aug 17, 2011, at 10:09 AM, Fabio Souto wrote:

> Hi, 
> 
> I have some metrics stored in a Cassandra supercolumn, and the subcolumns are the timestamps of each metric. I'm loading the metrics into Pig with this line:
> 
> all_metrics = LOAD 'cassandra://keyspace/metrics' USING CassandraStorage() AS (metric_key, metrics_bag: bag {metric: tuple(timestamp, columns: bag {record: tuple(name:chararray, value:chararray)})});
> 
> I just want to access the timestamp subcolumn to get the most recent value (using MAX). I tried:
> 
> metric_status = FOREACH all_metrics GENERATE metric_key, metrics_bag.timestamp;
> dump metric_status;
> 
> but I'm getting empty values, like this:
> 
> (key1,{(),(),()})
> (key2,{(),()})
> ...