Posted to user@pig.apache.org by Fabio Souto <fs...@gmail.com> on 2011/08/17 17:09:58 UTC
How to access subcolumns in cassandra
Hi,
I have some metrics stored in a Cassandra super column, where the subcolumns are the timestamps of each metric. I'm loading the metrics into Pig with this line:
all_metrics = LOAD 'cassandra://keyspace/metrics' USING CassandraStorage() AS (metric_key, metrics_bag: bag {metric: tuple(timestamp, columns: bag {record: tuple(name:chararray, value:chararray)})});
I just want to access the timestamp subcolumn to get the most recent value (using MAX), so I tried:
metric_status = FOREACH all_metrics GENERATE metric_key, metrics_bag.timestamp;
dump metric_status;
but I'm getting empty values, like this:
(key1,{(),(),()})
(key2,{(),()})
...
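One workaround to try is to avoid projecting a field out of the nested bag at all: FLATTEN the super columns into one row each, then GROUP by key and take MAX. This is an untested sketch. It assumes CassandraStorage really returns rows in the shape declared in the AS clause above, and the (long) cast is an assumption about how the timestamps are encoded. It does not fix whatever makes `metrics_bag.timestamp` come back empty; it only routes around that projection.

```pig
-- Untested sketch; assumes CassandraStorage returns the shape declared in the AS clause.
all_metrics = LOAD 'cassandra://keyspace/metrics' USING CassandraStorage()
    AS (metric_key, metrics_bag: bag {metric: tuple(timestamp, columns: bag {record: tuple(name:chararray, value:chararray)})});

-- Direct form, once projecting a field out of the nested bag works:
-- latest = FOREACH all_metrics GENERATE metric_key, MAX(metrics_bag.timestamp);

-- Workaround: one output row per super column, i.e. (metric_key, timestamp, columns).
flat = FOREACH all_metrics GENERATE metric_key, FLATTEN(metrics_bag);

-- Refer to the timestamp by position ($1) instead of by its nested name.
-- The (long) cast is an assumption about how the timestamps are stored.
ts = FOREACH flat GENERATE $0 AS metric_key, (long)$1 AS timestamp;

-- Regroup by key and keep the largest timestamp in each group.
by_key = GROUP ts BY metric_key;
latest = FOREACH by_key GENERATE group AS metric_key, MAX(ts.timestamp) AS latest_timestamp;

DUMP latest;
```

The FLATTEN/GROUP route adds a reduce step compared with the direct form, so it is only worth using if the direct projection keeps returning empty tuples.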
Re: How to access subcolumns in cassandra
Posted by Dmitriy Ryaboy <dv...@gmail.com>.
DataStax ships a Pig 0.8.3? What are they going to do if there is ever an Apache Pig 0.8.3?
D
On Wed, Aug 17, 2011 at 1:44 PM, Jeremy Hanna <je...@gmail.com> wrote:
> If you're already using Pig 0.9, which should be the version with the
> nested-subcolumn fix on the Pig side, I would go on IRC (irc.freenode.net),
> join #datastax-brisk, and ask driftx (Brandon) for suggestions.
Re: How to access subcolumns in cassandra
Posted by Jeremy Hanna <je...@gmail.com>.
If you're already using Pig 0.9, which should be the version with the nested-subcolumn fix on the Pig side, I would go on IRC (irc.freenode.net), join #datastax-brisk, and ask driftx (Brandon) for suggestions.
On Aug 17, 2011, at 3:25 PM, Fabio Souto wrote:
> Hi Jeremy,
>
> Well, I don't think the version is the problem: I'm using Cassandra 0.8.3 and Pig 0.9, and I also tried Brisk beta 2 (Pig 0.8.3). I commented on the ticket because it wasn't clear whether it had been solved, so maybe we can close it.
>
> Thanks for the response
>
> Fabio
Re: How to access subcolumns in cassandra
Posted by Fabio Souto <fs...@gmail.com>.
Hi Jeremy,
Well, I don't think the version is the problem: I'm using Cassandra 0.8.3 and Pig 0.9, and I also tried Brisk beta 2 (Pig 0.8.3). I commented on the ticket because it wasn't clear whether it had been solved, so maybe we can close it.
Thanks for the response
Fabio
On 17/08/2011, at 22:14, Jeremy Hanna wrote:
> Hi Fabio,
>
> I'm not sure super columns are fully supported in CassandraStorage right now; Brandon (whom I CCed) would know for sure. Also, I thought the Pig bug that made it impossible to reach nested data structures had been resolved: I think the ticket you commented on today was a duplicate of another bug that has since been resolved.
>
> Which versions of Pig and Cassandra are you using?
>
> Jeremy
Re: How to access subcolumns in cassandra
Posted by Jeremy Hanna <je...@gmail.com>.
Hi Fabio,
I'm not sure super columns are fully supported in CassandraStorage right now; Brandon (whom I CCed) would know for sure. Also, I thought the Pig bug that made it impossible to reach nested data structures had been resolved: I think the ticket you commented on today was a duplicate of another bug that has since been resolved.
Which versions of Pig and Cassandra are you using?
Jeremy
On Aug 17, 2011, at 10:09 AM, Fabio Souto wrote:
> Hi,
>
> I have some metrics stored in a Cassandra super column, where the subcolumns are the timestamps of each metric. I'm loading the metrics into Pig with this line:
>
> all_metrics = LOAD 'cassandra://keyspace/metrics' USING CassandraStorage() AS (metric_key, metrics_bag: bag {metric: tuple(timestamp, columns: bag {record: tuple(name:chararray, value:chararray)})});
>
> I just want to access the timestamp subcolumn to get the most recent value (using MAX), so I tried:
>
> metric_status = FOREACH all_metrics GENERATE metric_key, metrics_bag.timestamp;
> dump metric_status;
>
> but I'm getting empty values, like this:
>
> (key1,{(),(),()})
> (key2,{(),()})
> ...