Posted to user@cassandra.apache.org by Frank LoVecchio <fr...@isidorey.com> on 2010/10/19 17:31:20 UTC

Hadoop Word Count Super Column Example?

I have a Hadoop installation working with a cluster of 0.7 Beta 2 nodes, and
got the WordCount example to work using the standard configuration.  I have
been inserting data into a Super Column (Sensor) with TimeUUID as the
compare type; it looks like this:

get Sensor['DeviceID:Sensor']
=> (super_column=795a4da0-d8ac-11df-9a2c-12313d06187c,
     (column=sub_sensor1, value=39.742538, timestamp=1287182112633000)
     (column=sub_sensor2, value=-104.912474, timestamp=1287182112633000)
     (column=mac_address, value=DEADBEEFFEED, timestamp=1287182112633000))

Is there a Word Count example for super columns?  I am trying to count the
number of occurrences of "DEADBEEFFEED", much like "word1" in the column
example.

Thanks,

Frank LoVecchio
Software Engineer, Isidorey LLC
isidorey.com

franklovecchio.com
rodsandricers.com

Re: Hadoop Word Count Super Column Example?

Posted by Aaron Morton <aa...@thelastpickle.com>.

Were the IColumn objects passed to the map function o.a.c.db.SuperColumn instances?

A


On 21 Oct 2010, at 02:48 AM, Jeremy Hanna <je...@gmail.com> wrote:

>> Have you tried it?
>
> Yes, with a modified word count example a month or so ago.


Re: Hadoop Word Count Super Column Example?

Posted by Jeremy Hanna <je...@gmail.com>.

> Have you tried it?

Yes, with a modified word count example a month or so ago.


Re: Hadoop Word Count Super Column Example?

Posted by aaron morton <aa...@thelastpickle.com>.

My understanding of the Hadoop integration is not great, but from what I can see, the code in o.a.c.hadoop.ColumnFamilyRecordReader does not use a super_column in the ColumnParent struct when making the get_range_slices() call. It's just using the ColumnFamily.

So I would guess it would include super columns if they were present, and that the IColumns passed to your map function would be instances of o.a.c.db.SuperColumn.

Have you tried it?

Aaron




Re: Hadoop Word Count Super Column Example?

Posted by Jeremy Hanna <je...@gmail.com>.

It's relatively straightforward: the current mapper gets a map of column names to IColumns, and SuperColumn implements the IColumn interface. You would probably need both the super column name and the subcolumn name to get at the value, but you just need to cast the IColumn to a SuperColumn and handle it from there.
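
A rough sketch of that cast-and-iterate idea, for illustration only. To keep it runnable without the Cassandra or Hadoop jars, the IColumn, Column, and SuperColumn types below are simplified stand-ins written for this example, not the real o.a.c.db classes, and the map is keyed by String rather than by the raw column-name bytes the real mapper receives. The countOccurrences and sampleRow helpers and the sample values are hypothetical as well. In an actual job the mapper would emit a (word, 1) pair per match and leave the summing to the reducer; here the matches are simply counted in place.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class SuperColumnWordCount {

    /** Stand-in for o.a.c.db.IColumn, reduced to the one method this sketch needs. */
    public interface IColumn {
        byte[] name();
    }

    /** Stand-in for a standard column: a name plus a value. */
    public static final class Column implements IColumn {
        private final byte[] name;
        private final byte[] value;

        public Column(String name, String value) {
            this.name = name.getBytes(StandardCharsets.UTF_8);
            this.value = value.getBytes(StandardCharsets.UTF_8);
        }

        public byte[] name() { return name; }
        public byte[] value() { return value; }
    }

    /** Stand-in for o.a.c.db.SuperColumn: a name plus a collection of subcolumns. */
    public static final class SuperColumn implements IColumn {
        private final byte[] name;
        private final List<IColumn> subColumns = new ArrayList<IColumn>();

        public SuperColumn(String name) {
            this.name = name.getBytes(StandardCharsets.UTF_8);
        }

        public SuperColumn add(IColumn subColumn) {
            subColumns.add(subColumn);
            return this;
        }

        public byte[] name() { return name; }
        public Collection<IColumn> getSubColumns() { return subColumns; }
    }

    /** Counts subcolumns named subColumnName whose value equals word, across one row. */
    public static int countOccurrences(SortedMap<String, IColumn> columns,
                                       String subColumnName, String word) {
        int count = 0;
        for (IColumn column : columns.values()) {
            if (!(column instanceof SuperColumn)) {
                continue; // a standard column has no subcolumns to inspect
            }
            SuperColumn superColumn = (SuperColumn) column; // the cast described above
            for (IColumn sub : superColumn.getSubColumns()) {
                if (!(sub instanceof Column)) {
                    continue;
                }
                String name = new String(sub.name(), StandardCharsets.UTF_8);
                String value = new String(((Column) sub).value(), StandardCharsets.UTF_8);
                if (name.equals(subColumnName) && value.equals(word)) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Builds a made-up row shaped like the Sensor example: super column name -> super column. */
    public static SortedMap<String, IColumn> sampleRow() {
        SortedMap<String, IColumn> row = new TreeMap<String, IColumn>();
        row.put("sc1", new SuperColumn("sc1")
                .add(new Column("sub_sensor1", "39.742538"))
                .add(new Column("mac_address", "DEADBEEFFEED")));
        row.put("sc2", new SuperColumn("sc2")
                .add(new Column("sub_sensor1", "39.742540"))
                .add(new Column("mac_address", "DEADBEEFFEED")));
        row.put("sc3", new SuperColumn("sc3")
                .add(new Column("mac_address", "FEEDFACECAFE")));
        return row;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences(sampleRow(), "mac_address", "DEADBEEFFEED"));
    }
}
```

With the made-up row from sampleRow(), two of the three super columns carry mac_address=DEADBEEFFEED, so the count is 2. Against the real classes, the same loop would go inside the map method, with the comparison done on the column bytes rather than on Strings.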
