Posted to user@cassandra.apache.org by "Jasper K." <ja...@incentro.com> on 2013/04/13 09:58:07 UTC

Extracting data from SSTable files with MapReduce

Hi,

Does anyone have any experience with running a MapReduce job directly
against a CF's SSTable files?

I have a use case where this seems to be an option. I want to export all
data from a CF to a flat file format for statistical analysis.

Some factors that make it (more) doable in my case:
- The Cassandra instance is not online (no writes, no reads).
- The .db files were exported from another instance. I have them all in one
  place now.

The SSTable files are in the -f- format from 0.8.10.

Looking at http://wiki.apache.org/cassandra/ArchitectureSSTable, it
should be possible to write a Hadoop RecordReader for Cassandra row keys.

But maybe I am not fully aware of what I am getting into.

-- 

Jasper

Re: Extracting data from SSTable files with MapReduce

Posted by aaron morton <aa...@thelastpickle.com>.
> I did try to upgrade to 1.2 but it did not work out. Maybe too many versions in between.
Newer versions should be able to read older file formats. What was the error?

> Why would later formats make this easier you think?
It will be easier to write against the current code base, and you will find it easier to get help.

Cheers


-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/04/2013, at 5:37 AM, Jasper K. <ja...@incentro.com> wrote:

> Hi Aaron,
> 
> I did try to upgrade to 1.2 but it did not work out. Maybe too many versions in between.
> 
> Why would later formats make this easier you think?
> 
> Jasper


Re: Extracting data from SSTable files with MapReduce

Posted by "Jasper K." <ja...@incentro.com>.
Hi Aaron,

I did try to upgrade to 1.2 but it did not work out. Maybe too many versions
in between.

Why would later formats make this easier you think?

Jasper



2013/4/14 aaron morton <aa...@thelastpickle.com>

> The SSTable files are in the -f- format from 0.8.10.
>
> If you can upgrade to the latest version it will make things easier.
> Start a node and use nodetool upgradesstables.
>
> The org.apache.cassandra.tools.SSTableExport class provides a blueprint
> for reading rows from disk.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com

Re: Extracting data from SSTable files with MapReduce

Posted by aaron morton <aa...@thelastpickle.com>.
> The SSTable files are in the -f- format from 0.8.10.
If you can upgrade to the latest version, it will make things easier.
Start a node and run nodetool upgradesstables.

The org.apache.cassandra.tools.SSTableExport class provides a blueprint for reading rows from disk.

Hope that helps.
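
If a full MapReduce job turns out to be more than is needed, one alternative is to dump each SSTable to JSON with the sstable2json script (which wraps SSTableExport) and flatten that output. The following is only a rough sketch, not a tested recipe. It assumes the JSON is a list of objects with "key" and "columns" fields, where each column is a list starting with name, value and timestamp. That layout differs between Cassandra versions, so check what your version actually emits. The function names flatten_rows and json_to_csv are made up for this illustration.

```python
import csv
import io
import json


def flatten_rows(rows):
    """Yield (row_key, column_name, value, timestamp) for every column.

    ASSUMPTION: `rows` is a list of {"key": ..., "columns": [[name, value,
    timestamp, ...], ...]} objects. Adjust if your JSON dump looks different.
    """
    for row in rows:
        key = row["key"]
        for column in row["columns"]:
            # Only the first three fields are used; any extra fields
            # (for example deletion or expiry markers) are ignored here.
            name, value, timestamp = column[:3]
            yield key, name, value, timestamp


def json_to_csv(json_text, out):
    """Write one CSV line per column to `out`; return the number of columns."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["row_key", "column", "value", "timestamp"])
    count = 0
    for record in flatten_rows(json.loads(json_text)):
        writer.writerow(record)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample input, not real export output.
    sample = json.dumps(
        [{"key": "6b31", "columns": [["age", "3432", 1], ["name", "6a6b", 2]]}]
    )
    buf = io.StringIO()
    json_to_csv(sample, buf)
    print(buf.getvalue(), end="")
```

This loads a whole JSON dump into memory, so it only suits SSTables of modest size; for large files a streaming parser or the RecordReader approach would be a better fit.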

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 7:58 PM, Jasper K. <ja...@incentro.com> wrote:

> Hi,
> 
> Does anyone have any experience with running a MapReduce job directly against a CF's SSTable files?
> 
> I have a use case where this seems to be an option. I want to export all data from a CF to a flat file format for statistical analysis.
> 
> Some factors that make it (more) doable in my case:
> - The Cassandra instance is not online (no writes, no reads).
> - The .db files were exported from another instance. I have them all in one place now.
> 
> The SSTable files are in the -f- format from 0.8.10.
> 
> Looking at http://wiki.apache.org/cassandra/ArchitectureSSTable, it should be possible to write a Hadoop RecordReader for Cassandra row keys.
> 
> But maybe I am not fully aware of what I am getting into.
> 
> -- 
> 
> Jasper