Posted to user@cassandra.apache.org by Maxim Potekhin <po...@bnl.gov> on 2011/11/01 19:18:00 UTC

Re: Tool for SQL -> Cassandra data movement

Just a short comment -- we are going the CSV way as well because of its
compactness and extreme portability.
The CSV files are kept in the cloud as a backup, and they can be put to
other uses as well. JSON would also work, but the files would be at least
twice as large.
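
To illustrate the size difference, here is a minimal sketch that serializes
the same rows both ways and compares byte counts. The column names and sample
values are made up for illustration; the actual ratio depends on how long the
column names are relative to the values, since JSON repeats every key in every
record while CSV states them once in the header.

```python
import csv
import io
import json

# Hypothetical sample rows; column names and values are illustrative only.
rows = [
    {"user_id": i, "first_name": "name%d" % i, "last_login": "2011-11-01"}
    for i in range(1000)
]


def csv_size(records):
    """Serialize records as CSV with a single header line; return byte count."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return len(buf.getvalue().encode("utf-8"))


def json_size(records):
    """Serialize records as one JSON object per line; return byte count."""
    text = "\n".join(json.dumps(r) for r in records)
    return len(text.encode("utf-8"))


csv_bytes = csv_size(rows)
json_bytes = json_size(rows)
print("csv:", csv_bytes, "json:", json_bytes,
      "ratio:", round(json_bytes / csv_bytes, 2))
```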

Maxim

On 9/22/2011 1:25 PM, Nehal Mehta wrote:
> We are trying to do the same thing, but instead of migrating into
> JSON, we are exporting to CSV and then importing the CSV into
> Cassandra.  Which DB are you currently using?
>
> Thanks,
> Nehal Mehta.
>
> 2011/9/22 Radim Kolar <hsn@sendmail.cz>
>
>     I need a tool that can dump tables via JDBC into JSON format
>     for Cassandra import. I am pretty sure that somebody has already
>     written one.
>
>     Are there tools which can do a direct JDBC -> Cassandra import?
>
>


Re: Tool for SQL -> Cassandra data movement

Posted by Brian O'Neill <bo...@alumni.brown.edu>.
COTS and open-source ETL tools (Talend, Pentaho, CloverETL, etc.) exist to
do this.
With those, you should be able to do this without writing any code.

All of the tools can read from a SQL database.  Then you just need to push
the data into Cassandra.  Many of the ETL tools support web services, which
is why I suggested that a REST layer for Cassandra might be handy.  Using the
ETL tool, you could push the data into Cassandra as JSON over REST.  (If you
want, give Virgil <http://code.google.com/a/apache-extras.org/p/virgil/> a
try.)
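
For anyone who does end up writing code, the "read from SQL, emit JSON" half
might look roughly like the sketch below. It is only an illustration: an
in-memory SQLite database stands in for the real JDBC source, the table and
column names are hypothetical, and the HTTP call is left out because the
endpoint layout of the REST layer is not assumed here.

```python
import json
import sqlite3


def rows_as_json(conn, table):
    """Yield each row of `table` as a JSON document keyed by column name.

    The table name is interpolated directly into the query, so this is only
    suitable for a trusted, hard-coded name.
    """
    cursor = conn.execute('SELECT * FROM "%s"' % table)
    columns = [d[0] for d in cursor.description]
    for row in cursor:
        yield json.dumps(dict(zip(columns, row)))


# In-memory SQLite stands in for the real source database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

documents = list(rows_as_json(conn, "users"))
for doc in documents:
    # Each document could be sent as the body of an HTTP request.
    print(doc)
```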

I haven't tried it, but you might also be able to coax the ETL tools into
using CQL.

Some of the ETL tools are Map/Reduce friendly (more or less) and can
distribute the job over a cluster.  But if you have a lot of data, you may
also want to look at Pig and/or Map/Reduce directly.  If you stage the
CSV/JSON file on HDFS, then a simple Map/Reduce job can load the data
directly into Cassandra (using ColumnFamilyOutputFormat).
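
As a rough sketch of what the map step of such a job has to do, the snippet
below turns one CSV line into a row key plus a set of column name/value
pairs. It is plain Python, not the Hadoop API; a real job would do the
equivalent in its mapper or reducer and hand the result to
ColumnFamilyOutputFormat. The function name, column list, and sample line
are hypothetical.

```python
import csv


def csv_line_to_columns(line, columns, key_column):
    """Parse one CSV line into (row_key, {column_name: value}).

    `columns` lists the field names in file order; `key_column` names the
    field to use as the row key. Both are assumptions about the input file.
    """
    values = next(csv.reader([line]))
    record = dict(zip(columns, values))
    row_key = record.pop(key_column)
    return row_key, record


columns = ["user_id", "first_name", "last_login"]
key, cols = csv_line_to_columns("42,alice,2011-11-01", columns, "user_id")
print(key, cols)
```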

We are solving this problem right now, so I'll report back.

-brian

---- 
Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


