Posted to user@cassandra.apache.org by Radim Kolar <hs...@sendmail.cz> on 2011/09/22 11:05:31 UTC

Tool for SQL -> Cassandra data movement

I need a tool that can dump tables via JDBC into JSON format for
Cassandra import. I am pretty sure that somebody has already written one.

Are there tools which can do direct JDBC -> cassandra import?
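
A minimal sketch of the "dump a table to JSON" half of the question, written in Python rather than against JDBC. Everything here is an assumption for illustration: sqlite3 stands in for the real SQL source, the function name is hypothetical, and the output is generic one-object-per-line JSON, not the specific layout any Cassandra import tool expects.

```python
import json
import sqlite3


def dump_table_as_json_lines(conn, table, out):
    """Write every row of `table` to `out` as one JSON object per line.

    `conn` is any DB-API connection (sqlite3 is only a stand-in here).
    Returns the number of rows written.
    """
    cur = conn.cursor()
    # Table names cannot be bound as parameters, so quote the identifier.
    cur.execute('SELECT * FROM "{}"'.format(table.replace('"', '""')))
    columns = [d[0] for d in cur.description]
    count = 0
    for row in cur:
        out.write(json.dumps(dict(zip(columns, row))) + "\n")
        count += 1
    return count
```

Writing one object per line keeps memory use flat on large tables, because no row has to be held after it is written.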

Re: Tool for SQL -> Cassandra data movement

Posted by Brian O'Neill <bo...@alumni.brown.edu>.
COTS/open-source ETL tools exist to do this (Talend, Pentaho, CloverETL,
etc.).
With those, you should be able to do this without writing any code.

All of the tools can read from a SQL database.  Then you just need to push
the data into Cassandra.   Many of the ETL tools support web services, which
is why I suggested a REST layer for Cassandra might be handy.  Using the ETL
tool, you could push the data into Cassandra as JSON over REST.  (If you
want, give Virgil <http://code.google.com/a/apache-extras.org/p/virgil/>  a
try)  

I haven't tried, but you might also be able to coax the ETL tools to use
CQL.  

Some of the ETL tools are Map/Reduce friendly (more or less) and can
distribute the job over a cluster.  But if you have a lot of data, you may
also want to look at Pig and/or Map/Reduce directly.   If you stage the
CSV/JSON file on HDFS, then a simple Map/Reduce job can load the data
directly into Cassandra (using ColumnFamilyOutputFormat).

We are solving this problem right now, so I'll report back.

-brian

---- 
Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/



From:  Maxim Potekhin <po...@bnl.gov>
Organization:  Brookhaven National Laboratory
Reply-To:  <us...@cassandra.apache.org>
Date:  Tue, 01 Nov 2011 14:18:00 -0400
To:  <us...@cassandra.apache.org>
Subject:  Re: Tool for SQL -> Cassandra data movement

    
 Just a short comment -- we are going the CSV way as well because of its
compactness and extreme portability.
 The CSV files are kept in the cloud as backup. They can also find other
uses. JSON would work as well, but
 it would be at least twice as large in size.
 
 Maxim
 
 On 9/22/2011 1:25 PM, Nehal Mehta wrote:
> We are trying to do the same thing, but instead of migrating into JSON, we
> are exporting into CSV and then importing the CSV into Cassandra.  Which DB are
> you currently using?
>  
>  Thanks,
>  Nehal Mehta. 
>  
>  
>  2011/9/22 Radim Kolar <hs...@sendmail.cz>
>  
>> I need a tool that can dump tables via JDBC into JSON format for
>> Cassandra import. I am pretty sure that somebody has already written one.
>>  
>>  Are there tools which can do direct JDBC -> cassandra import?
>>  
>  
>  
>  
 
 



Re: Tool for SQL -> Cassandra data movement

Posted by Maxim Potekhin <po...@bnl.gov>.
Just a short comment -- we are going the CSV way as well because of its 
compactness and extreme portability.
The CSV files are kept in the cloud as backup. They can also find other 
uses. JSON would work as well, but
it would be at least twice as large in size.

Maxim
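
A rough way to check the size comparison above on your own data. This is only a sketch: the function name and the sample columns are hypothetical, the CSV is written without a header, and the JSON is written as one object per row with compact separators. The actual ratio depends on column names and values.

```python
import csv
import io
import json


def encoded_sizes(columns, rows):
    """Return (csv_bytes, json_bytes) for the same rows encoded as UTF-8."""
    csv_buf = io.StringIO()
    csv.writer(csv_buf, lineterminator="\n").writerows(rows)
    # JSON repeats every column name in every row, which is where the
    # extra size comes from.
    json_text = "".join(
        json.dumps(dict(zip(columns, row)), separators=(",", ":")) + "\n"
        for row in rows
    )
    return len(csv_buf.getvalue().encode("utf-8")), len(json_text.encode("utf-8"))
```

For example, `encoded_sizes(["query", "day", "results", "ip"], rows)` returns both byte counts, so the ratio can be measured instead of estimated.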

On 9/22/2011 1:25 PM, Nehal Mehta wrote:
> We are trying to do the same thing, but instead of migrating into
> JSON, we are exporting into CSV and then importing the CSV into
> Cassandra.  Which DB are you currently using?
>
> Thanks,
> Nehal Mehta.
>
> 2011/9/22 Radim Kolar <hs...@sendmail.cz>
>
>     I need a tool that can dump tables via JDBC into JSON format
>     for Cassandra import. I am pretty sure that somebody has already
>     written one.
>
>     Are there tools which can do direct JDBC -> cassandra import?
>
>


Re: Tool for SQL -> Cassandra data movement

Posted by Jeremy Hanna <je...@gmail.com>.
Take a look at http://www.datastax.com/dev/blog/bulk-loading

I'm sure there is a way to make it more seamless for what you want to do and it could be built on, but the recent bulk loading additions will provide the best foundation.

On Sep 22, 2011, at 12:25 PM, Nehal Mehta wrote:

> We are trying to do the same thing, but instead of migrating into JSON, we are exporting into CSV and then importing the CSV into Cassandra.  Which DB are you currently using?
> 
> Thanks,
> Nehal Mehta. 
> 
> 2011/9/22 Radim Kolar <hs...@sendmail.cz>
> I need a tool that can dump tables via JDBC into JSON format for Cassandra import. I am pretty sure that somebody has already written one.
> 
> Are there tools which can do direct JDBC -> cassandra import?
> 


Re: Tool for SQL -> Cassandra data movement

Posted by Nehal Mehta <ne...@finaner.com>.
Hi,

Instead of passing it as a command-line argument, I am storing all of this
configuration in config/config.xml.

My earlier version used the command line, but then as the arguments increased I
shifted to config.xml. Plus, I thought providing all credentials on the command
line was not a good idea either. A sample config file is at
https://github.com/nehalmehta/CSV2Cassandra/blob/master/config/config.xml.

I am going to add the following features: Cassandra credentials, selected
columns, and a selected primary key. I believe it is a good idea to have function
calls that can manipulate selected CSV columns before inserting records.
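
One way the planned features (selected columns, a selected key, and per-column function calls) could fit together, sketched in Python. This is not how CSV2Cassandra works; the function name, its parameters, and the sample column names are all hypothetical, and the Cassandra insert itself is left out.

```python
import csv
import io


def prepare_rows(lines, column_names, key_column, selected, transforms=None):
    """Yield (row_key, columns) pairs from headerless CSV lines.

    column_names: names assigned to the CSV fields, in file order.
    key_column:   name of the field used as the row key.
    selected:     names of the fields to keep as columns.
    transforms:   optional {name: function} applied before insertion.
    """
    transforms = transforms or {}
    for record in csv.reader(lines):
        named = dict(zip(column_names, record))
        columns = {}
        for name in selected:
            fn = transforms.get(name)
            columns[name] = fn(named[name]) if fn else named[name]
        yield named[key_column], columns
```

Keeping the transforms in a plain mapping means a config file only needs to name a column and a function, and the loader loop stays unchanged.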

Thanks,
Nehal Mehta.
On Tue, Sep 27, 2011 at 8:03 PM, Radim Kolar <hs...@sendmail.cz> wrote:

> > I have cleaned up my code that imports CSV into Cassandra and I have put
> > it open on https://github.com/nehalmehta/CSV2Cassandra.
> > Have a look if it is useful to you.
> Hello,
>  I will remake this tool into something like Oracle SQL*Loader.
> Basically, you will pass a control file as a command-line argument. I need
> conversion from DATE to a milliseconds-based date, headerless CSV, and better
> CSV escaping.
>
> example of control file
>
> options (rows=1000)
> LOAD DATA
>  INFILE  'c:\tmp\searches.csv'
>  BADFILE 'c:\tmp\searches.bad'
>  REPLACE
>  INTO TABLE SEARCHES2
>  FIELDS TERMINATED BY ","
>  OPTIONALLY ENCLOSED BY '"'
>  (  query,
>     day date 'YYYY-MM-DD',
>     results,
>     ip
>   )
>
> Or maybe I will start the project from scratch.
>

Re: Tool for SQL -> Cassandra data movement

Posted by Radim Kolar <hs...@sendmail.cz>.
> I have cleaned up my code that imports CSV into Cassandra and I have
> put it open on https://github.com/nehalmehta/CSV2Cassandra. Have a look
> if it is useful to you.
Hello,
  I will remake this tool into something like Oracle
SQL*Loader. Basically, you will pass a control file as a command-line
argument. I need conversion from DATE to a milliseconds-based date,
headerless CSV, and better CSV escaping.

example of control file

options (rows=1000)
LOAD DATA
   INFILE  'c:\tmp\searches.csv'
   BADFILE 'c:\tmp\searches.bad'
   REPLACE
   INTO TABLE SEARCHES2
   FIELDS TERMINATED BY ","
   OPTIONALLY ENCLOSED BY '"'
   (  query,
      day date 'YYYY-MM-DD',
      results,
      ip
    )

Or maybe I will start the project from scratch.
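
The three requirements above (date to milliseconds, headerless CSV, proper CSV escaping) can be sketched with the Python standard library. This is an illustration only, not the planned tool: the function names are hypothetical, the field order is taken from the control file example, and dates are assumed to mean midnight UTC.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_millis(text, fmt="%Y-%m-%d"):
    """Convert a date string to milliseconds since the Unix epoch.

    Assumes the date refers to midnight UTC.
    """
    dt = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def read_searches(lines):
    """Parse headerless CSV lines with fields query, day, results, ip.

    Fields may be optionally enclosed in double quotes; a doubled quote
    inside a quoted field stands for a literal quote.
    """
    for query, day, results, ip in csv.reader(lines, delimiter=",", quotechar='"'):
        yield {
            "query": query,
            "day": date_to_millis(day),
            "results": int(results),
            "ip": ip,
        }
```

The `csv` module handles the `OPTIONALLY ENCLOSED BY '"'` case from the control file, including commas and doubled quotes inside a quoted field.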

Re: Tool for SQL -> Cassandra data movement

Posted by Nehal Mehta <ne...@finaner.com>.
Hi Radim,

I have cleaned up my code that imports CSV into Cassandra and published it
at https://github.com/nehalmehta/CSV2Cassandra. Have a look and see if it is
useful to you.

I have used Hector instead of sstableloader. For me it was necessary to have
a consistency level of EACH_QUORUM.
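
For readers unfamiliar with EACH_QUORUM: a write at that level has to be acknowledged by a quorum of replicas in every datacenter, where a quorum is more than half of that datacenter's replicas. A small arithmetic sketch follows; the function names and datacenter names are made up for illustration.

```python
def quorum(replication_factor):
    """Smallest number of replicas that is more than half of the total."""
    return replication_factor // 2 + 1


def each_quorum_acks(rf_by_datacenter):
    """Acknowledgements needed in each datacenter for an EACH_QUORUM write."""
    return {dc: quorum(rf) for dc, rf in rf_by_datacenter.items()}
```

With a replication factor of 3 in each of two datacenters, `each_quorum_acks({"dc1": 3, "dc2": 3})` gives 2 acknowledgements per datacenter.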

Thanks,
Nehal Mehta.

On Thu, Sep 22, 2011 at 11:22 PM, Radim Kolar <hs...@sendmail.cz> wrote:

> On 22.9.2011 19:25, Nehal Mehta wrote:
>
>> We are trying to do the same thing, but instead of migrating into JSON,
>> we are exporting into CSV and then importing the CSV into Cassandra.
>>
> You are right, CSV seems to be more portable.
>
>
>> Which DB are you currently using?
>>
> PostgreSQL and Apache Derby.
>

Re: Tool for SQL -> Cassandra data movement

Posted by Radim Kolar <hs...@sendmail.cz>.
On 22.9.2011 19:25, Nehal Mehta wrote:
> We are trying to do the same thing, but instead of migrating into
> JSON, we are exporting into CSV and then importing the CSV into Cassandra.
You are right, CSV seems to be more portable.

> Which DB are you currently using?
PostgreSQL and Apache Derby.

Re: Tool for SQL -> Cassandra data movement

Posted by Nehal Mehta <ne...@finaner.com>.
We are trying to do the same thing, but instead of migrating into JSON,
we are exporting into CSV and then importing the CSV into Cassandra.  Which DB
are you currently using?

Thanks,
Nehal Mehta.

2011/9/22 Radim Kolar <hs...@sendmail.cz>

> I need a tool that can dump tables via JDBC into JSON format for
> Cassandra import. I am pretty sure that somebody has already written one.
>
> Are there tools which can do direct JDBC -> cassandra import?
>