Posted to user@cassandra.apache.org by Krassimir Kostov <kk...@hotmail.com> on 2012/01/27 10:44:12 UTC

SQL DB Integration

Hello! 

I am working on a project, for which I have to evaluate and recommend the implementation of a new database system, with the following major characteristics: 

* Operational scalability 
* Low cost 
* Ability to serve both as a data storage facility and an advanced data manipulation tool 
* Speed of execution 
* Real-time writing capability, with potential to record millions of client data records in real time 
* Flexibility: ability to support all client data types and formats, structured and unstructured 
* Capability to support multiple data centers and geographies 
* Ability to provide data infrastructure solutions for clients with small and Big Data needs 
* Full and flawless integration with the following 3 infrastructures: 

  (1) A data mining application (IBM SPSS Modeler) that imports/exports data from/to an SQL database 
  (2) A partner platform, based on an Oracle Database (CSV data import/export) 
  (3) Various client SQL databases, whose data elements will be uploaded and replicated in the recommended database system 

As a result of my research, I am planning to recommend implementing the Apache Cassandra NoSQL DB, hosted on Amazon Elastic Compute Cloud (Amazon EC2). I realize that the biggest challenge among the above 3 points is probably the last one, since for each client we need to custom-build and replicate their database, changing the data model from SQL to NoSQL. Points (1) and (2), by contrast, relate only to transferring data up and down between SQL and NoSQL environments.

My question is: how easy or difficult is it to build a GUI/API that can handle the integration in the above 3 points, that is, transferring data (upstream/downstream) between the SQL and Cassandra NoSQL environments? Do you have any other comments or suggestions that I should consider?

Thanks a lot for your involvement and have a great day! 

Sincerely, 

Krassimir Kostov

RE: SQL DB Integration

Posted by Krassimir Kostov <kk...@hotmail.com>.
Hi Viktor,
 
Thanks for the comments. True, the characteristics that I outlined were general, just to give some background and context for the problem I’m trying to solve. I will address more specific questions when it comes to designing and implementing the data storage solution and the API to do the integration of (1) – (3) above.
 
Given that our data mining application (IBM SPSS Modeler), our partner platform (Oracle DB data model, used for additional services), and our clients’ DBs are all based on SQL, from your experience:
 
(1) Is it a good idea to use Cassandra as a storage solution for SQL data that is converted to the NoSQL data model just to be stored on Cassandra?
(2) Do you know of any similar cases of using Cassandra as storage behind SQL data applications, or do the differences in data model architecture and the high development costs make this impractical?
(3) If using Cassandra as storage behind SQL data applications is not a good idea, can you recommend an alternative SQL cloud DB solution that scales well?
 
Thanks and regards,
 
Krassimir Kostov

RE: SQL DB Integration

Posted by Viktor Jevdokimov <Vi...@adform.com>.
Hello Krassimir,

From a typical programmer you should receive the answer that this is possible, but whether it is easy or difficult depends.
From a typical consultant you should receive a question: why?

> I am working on a project, for which I have to evaluate and recommend the implementation of a new database system, with the following major characteristics:

> * Operational scalability
It is not advisable to do this automatically; do it manually.

> * Low cost
Compared to what?

> * Ability to serve both as a data storage facility and an advanced data manipulation tool
Cassandra is not a data manipulation tool.

> * Speed of execution
Execution of what?

> * Real-time writing capability, with potential to record millions of client data records in real time
Millions per second/minute/hour/day? Isn't any DB capable of this?
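
To make the question about the time unit concrete, the arithmetic below converts one hypothetical volume into a sustained per-second write rate for each period. The figure of one million records and the helper name `writes_per_second` are illustrative assumptions, not numbers taken from the requirements above.

```python
# Hypothetical helper: average write rate implied by a volume and a period.
SECONDS_PER = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def writes_per_second(records: int, period: str) -> float:
    """Average writes per second needed to absorb `records` within one `period`."""
    return records / SECONDS_PER[period]


if __name__ == "__main__":
    # Assumed volume for illustration only.
    for period in SECONDS_PER:
        rate = writes_per_second(1_000_000, period)
        print(f"1,000,000 records per {period}: {rate:,.1f} writes/s")
```

The same headline number spans roughly five orders of magnitude in required throughput depending on the period, which is why the unit has to be stated before any sizing can be done.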

> * Flexibility: ability to support all client data types and formats, structured and unstructured
The supported data types are limited; anything else has to be stored as binary arrays.
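
As a minimal sketch of the binary-array approach, a value with no native column type can be serialized to bytes by the application before it is written, and decoded again after it is read. The example uses JSON over UTF-8 as one possible encoding; the function names `to_blob` and `from_blob` are hypothetical, and no Cassandra client call is shown.

```python
import json


def to_blob(value) -> bytes:
    """Serialize a JSON-compatible value to bytes suitable for a binary column."""
    # sort_keys gives a stable byte representation for equal dictionaries.
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")


def from_blob(blob: bytes):
    """Decode bytes produced by to_blob back into a Python value."""
    return json.loads(blob.decode("utf-8"))


if __name__ == "__main__":
    # Illustrative record; the field names are assumptions.
    record = {"client": "acme", "tags": ["a", "b"], "score": 0.75}
    blob = to_blob(record)
    assert from_blob(blob) == record
    print(len(blob), "bytes")
```

The trade-off is that the database sees only opaque bytes, so it cannot index or filter on anything inside the serialized value.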

> * Capability to support multiple data centers and geographies
Capable.

> * Ability to provide data infrastructure solutions for clients with small and Big Data needs
The same solution for all? Will it be effective in cost, performance, maintenance, and support for all of them?

> * Full and flawless integration with the following 3 infrastructures:
>   (1) A data mining application (IBM SPSS Modeler) that imports/exports data from/to an SQL database
>   (2) A partner platform, based on an Oracle Database (CSV data import/export)
>   (3) Various client SQL databases, whose data elements will be uploaded and replicated in the recommended database system
Cassandra (like almost any storage) does not provide any integration by itself. Integration is built on top of the storage APIs.
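
A rough sketch of what such an integration layer looks like for the CSV import/export case: application code reads rows through one storage API and writes them out in the exchange format, and the reverse for import. Here Python's built-in `sqlite3` stands in for a client SQL database purely so the example runs on its own; the function names and the `clients` table are hypothetical, and a real layer would use the Oracle or Cassandra client library in place of `sqlite3`.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn: sqlite3.Connection, table: str) -> str:
    """Dump one table to CSV text with a header row."""
    # Table names cannot be bound as parameters, so validate before interpolating.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    cursor = conn.execute(f"SELECT * FROM {table}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


def import_csv_to_table(conn: sqlite3.Connection, table: str, text: str) -> int:
    """Load CSV text (with a header row) into an existing table; return the row count."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if not table.isidentifier() or not all(name.isidentifier() for name in header):
        raise ValueError("unsafe table or column name")
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    rows = list(reader)
    conn.executemany(f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", rows)
    return len(rows)


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE clients (id INTEGER, name TEXT)")
    source.executemany("INSERT INTO clients VALUES (?, ?)", [(1, "acme"), (2, "globex")])
    print(export_table_to_csv(source, "clients"))
```

Everything specific to a given system (type mapping, batching, error handling) lives in this application code, not in the storage engine.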

> As a result of my research, I am planning to recommend implementing the Apache Cassandra NoSQL DB, hosted on Amazon Elastic Compute Cloud (Amazon EC2). I realize that the biggest challenge among the above 3 points is probably the last one, since for each client we need to custom-build and replicate their database, changing the data model from SQL to NoSQL. Points (1) and (2), by contrast, relate only to transferring data up and down between SQL and NoSQL environments.

> My question is: how easy or difficult is it to build a GUI/API that can handle the integration in the above 3 points, that is, transferring data (upstream/downstream) between the SQL and Cassandra NoSQL environments? Do you have any other comments or suggestions that I should consider?
In my opinion, you should research Cassandra with specific questions in mind, not general ones. First define the storage requirements from the application and functionality perspective, then look for a solution.



Best regards/ Pagarbiai

Viktor Jevdokimov
Senior Developer



