Posted to user@cassandra.apache.org by "Marcelo Valle (BLOOMBERG/ LONDON)" <mv...@bloomberg.net> on 2015/02/11 14:25:17 UTC

best supported spark connector for Cassandra

Since Spark was being discussed in another thread, I decided to start a new one, as I am interested in using Spark + Cassandra in the future.

About 3 years ago, Spark was not yet an option, and we tried to use Hadoop to process Cassandra data. My experience was horrible, and we concluded that it was faster to develop an internal tool than to insist on Hadoop _for our specific case_.

Now I can see Spark is starting to be known as a "better Hadoop", and the market seems to be going this way. I can also see that I have many more options for integrating Cassandra using the Spark RDD concept than using the ColumnFamilyInputFormat.

I have found this Java driver made by DataStax: https://github.com/datastax/spark-cassandra-connector

I have also found Python Cassandra support in Spark's repo, but it still seems experimental: https://github.com/apache/spark/tree/master/examples/src/main/python

Finally, I have found Stratio Deep: https://github.com/Stratio/deep-spark
It seems the Stratio guys have also forked Cassandra; I am still a little confused about it.

Question: which driver should I use if I want to use Java? And which if I want to use Python?
Judging from my past experience, I think the way Spark integrates with Cassandra makes all the difference in the world, so I would like to know more about it, but I don't even know which source code I should start looking at...
I would like to integrate using Python and/or C++, but I wonder whether it would be more worthwhile to use the Java driver instead.

Thanks in advance



Re: best supported spark connector for Cassandra

Posted by DuyHai Doan <do...@gmail.com>.
Start looking at the Spark/Cassandra connector here (in Scala):
https://github.com/datastax/spark-cassandra-connector/tree/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector

Data locality is provided by this method:
https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/CassandraRDD.scala#L329-L336

Start digging from this method and work your way down through the code.
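
To make the data locality idea concrete, here is a minimal sketch in Python. It is not the connector's code (that is Scala), and every name in it (TokenRangePartition, build_partitions, preferred_locations, assign_tasks, the node names) is hypothetical and made up for illustration. The assumption is simply that each Spark partition covers a Cassandra token range and reports the replica hosts of that range as its preferred locations, so the scheduler can try to run the task on a node that already holds the data.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class TokenRangePartition:
    """Hypothetical stand-in for one Spark partition backed by a token range."""
    index: int
    token_range: Tuple[int, int]   # (start, end) of the range on the ring
    replicas: Tuple[str, ...]      # hosts that store the data for this range


def build_partitions(ring: Dict[Tuple[int, int], List[str]]) -> List[TokenRangePartition]:
    """Create one partition per token range, remembering its replica hosts."""
    return [
        TokenRangePartition(i, token_range, tuple(hosts))
        for i, (token_range, hosts) in enumerate(sorted(ring.items()))
    ]


def preferred_locations(partition: TokenRangePartition) -> List[str]:
    """Hosts where a task for this partition could read its data locally."""
    return list(partition.replicas)


def assign_tasks(partitions: List[TokenRangePartition], executors: Set[str]) -> Dict[int, str]:
    """Toy scheduler: use a replica-local executor when one is available."""
    assignment = {}
    for p in partitions:
        local = [host for host in preferred_locations(p) if host in executors]
        # Fall back to an arbitrary (first in sorted order) executor otherwise.
        assignment[p.index] = local[0] if local else sorted(executors)[0]
    return assignment


if __name__ == "__main__":
    # Made-up ring: token range -> replica hosts.
    ring = {
        (0, 100): ["node1", "node2"],
        (100, 200): ["node2", "node3"],
        (200, 300): ["node3", "node1"],
    }
    partitions = build_partitions(ring)
    print(assign_tasks(partitions, {"node2", "node3"}))
```

With Spark executors running on the Cassandra nodes themselves, each task ends up on a host that owns a replica of its token range, which is the locality that Hadoop-style integrations often struggled to give you.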

As for Stratio Deep, I can't tell how they did the integration with Spark.
Take some time to dig into their code to understand the logic.


