Posted to user@cassandra.apache.org by mwiewiorski <mw...@opera.com> on 2015/04/09 13:54:26 UTC

Spark Cassandra Connector for Python

Hi,

At https://github.com/datastax/spark-cassandra-connector I see that you 
are extending the API that Spark provides for interacting with RDDs to 
leverage some native Cassandra features. We are using Apache Cassandra 
together with PySpark for analytics, and since we run the community 
version we rely on classic API calls like sc.newAPIHadoopRDD, which 
means writing converters for the data in Scala. We would like to use 
calls such as sc.cassandraTable, but I don't see these methods anywhere 
in PySpark, and https://github.com/datastax/spark-cassandra-connector 
does not even mention access from Python.
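For context, the classic-API route described above looks roughly like 
the cassandra_inputformat.py example shipped with Spark. This is only a 
sketch: the keyspace, table, host, and page size below are placeholder 
values, and the converter classes are the ones from Spark's examples 
jar, which must be on the classpath. It is not runnable without a live 
Spark and Cassandra setup.

```python
# Sketch of reading a Cassandra table through the generic Hadoop input
# format API (placeholder keyspace/table/host values).
from pyspark import SparkContext

sc = SparkContext(appName="CassandraViaHadoopAPI")

conf = {
    "cassandra.input.thrift.address": "127.0.0.1",   # placeholder host
    "cassandra.input.thrift.port": "9160",
    "cassandra.input.keyspace": "analytics",          # placeholder
    "cassandra.input.columnfamily": "events",         # placeholder
    "cassandra.input.partitioner.class": "Murmur3Partitioner",
    "cassandra.input.page.row.size": "1000",
}

rows = sc.newAPIHadoopRDD(
    "org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat",
    "java.util.Map",
    "java.util.Map",
    # Converters from the Spark examples jar; without them, CQL rows
    # cannot be deserialized into Python objects.
    keyConverter="org.apache.spark.examples.pythonconverters."
                 "CassandraCQLKeyConverter",
    valueConverter="org.apache.spark.examples.pythonconverters."
                   "CassandraCQLValueConverter",
    conf=conf)
```

What we would prefer is the connector-style sc.cassandraTable call, 
which can push filtering to the server instead of pulling whole column 
families through Hadoop input formats.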

In 
http://www.datastax.com/documentation/datastax_enterprise/4.7/datastax_enterprise/spark/sparkPySpark.html 
I see, however, that you are using these methods in PySpark. Does this 
mean the Spark Cassandra Connector for Python is available only in 
DataStax Enterprise, and that we have to buy it to use that API and 
features like server-side filtering from PySpark?

Also, at 
https://github.com/Parsely/pyspark-cassandra/blob/master/src/main/python/pyspark_cassandra.py 
I see that there is some effort to expose CassandraSparkContext to 
Python. Does this mean those developers are duplicating your work?

Regards,
Marek Wiewiórski
Opera Software