Posted to user@spark.apache.org by Oleg Ruchovets <or...@gmail.com> on 2014/09/02 20:10:12 UTC

pyspark and cassandra

Hi All,
   Is it possible to use Cassandra as an input data source for PySpark? I found
an example for Java -
http://java.dzone.com/articles/sparkcassandra-stack-perform?page=0,0 - and I
am looking for something similar for Python.

Thanks
Oleg.

Re: pyspark and cassandra

Posted by Oleg Ruchovets <or...@gmail.com>.

Hi,
  I am trying to evaluate different options for Spark + Cassandra, and I have
a couple of additional questions.
  My aim is to use Cassandra only, without Hadoop:
  1) Is it possible to use only Cassandra as the input/output source for
PySpark?
  2) If I use Spark (Java, Scala), is it possible to use only
Cassandra for input/output, without Hadoop?
  3) I know there are a couple of strategies for the storage level. If my
data set is quite big and I do not have enough memory to process it, can I use
the DISK_ONLY option without Hadoop (having only Cassandra)?

Thanks
Oleg

On Wed, Sep 3, 2014 at 3:08 AM, Kan Zhang <kz...@apache.org> wrote:

> In Spark 1.1, it is possible to read from Cassandra using Hadoop jobs. See
> examples/src/main/python/cassandra_inputformat.py for an example. You may
> need to write your own key/value converters.

Re: pyspark and cassandra

Posted by Kan Zhang <kz...@apache.org>.

In Spark 1.1, it is possible to read from Cassandra using Hadoop jobs. See
examples/src/main/python/cassandra_inputformat.py for an example. You may
need to write your own key/value converters.
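
For illustration, here is a minimal sketch of what such a read might look like, loosely following that example script. The helper names (cassandra_input_conf, read_cassandra_rdd) are hypothetical. The input format class, converter classes and configuration keys are written as I recall them from the Spark 1.1 examples and should be checked against the script shipped with your Spark version. The sketch also assumes the Spark examples jar (which contains the converters) is on the classpath and that the cluster uses the Murmur3 partitioner.

```python
def cassandra_input_conf(host, keyspace, column_family, port="9160"):
    """Build the Hadoop configuration dict for reading one column family.

    Hypothetical helper; the keys are assumed to match those used in
    examples/src/main/python/cassandra_inputformat.py.
    """
    return {
        "cassandra.input.thrift.address": host,
        "cassandra.input.thrift.port": port,
        "cassandra.input.keyspace": keyspace,
        "cassandra.input.columnfamily": column_family,
        # Assumption: the cluster uses the default Murmur3 partitioner.
        "cassandra.input.partitioner.class": "Murmur3Partitioner",
        # Assumption: small page size, as in the example script.
        "cassandra.input.page.row.size": "3",
    }


def read_cassandra_rdd(sc, host, keyspace, column_family):
    """Return an RDD over a Cassandra column family.

    `sc` is an existing SparkContext. The converter classes are assumed to
    come from the Spark examples jar; you may need to write your own.
    """
    conf = cassandra_input_conf(host, keyspace, column_family)
    return sc.newAPIHadoopRDD(
        "org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat",
        "java.util.Map",  # key class
        "java.util.Map",  # value class
        keyConverter="org.apache.spark.examples.pythonconverters."
                     "CassandraCQLKeyConverter",
        valueConverter="org.apache.spark.examples.pythonconverters."
                       "CassandraCQLValueConverter",
        conf=conf,
    )
```

With a running SparkContext, something like read_cassandra_rdd(sc, "localhost", "my_keyspace", "my_table").collect() would then return the rows as (key, value) pairs of dicts, where the keyspace and table names are placeholders.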
