Posted to user@spark.apache.org by simon wang <xw...@yahoo.com.INVALID> on 2015/07/28 09:47:45 UTC

Spark-Cassandra connector DataFrame

Hi,
I would like some recommendations on using the DataFrame feature of the Spark-Cassandra connector.
I was trying to save a DataFrame containing 8 million rows to Cassandra through the Spark-Cassandra connector. According to the Spark log, this single action took about 60 minutes to complete, which seems very slow to me.
Are there any configuration settings I should check when using the connector's DataFrame feature?
From the Spark log, I can see that saving the DataFrame to Cassandra was performed in 200 small steps. The connection to the Cassandra database was opened and closed 4 times during the 60 minutes. This number matches the number of nodes in the Cassandra cluster.
I understand that the DataFrame feature is experimental, and I am new to both Spark and Cassandra. Any suggestions are much appreciated.
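To put these numbers in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: the row count, duration, and step count are the figures quoted above, and it assumes the rows were spread evenly across the 200 steps, which I have not verified.

```python
# Figures taken from the Spark log as described in this post.
total_rows = 8_000_000
elapsed_minutes = 60
num_steps = 200

# Overall write throughput across the whole job.
elapsed_seconds = elapsed_minutes * 60
rows_per_second = total_rows / elapsed_seconds

# Rows handled per step, ASSUMING an even distribution (not verified).
rows_per_step = total_rows // num_steps

print(f"overall throughput: {rows_per_second:.0f} rows/s")
print(f"rows per step (if evenly split): {rows_per_step}")
```

That works out to roughly 2,200 rows per second for the job as a whole, and 40,000 rows per step under the even-split assumption.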
Thanks,
Simon Wang