Posted to user@spark.apache.org by pouryas <po...@adbrain.com> on 2014/09/24 16:10:43 UTC

Spark Cassandra Connector Issue and performance

Hey all

I tried the Spark connector with Cassandra and ran into a problem that
blocked me for a couple of weeks. I managed to find a solution to the problem,
but I am not sure whether it was a bug in the connector or Spark.

I had three tables in Cassandra (running on a 5-node cluster) and a
large Spark cluster (5 worker nodes, each with 32 cores and 240 GB of
memory).

When I ran my job, which extracts data from S3 and writes to the 3 tables in
Cassandra using around 1 TB of memory and 160 cores, it would sometimes get
stuck on the last few tasks of a stage...

After playing around for a while I realised that reducing the number of cores
to 2 per machine (10 in total) made the job stable. I gradually increased the
number of cores, and the job hung again once I had about 50 cores in total.
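
A minimal sketch of how such a core cap could be expressed, in case it helps
anyone reproduce the setup. It assumes a standalone Spark cluster, where the
`spark.cores.max` property limits the total cores an application may take
across all workers (and, with the default spread-out scheduling, those cores
are distributed evenly over the workers). The helper name
`core_cap_settings` and the app name in the comment are hypothetical, and the
post does not say which mechanism was actually used.

```python
def core_cap_settings(workers, cores_per_worker):
    """Return Spark properties that cap the total cores for one application.

    Assumption: standalone mode, where spark.cores.max is a cluster-wide
    total rather than a per-machine limit.
    """
    if workers <= 0 or cores_per_worker <= 0:
        raise ValueError("workers and cores_per_worker must be positive")
    total = workers * cores_per_worker
    return {"spark.cores.max": str(total)}


# The stable setup described above: 5 workers, 2 cores each.
stable = core_cap_settings(5, 2)

# Roughly where the hang came back: about 50 cores in total.
unstable = core_cap_settings(5, 10)

# With PySpark installed, the settings could be applied like this
# (the app name is made up for illustration):
#   from pyspark import SparkConf
#   conf = SparkConf().setAppName("s3-to-cassandra").setAll(list(stable.items()))
```

Stepping `cores_per_worker` up one at a time between these two settings is one
way to narrow down where the job starts to hang.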

I would like to know whether anyone else has experienced this and whether it
can be explained.

On another note, I would like to know whether people are seeing good
performance reading from Cassandra with Spark, as opposed to reading data from
HDFS. It is kind of an open question, but I would like to see how others are
using it.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Cassandra-Connector-Issue-and-performance-tp15005.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.