Posted to user@cassandra.apache.org by Alex Ott <al...@gmail.com> on 2019/08/17 12:31:51 UTC

Re: Performance impact with ALLOW FILTERING clause.

The Spark connector doesn't run "select * from table;". It splits the table
into token ranges and issues a separate read for each range
(see https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/partitioner/CassandraPartition.scala#L14)
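
To illustrate what reading by token ranges looks like, here is a minimal
Python sketch (not the connector's actual code). It assumes the default
Murmur3Partitioner, whose tokens span -2^63 to 2^63-1, and uses the
hypothetical names ks, tbl and id for the keyspace, table and partition key.
The real connector derives its ranges from cluster metadata and builds its
own CQL; this only shows the general shape of the per-range queries.

```python
# Token bounds of Murmur3Partitioner (assumed to be the partitioner in use).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_splits):
    """Split the full token ring into contiguous (start, end] ranges."""
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (total * i) // num_splits for i in range(num_splits)]
    bounds.append(MAX_TOKEN)
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(keyspace, table, partition_key, start, end):
    """Build an illustrative CQL statement that reads one token range."""
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) > {start} "
        f"AND token({partition_key}) <= {end}"
    )


# Hypothetical keyspace/table/key names, four splits for readability.
for start, end in split_token_ring(4):
    print(range_query("ks", "tbl", "id", start, end))
```

Each range can be read independently, which is what lets the work be spread
across Spark tasks instead of going through a single full-table query.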


Jacques-Henri Berthemet  at "Thu, 25 Jul 2019 14:18:57 +0000" wrote:
 JB> Hi Asad,

 JB> That’s because of the way Spark works. Essentially, when you execute a Spark job, it pulls the full content of the datastore (Cassandra
 JB> in your case) into its RDDs and works with it “in memory”. While Spark uses “data locality” to read data from the nodes that have the
 JB> required data on their local disks, it’s still reading all data from the Cassandra tables. To do so it sends a ‘select * from Table ALLOW
 JB> FILTERING’ query to Cassandra.

 JB> From Spark you don’t have much control over the initial query that fills the RDDs; sometimes you’ll read the whole table even if you only
 JB> need one row.

 JB> Regards,

 JB> Jacques-Henri Berthemet

 JB> From: "ZAIDI, ASAD A" <az...@att.com>
 JB> Reply to: "user@cassandra.apache.org" <us...@cassandra.apache.org>
 JB> Date: Thursday 25 July 2019 at 15:49
 JB> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
 JB> Subject: Performance impact with ALLOW FILTERING clause.

 JB> Hello Folks,

 JB> I was going through the documentation and saw in many places that ALLOW FILTERING causes performance unpredictability. Our developers say
 JB> the ALLOW FILTERING clause is implicitly added to a bunch of queries by the spark-cassandra connector and they cannot control it; however, at the
 JB> same time we see unpredictability in application performance, just as the documentation says.

 JB> I’m trying to understand why a connector would add a clause to a query when it can have a negative impact on database/application
 JB> performance. Is it the data model that drives the connector to add ALLOW FILTERING to the query automatically, or is there
 JB> some other reason this clause is added in the code? I’m not a developer, but I want to know why developers don’t have any control over
 JB> this.

 JB> I’ll appreciate your guidance here.

 JB> Thanks

 JB> Asad



-- 
With best wishes,                    Alex Ott
Solutions Architect EMEA, DataStax
http://datastax.com/
