Posted to user@cassandra.apache.org by Alex Ott <al...@gmail.com> on 2019/08/17 12:31:51 UTC
Re: Performance impact with ALLOW FILTERING clause.
The Spark connector doesn't issue "select * from table;" - it reads the data
by token ranges, one range at a time
(see https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/partitioner/CassandraPartition.scala#L14).
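
To illustrate what "reads by token ranges" means, here is a minimal sketch in plain Python, not the connector's actual code. It assumes the default Murmur3Partitioner token bounds, and the helper names (split_token_ring, range_query) and the keyspace/table/column names are hypothetical. The real connector derives its ranges from the cluster's ring metadata rather than from an even split.

```python
# Token bounds of Murmur3Partitioner (assumed here; it is Cassandra's default).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_splits):
    """Split the token ring into contiguous half-open (start, end] ranges.

    Hypothetical helper: an even split, used only for illustration.
    """
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (total * i) // num_splits for i in range(num_splits + 1)]
    # Pair each bound with the next one: (b0, b1], (b1, b2], ...
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(keyspace, table, partition_key, start, end):
    """Build the CQL text that reads only the rows in one token range."""
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) > {start} AND token({partition_key}) <= {end}"
    )


if __name__ == "__main__":
    # Each printed query could be run independently, e.g. by a separate task.
    for start, end in split_token_ring(4):
        print(range_query("my_ks", "my_table", "id", start, end))
```

Each of these queries touches only one slice of the ring, so the work can be spread over many tasks and sent to the replicas that own that slice, instead of one full-table scan through a single coordinator.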
Jacques-Henri Berthemet at "Thu, 25 Jul 2019 14:18:57 +0000" wrote:
JB> Hi Asad,
JB> That’s because of the way Spark works. Essentially, when you execute a Spark job, it pulls the full content of the datastore (Cassandra
JB> in your case) into its RDDs and works with it “in memory”. While Spark uses “data locality” to read data from the nodes that have the
JB> required data on their local disks, it’s still reading all data from the Cassandra tables. To do so, it sends a ‘select * from Table ALLOW
JB> FILTERING’ query to Cassandra.
JB> From Spark you don’t have much control over the initial query that fills the RDDs; sometimes you’ll read the whole table even if you only
JB> need one row.
JB> Regards,
JB> Jacques-Henri Berthemet
JB> From: "ZAIDI, ASAD A" <az...@att.com>
JB> Reply to: "user@cassandra.apache.org" <us...@cassandra.apache.org>
JB> Date: Thursday 25 July 2019 at 15:49
JB> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
JB> Subject: Performance impact with ALLOW FILTERING clause.
JB> Hello Folks,
JB> I was going through the documentation and saw in many places that ALLOW FILTERING causes performance unpredictability. Our developers say
JB> the ALLOW FILTERING clause is implicitly added to a bunch of queries by the spark-cassandra connector and they cannot control it; however, at the
JB> same time we see unpredictability in application performance – just as the documentation says.
JB> I’m trying to understand why a connector would add a clause to a query when this can have a negative impact on database/application
JB> performance. Is it the data model that drives the connector to add ALLOW FILTERING to the query automatically, or is there
JB> some other reason this clause is added in the code? I’m not a developer, though I want to know why developers don’t have any control over
JB> this.
JB> I’ll appreciate your guidance here.
JB> Thanks
JB> Asad
--
With best wishes, Alex Ott
Solutions Architect EMEA, DataStax
http://datastax.com/