Posted to user@spark.apache.org by "Lavelle, Shawn" <Sh...@osii.com> on 2017/05/01 22:00:37 UTC

RE: Spark-SQL Query Optimization: overlapping ranges

Jacek,

   Thanks for your help.  I didn’t want to file a bug/enhancement request unless it was warranted.

~ Shawn

From: Jacek Laskowski [mailto:jacek@japila.pl]
Sent: Thursday, April 27, 2017 8:39 AM
To: Lavelle, Shawn <Sh...@osii.com>
Cc: user <us...@spark.apache.org>
Subject: Re: Spark-SQL Query Optimization: overlapping ranges

Hi Shawn,

If you're asking me if Spark SQL should optimize such queries, I don't know.

If you're asking me if it's possible to convince Spark SQL to do so, I'd say, sure, it is. Write your optimization rule and attach it to the Optimizer (using the extraOptimizations extension point).
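
For illustration, here is a minimal sketch of what such a rule might look like. It assumes BETWEEN reaches the optimizer as `col >= lo AND col <= hi` on a plain column with Int or Long literals, which may not hold for every Spark version or column type. The names MergeOverlappingRanges, MergeOverlappingRangesDemo and the table `t` are hypothetical and not from this thread. Only two ranges on the same column under a single OR are handled.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{IntegerType, LongType}

// Hypothetical rule: rewrites (a BETWEEN l1 AND h1) OR (a BETWEEN l2 AND h2)
// into a single range when the two closed ranges overlap.
object MergeOverlappingRanges extends Rule[LogicalPlan] {

  // Matches `attr >= lo AND attr <= hi` (assumed shape of a parsed BETWEEN).
  private object Between {
    def unapply(e: Expression): Option[(Attribute, Literal, Literal)] = e match {
      case And(GreaterThanOrEqual(a: Attribute, lo: Literal),
               LessThanOrEqual(b: Attribute, hi: Literal))
          if a.semanticEquals(b) => Some((a, lo, hi))
      case _ => None
    }
  }

  // Only Int and Long literals are supported in this sketch.
  private def asLong(l: Literal): Option[Long] = l match {
    case Literal(v: Int, IntegerType) => Some(v.toLong)
    case Literal(v: Long, LongType)   => Some(v)
    case _                            => None
  }

  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case or @ Or(Between(a, lo1, hi1), Between(b, lo2, hi2)) if a.semanticEquals(b) =>
      val merged = for {
        l1 <- asLong(lo1); h1 <- asLong(hi1)
        l2 <- asLong(lo2); h2 <- asLong(hi2)
        if l1 <= h2 && l2 <= h1 // the ranges share at least one value
      } yield And(
        GreaterThanOrEqual(a, if (l1 <= l2) lo1 else lo2), // smaller lower bound
        LessThanOrEqual(a, if (h1 >= h2) hi1 else hi2))    // larger upper bound
      merged.getOrElse(or) // leave the predicate untouched when nothing merges
  }
}

object MergeOverlappingRangesDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("merge-overlapping-ranges")
      .getOrCreate()

    // Register the rule through the extraOptimizations extension point.
    spark.experimental.extraOptimizations = Seq(MergeOverlappingRanges)

    spark.range(0, 20).toDF("col").createOrReplaceTempView("t")
    spark.sql("SELECT * FROM t WHERE col BETWEEN 3 AND 7 OR col BETWEEN 5 AND 9")
      .explain(true) // inspect the optimized plan to see whether the rule fired

    spark.stop()
  }
}
```

Comparing the optimized plan with and without the rule registered is the simplest way to check that the pattern matches on a given Spark version.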


Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Thu, Apr 27, 2017 at 3:22 PM, Lavelle, Shawn <Sh...@osii.com> wrote:
Hi Jacek,

     I know that it is not currently doing so, but should it be?  The algorithm isn’t complicated and could be applied to both OR and AND logical operators with comparison operators as children.
     My users write programs to generate queries that aren’t checked for this sort of thing. We’re probably going to write our own org.apache.spark.sql.catalyst.rules.Rule to handle it.
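
The range logic described above can be sketched independently of Spark. This is only an illustration under stated assumptions: Bounds and RangeAlgebra are hypothetical names, bounds are integers, and both ends are inclusive as with BETWEEN. OR corresponds to a union, which collapses to one range only when the two ranges overlap. AND corresponds to an intersection, which may be empty.

```scala
// Closed range lo <= x <= hi, matching the inclusive semantics of BETWEEN.
final case class Bounds(lo: Int, hi: Int) {
  require(lo <= hi, s"empty range: $lo > $hi")
}

object RangeAlgebra {
  /** OR: a single merged range, or None when the ranges do not overlap. */
  def or(a: Bounds, b: Bounds): Option[Bounds] =
    if (a.lo <= b.hi && b.lo <= a.hi)
      Some(Bounds(math.min(a.lo, b.lo), math.max(a.hi, b.hi)))
    else None

  /** AND: the common part of both ranges, or None when no value satisfies both. */
  def and(a: Bounds, b: Bounds): Option[Bounds] = {
    val lo = math.max(a.lo, b.lo)
    val hi = math.min(a.hi, b.hi)
    if (lo <= hi) Some(Bounds(lo, hi)) else None
  }
}

object RangeAlgebraDemo {
  def main(args: Array[String]): Unit = {
    println(RangeAlgebra.or(Bounds(3, 7), Bounds(5, 9)))  // Some(Bounds(3,9))
    println(RangeAlgebra.and(Bounds(3, 7), Bounds(5, 9))) // Some(Bounds(5,7))
  }
}
```

When `or` returns None the original OR predicate has to stay as it is. When `and` returns None the predicate can never be true.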

~ Shawn

From: Jacek Laskowski [mailto:jacek@japila.pl]
Sent: Wednesday, April 26, 2017 2:55 AM
To: Lavelle, Shawn <Sh...@osii.com>
Cc: user <us...@spark.apache.org>
Subject: Re: Spark-SQL Query Optimization: overlapping ranges

Explain it and you'll know what happens under the covers.

That is, call explain on the Dataset.
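
A minimal sketch of that check, assuming a local SparkSession and a hypothetical temporary view `t` with one numeric column named `col` (neither is from the original question):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ExplainOverlappingRanges {
  // Builds the query from the original question against the hypothetical view `t`.
  def overlappingQuery(spark: SparkSession): DataFrame = {
    spark.range(0, 20).toDF("col").createOrReplaceTempView("t")
    spark.sql("SELECT * FROM t WHERE col BETWEEN 3 AND 7 OR col BETWEEN 5 AND 9")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explain-overlapping-ranges")
      .getOrCreate()

    // extended = true prints the parsed, analyzed and optimized logical plans
    // as well as the physical plan, so the Filter condition can be compared
    // before and after optimization.
    overlappingQuery(spark).explain(true)

    spark.stop()
  }
}
```

If the Filter in the optimized plan still shows both ranges joined by OR, the optimizer did not merge them.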

Jacek

On 25 Apr 2017 12:46 a.m., "Lavelle, Shawn" <Sh...@osii.com> wrote:
Hello Spark Users!

   Does the Spark Optimization engine reduce overlapping column ranges?  If so, should it push this down to a Data Source?

  Example,
    This:  Select * from table where col between 3 and 7 OR col between 5 and 9
    Reduces to:  Select * from table where col between 3 and 9


  Thanks for your insight!

~ Shawn M Lavelle



Shawn Lavelle
Software Development

4101 Arrowhead Drive
Medina, Minnesota 55340-9457
Phone: 763 551 0559
Fax: 763 551 0750
Email: Shawn.Lavelle@osii.com
Website: www.osii.com