Posted to user@spark.apache.org by Michael Armbrust <mi...@databricks.com> on 2015/03/02 18:30:30 UTC

Re: Is SparkSQL optimizer aware of the needed data after the query?

-dev +user

No, lambda functions and other code are black-boxes to Spark SQL.  If you
want those kinds of optimizations you need to express the columns required
in either SQL or the DataFrame DSL (coming in 1.3).
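For example, a sketch of that workaround against the example below (same sqlContext and "people" table assumed): naming only the needed column in the SQL lets Catalyst push the projection down to the scan, instead of selecting * and discarding columns inside an opaque lambda.

    // Select only the column the later code actually uses, so the
    // optimizer can prune the rest at scan time.
    val names = sqlContext.sql(
      "SELECT name FROM people WHERE age >= 13 AND age <= 19")
    names.map(t => "Name: " + t(0)).collect().foreach(println)

    // Roughly equivalent with the DataFrame DSL coming in 1.3:
    // people.filter("age >= 13 AND age <= 19").select("name")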

On Mon, Mar 2, 2015 at 1:55 AM, Wail <w....@cces-kacst-mit.org>
wrote:

> Dears,
>
> I'm just curious about the complexity of the query optimizer. Can the
> optimizer see what happens after the SQL query? Maybe it's a stupid
> question, but here is an example to show the case:
>
> From the Spark SQL example:
> val teenagers = sqlContext.sql(
>   "SELECT * FROM people WHERE age >= 13 AND age <= 19")
>
> if(condition)
> {
>     teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
> }
> else
> {
>     teenagers.map(t => "Age: " + t(1)).collect().foreach(println)
> }
>
> For instance ... is the optimizer aware that I need only one column, and
> does it push down the projection to read only that column?
>
> Thanks!
>
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Is-SparkSQL-optimizer-aware-of-the-needed-data-after-the-query-tp10835.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>