Posted to dev@spark.apache.org by Wail <w....@cces-kacst-mit.org> on 2015/03/02 10:55:56 UTC

Is SparkSQL optimizer aware of the needed data after the query?

Dears,

I'm just curious about the capabilities of the query optimizer. Can the
optimizer take into account what happens after the SQL query? Maybe it's a
stupid question, but here is an example to show the case:

From the Spark SQL example:
val teenagers = sqlContext.sql(
  "SELECT * FROM people WHERE age >= 13 AND age <= 19")

if (condition) {
  teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
} else {
  teenagers.map(t => "Age: " + t(1)).collect().foreach(println)
}

So, for instance, is the optimizer aware that I need only one column, and does
it push the projection down so that only that column is read?

Thanks!




--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Is-SparkSQL-optimizer-aware-of-the-needed-data-after-the-query-tp10835.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Is SparkSQL optimizer aware of the needed data after the query?

Posted by Michael Armbrust <mi...@databricks.com>.
-dev +user

No, lambda functions and other user code are black boxes to Spark SQL. If you
want those kinds of optimizations, you need to express the required columns in
either SQL or the DataFrame DSL (coming in 1.3).
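As an illustration (a minimal sketch, not from the original message; it assumes
the same registered "people" table and the DataFrame methods that ship in 1.3),
the required column can be declared up front like this:

// Projection expressed in SQL: only the "name" column is requested,
// so Catalyst can push the projection down to the data source.
val names = sqlContext.sql(
  "SELECT name FROM people WHERE age >= 13 AND age <= 19")
names.collect().foreach(r => println("Name: " + r(0)))

// Roughly equivalent with the DataFrame DSL (1.3): select() states the
// needed column explicitly instead of hiding it inside a lambda.
val teenagerNames = sqlContext.table("people")
  .filter("age >= 13 AND age <= 19")
  .select("name")
teenagerNames.collect().foreach(r => println("Name: " + r(0)))

With a columnar source such as Parquet, that visible projection generally means
only the name column is read, whereas SELECT * followed by a lambda forces all
columns to be materialized first.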
