Posted to user@spark.apache.org by Hao Ren <in...@gmail.com> on 2015/07/13 15:37:38 UTC

[SPARK-SQL] Window Functions optimization

Hi, 

I would like to know: has any optimization been done for window functions
in Spark SQL?

For example:

select key,
       max(value1) over (partition by key) as m1,
       max(value2) over (partition by key) as m2,
       max(value3) over (partition by key) as m3
from table

The query above computes three columns using the same partitioning rule.
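For reference, the same query expressed through the DataFrame API (a sketch,
assuming Spark 1.4+ where the window expressions API is available, and a
hypothetical DataFrame df with these four columns):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.max

// One window spec shared by all three aggregates: the same partitioning rule.
val byKey = Window.partitionBy("key")
val result = df.select(
  df("key"),
  max("value1").over(byKey).as("m1"),
  max("value2").over(byKey).as("m2"),
  max("value3").over(byKey).as("m3"))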

The question is:
will Spark SQL partition the table three times in the same way to compute
the three max values, or just once if it detects that the partitioning rule
is the same?

It would be nice if someone could point me to the relevant lines of code.

Thank you.
Hao



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-SQL-Window-Functions-optimization-tp23796.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: [SPARK-SQL] Window Functions optimization

Posted by Yin Huai <yh...@databricks.com>.
The data will be partitioned only once, and a single Window operator will
evaluate all three functions. As Harish mentioned, you can take a look at
the plan with sql("your sql...").explain().
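A minimal way to verify this yourself (a sketch: it assumes the source table
is registered as a temp table named t, and the exact operator names in the
output vary by Spark version):

val df = sqlContext.sql("""
  select key,
         max(value1) over (partition by key) as m1,
         max(value2) over (partition by key) as m2,
         max(value3) over (partition by key) as m3
  from t""")
// The physical plan should show a single exchange (hash partitioning on
// key) feeding a single Window operator, not three.
df.explain()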


Re: [SPARK-SQL] Window Functions optimization

Posted by Harish Butani <rh...@gmail.com>.
Just once.
You can see this by printing the logical plan: there is a single Window
operator covering all three expressions, so the data is partitioned only
once.

So do:
val df = sql("your sql...")
println(df.queryExecution.analyzed)
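For completeness, QueryExecution exposes the other stages of the plan as
well (a sketch; same placeholder query as above):

val df = sql("your sql...")
println(df.queryExecution.analyzed)      // resolved logical plan
println(df.queryExecution.optimizedPlan) // logical plan after Catalyst optimizations
println(df.queryExecution.executedPlan)  // physical plan (shows the single exchange)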
