Posted to user@spark.apache.org by Sathish Kumaran Vairavelu <vs...@gmail.com> on 2017/03/31 02:25:32 UTC

Spark SQL 2.1 Complex SQL - Query Planning Issue

Hi Everyone,

I have a complex SQL query of approximately 2,000 lines that works with 50+
tables, 50+ left joins, and transformations. All the tables are fully cached
in memory, with sufficient storage memory and working memory. The issue is
that after the query is launched, it takes approximately 40 seconds to
appear under Jobs/SQL in the application UI.

While the execution itself takes only 25 seconds, it is delayed by 40
seconds by the scheduler, so the total runtime of the query becomes 65
seconds (40s + 25s). There are enough cores available during this wait
time. I couldn't figure out why the DAG scheduler is delaying the execution
by 40 seconds. Is this due to the time taken for query parsing and query
planning for the complex SQL? If that's the case, how do we optimize the
query parsing and query planning time in Spark? Any help would be
appreciated.


Thanks

Sathish
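
One way to check whether the 40 seconds are spent on the driver before any
job is scheduled is to force each Spark SQL phase separately and time it.
The following is a minimal sketch, not a definitive diagnosis tool. It
assumes a live SparkSession named `spark` and PySpark; `time_phases` and
`catalyst_phases` are hypothetical helper names, and `_jdf` is an internal
PySpark handle to the JVM Dataset, so it is suitable for debugging only.

```python
import time


def time_phases(phases):
    """Run each (name, thunk) pair in order; return a list of (name, seconds)."""
    timings = []
    for name, thunk in phases:
        start = time.perf_counter()
        thunk()
        timings.append((name, time.perf_counter() - start))
    return timings


def catalyst_phases(spark, sql_text):
    """Build thunks that force Spark SQL's driver-side phases one at a time.

    Assumption: `spark` is a live SparkSession and `sql_text` is the query.
    """
    holder = {}

    def parse_and_analyze():
        # spark.sql parses the text and resolves it against the catalog.
        holder["df"] = spark.sql(sql_text)

    def optimize():
        # Forces the lazily computed optimized logical plan.
        holder["df"]._jdf.queryExecution().optimizedPlan()

    def plan():
        # Forces physical planning; this step alone should not run the query.
        holder["df"]._jdf.queryExecution().executedPlan()

    return [
        ("parse+analyze", parse_and_analyze),
        ("optimize", optimize),
        ("physical planning", plan),
    ]
```

Calling `time_phases(catalyst_phases(spark, query_text))`, where
`query_text` holds the SQL, returns one timing per phase. If the three
timings add up to roughly the observed delay, the time is going to
planning on the driver rather than to the scheduler.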

Re: Spark SQL 2.1 Complex SQL - Query Planning Issue

Posted by Sathish Kumaran Vairavelu <vs...@gmail.com>.
Please let me know if anybody has any thoughts on this issue.


Re: Spark SQL 2.1 Complex SQL - Query Planning Issue

Posted by Sathish Kumaran Vairavelu <vs...@gmail.com>.
Also, is it possible to cache the logical plan and the parsed query so that
they can be reused in subsequent executions? It would improve overall query
performance, particularly in streaming jobs.
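
A rough sketch of the reuse idea, under stated assumptions: build the
DataFrame once per distinct SQL text and keep the object, rather than
calling `spark.sql` again on every execution. `DataFrameCache` is a
hypothetical helper, not a Spark API. As far as I know, a DataFrame keeps
its analyzed and optimized plans in lazily evaluated fields, so running an
action such as `collect()` on the same object may avoid repeating that
work, but this should be verified by measuring on the Spark version in
use.

```python
class DataFrameCache:
    """Memoize DataFrames by SQL text so each distinct query is built once."""

    def __init__(self, spark):
        self._spark = spark  # assumption: a live SparkSession
        self._frames = {}

    def get(self, sql_text):
        # Only the first request for a given text calls spark.sql;
        # later requests return the DataFrame that was already built.
        if sql_text not in self._frames:
            self._frames[sql_text] = self._spark.sql(sql_text)
        return self._frames[sql_text]
```

A held DataFrame refers to the tables and views as they were resolved when
it was created, so it would need to be rebuilt if those are dropped or
re-registered between executions.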

Re: Spark SQL 2.1 Complex SQL - Query Planning Issue

Posted by Sathish Kumaran Vairavelu <vs...@gmail.com>.
Hi Ayan,

I have searched the Spark configuration options but couldn't find one to
pin execution plans in memory. Can you please help?


Thanks

Sathish


Re: Spark SQL 2.1 Complex SQL - Query Planning Issue

Posted by ayan guha <gu...@gmail.com>.
I think there is an option for pinning execution plans in memory to avoid
such scenarios.




-- 
Best Regards,
Ayan Guha