Posted to dev@spark.apache.org by Maciej Bryński <ma...@brynski.pl> on 2017/07/24 16:11:03 UTC

Speeding up Catalyst engine

Hi Everyone,
I'm trying to speed up my Spark Streaming application and I have the
following problem: my app uses a lot of joins, and a full Catalyst analysis
is triggered on every join.
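
For context, a minimal sketch of the pattern I mean (the DataFrames and
join keys below are made up for illustration):

// Each .join returns a new Dataset, and in Spark 2.x a Dataset is
// analyzed eagerly when it is constructed, so a chain of N joins pays
// the analysis cost N times over ever-growing logical plans.
// events, users, devices, sessions: hypothetical streaming DataFrames.
val enriched = events
  .join(users, Seq("user_id"))
  .join(devices, Seq("device_id"))
  .join(sessions, Seq("session_id"))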

I found 2 options to speed this up.

1) The spark.sql.selfJoinAutoResolveAmbiguity option (see the sketch below).
But looking at the code:
https://github.com/apache/spark/blob/8cd9cdf17a7a4ad6f2eecd7c4b388ca363c20982/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L918

Shouldn't lines 925-927 come before lines 920-922?

2) https://issues.apache.org/jira/browse/SPARK-20392

Is it safe to use it on top of 2.2.0?
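
Regarding option 1), here is a minimal sketch of how I understand the flag
can be toggled (the config key exists in Spark 2.x; the session setup and
data are made up for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("self-join-ambiguity-sketch")  // hypothetical app name
  .config("spark.sql.selfJoinAutoResolveAmbiguity", "false")
  .getOrCreate()
import spark.implicits._

// With auto-resolution off, a self join needs explicit aliases so the
// analyzer can tell the two sides apart:
val df = spark.range(10).toDF("id")
val joined = df.as("l").join(df.as("r"), $"l.id" === $"r.id")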

Regards,
-- 
Maciek Bryński

Re: Speeding up Catalyst engine

Posted by Maciej Bryński <ma...@brynski.pl>.
Hi,

I backported this to 2.2.
First test results (a join of about 60 tables):
Vanilla Spark: 50 sec
With SPARK-20392: 38 sec
With SPARK-20392 and spark.sql.selfJoinAutoResolveAmbiguity=false: 29 sec
Vanilla Spark with spark.sql.selfJoinAutoResolveAmbiguity=false: 34 sec

I didn't measure any difference when changing
spark.sql.constraintPropagation.enabled or any other spark.sql option.
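
Roughly, each run looked like this (simplified sketch, not the exact
harness; `tables` stands for the ~60 input DataFrames and "key" is a
made-up join column):

// Both config keys exist in Spark 2.2 and can be set per session.
spark.conf.set("spark.sql.selfJoinAutoResolveAmbiguity", "false")
spark.conf.set("spark.sql.constraintPropagation.enabled", "false")

val t0 = System.nanoTime()
val joined = tables.reduce((a, b) => a.join(b, Seq("key")))
joined.queryExecution.executedPlan  // force analysis and planning
println(f"planning took ${(System.nanoTime() - t0) / 1e9}%.1f sec")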

So I will leave your patch on top of 2.2.
Thank you.

M.

2017-07-25 1:39 GMT+02:00 Liang-Chi Hsieh <vi...@gmail.com>:

>
> Hi Maciej,
>
> For backporting https://issues.apache.org/jira/browse/SPARK-20392, you can
> see the suggestion from the committers on the PR. I don't think we expect
> it to be merged into 2.2.
>
> [Maciej's original message quoted in full; snipped]
>
> -----
> Liang-Chi Hsieh | @viirya
> Spark Technology Center
> http://www.spark.tc/


-- 
Maciek Bryński

Re: Speeding up Catalyst engine

Posted by Liang-Chi Hsieh <vi...@gmail.com>.
Hi Maciej,

For backporting https://issues.apache.org/jira/browse/SPARK-20392, you can
see the suggestion from the committers on the PR. I don't think we expect it
to be merged into 2.2.



Maciej Bryński wrote
> [original message quoted in full; snipped]





-----
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 