
Posted to user@spark.apache.org by ssb61 <sa...@gmail.com> on 2014/06/04 23:16:04 UTC

Re: SQLContext and HiveContext Query Performance




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SQLContext-and-HiveContext-Query-Performance-tp6948p6976.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: SQLContext and HiveContext Query Performance

Posted by Michael Armbrust <mi...@databricks.com>.

For a dataset as small as this one, you could probably reduce the number of
shuffle partitions. This will be possible once
https://github.com/apache/spark/pull/956 is merged.
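
For readers on later Spark releases, here is a rough sketch of what lowering
the shuffle partition count can look like. It assumes Spark 2.x or newer on
the classpath, where the setting is named spark.sql.shuffle.partitions
(default 200). The SparkSession API, the local master, the app name and the
value 8 are illustrative assumptions, not something taken from this thread.

```scala
import org.apache.spark.sql.SparkSession

object FewerShufflePartitions {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session; "8" is an arbitrary small value for a small dataset.
    val spark = SparkSession.builder()
      .appName("fewer-shuffle-partitions")
      .master("local[*]")
      .config("spark.sql.shuffle.partitions", "8")
      .getOrCreate()

    // The same setting can also be changed on a running session.
    spark.conf.set("spark.sql.shuffle.partitions", "8")

    // Print the value now in effect for subsequent SQL shuffles.
    println(spark.conf.get("spark.sql.shuffle.partitions"))

    spark.stop()
  }
}
```

With fewer partitions, a small aggregation schedules fewer shuffle tasks, so
less time goes to per-task overhead. A suitable value depends on the data
size and the cluster.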

On Thu, Jun 5, 2014 at 11:31 AM, ssb61 <sa...@gmail.com> wrote:

> Any suggestions for reducing the 13 s spent in mapPartitions at
> Exchange.scala:44?

Re: SQLContext and HiveContext Query Performance

Posted by ssb61 <sa...@gmail.com>.

Any suggestions for reducing the 13 s spent in mapPartitions at
Exchange.scala:44?




Re: SQLContext and HiveContext Query Performance

Posted by ssb61 <sa...@gmail.com>.

I timed the third line; here are the stage timings:

collect at SparkPlan.scala:52          ----- 0.5 s
mapPartitions at Exchange.scala:58     ----- 0.7 s
RangePartitioner at Exchange.scala:62  ----- 0.7 s
RangePartitioner at Exchange.scala:62  ----- 0.5 s
mapPartitions at Exchange.scala:44     ----- 13 s

Thanks,
Santosh




Re: SQLContext and HiveContext Query Performance

Posted by Zongheng Yang <zo...@gmail.com>.

Hi,

Just wondering if you can try this:

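// Build the query once so the same object is used for both runs.
val obj = sql("select manufacturer, count(*) as examcount from pft " +
  "group by manufacturer order by examcount desc")
// Runs the query through the normal path, including any planning not yet done.
obj.collect()
// Runs the already-built physical plan directly; time this call on its own.
obj.queryExecution.executedPlan.executeCollect()

and time the third statement (the executeCollect() call) alone. It could be
that Spark SQL is taking some time to run the optimizer and generate physical
plans, and that this is what slows down the query.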
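
One way to time a single statement in the shell is a small wall-clock helper.
The sketch below is only an illustration: Timing and time are hypothetical
names, not part of Spark, and one wall-clock measurement is a rough number,
not a benchmark.

```scala
// Hypothetical helper (not a Spark API): runs a block once and prints its wall-clock time.
object Timing {
  def time[A](label: String)(block: => A): A = {
    val start = System.nanoTime()
    val result = block // the by-name block is evaluated here, exactly once
    val elapsedMs = (System.nanoTime() - start) / 1e6
    println(f"$label: $elapsedMs%.1f ms")
    result
  }
}

object TimingDemo extends App {
  // Stand-in workload so the example runs without Spark.
  val total = Timing.time("local sum") { (1 to 1000000).map(_.toLong).sum }
}
```

In the Spark shell, the third statement could then be wrapped as
Timing.time("executeCollect") { obj.queryExecution.executedPlan.executeCollect() }
and its time compared with the same wrapper around obj.collect().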

Thanks,
Zongheng
