
Posted to dev@spark.apache.org by Rishitesh Mishra <ri...@gmail.com> on 2016/02/10 05:37:31 UTC

map-side-combine in Spark SQL

Can anybody confirm whether ANY operator in Spark SQL uses
map-side-combine? If not, is it safe to assume that SortShuffleManager will
always use serialized sorting for queries from Spark SQL?

Re: map-side-combine in Spark SQL

Posted by Reynold Xin <rx...@databricks.com>.

I'm not 100% sure I understand your question, but yes, Spark (both the RDD
API and SQL/DataFrame) does partial aggregation.
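
For readers unfamiliar with the term, here is a rough sketch of what partial
aggregation (map-side combine) means. It is plain Python, not Spark code, and
the function names and sample data are hypothetical, chosen only for
illustration: each partition pre-aggregates its own records by key, so fewer
records would need to cross the shuffle before the final merge.

```python
from collections import defaultdict


def partial_aggregate(partition):
    """Map side: sum values per key within a single partition."""
    acc = defaultdict(int)
    for key, value in partition:
        acc[key] += value
    return dict(acc)


def final_aggregate(partials):
    """Reduce side: merge the per-partition partial sums."""
    total = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            total[key] += value
    return dict(total)


# Two hypothetical input partitions of (key, value) records.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("a", 5)],
]

# Each partition is reduced to at most one record per key before the "shuffle".
partials = [partial_aggregate(p) for p in partitions]
input_records = sum(len(p) for p in partitions)
shuffled_records = sum(len(p) for p in partials)
result = final_aggregate(partials)
```

In this toy run the two partitions hold five input records but only four
partial records are handed to the final merge; the saving grows with the
number of repeated keys per partition.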

On Tue, Feb 9, 2016 at 8:37 PM, Rishitesh Mishra <ri...@gmail.com>
wrote:

> Can anybody confirm whether ANY operator in Spark SQL uses
> map-side-combine? If not, is it safe to assume that SortShuffleManager will
> always use serialized sorting for queries from Spark SQL?
>