Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2015/08/19 22:44:46 UTC

[jira] [Resolved] (SPARK-10087) Disable reducer locality in 1.5

     [ https://issues.apache.org/jira/browse/SPARK-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-10087.
---------------------------------
          Resolution: Fixed
            Assignee: Yin Huai
       Fix Version/s: 1.5.0
    Target Version/s:   (was: 1.6.0)

> Disable reducer locality in 1.5
> -------------------------------
>
>                 Key: SPARK-10087
>                 URL: https://issues.apache.org/jira/browse/SPARK-10087
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.5.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> In some cases, when spark.shuffle.reduceLocality.enabled is enabled, all reducers are scheduled on the same executor, even though the cluster has plenty of free resources. Setting spark.shuffle.reduceLocality.enabled to false resolves the problem.
> The comments on https://github.com/apache/spark/pull/8280 describe the symptoms of this issue in more detail.
> The query I was using is
> {code:sql}
> select
>   i_brand_id,
>   i_brand,
>   i_manufact_id,
>   i_manufact,
>   sum(ss_ext_sales_price) ext_price
> from
>   store_sales
>   join item on (store_sales.ss_item_sk = item.i_item_sk)
>   join customer on (store_sales.ss_customer_sk = customer.c_customer_sk)
>   join customer_address on (customer.c_current_addr_sk = customer_address.ca_address_sk)
>   join store on (store_sales.ss_store_sk = store.s_store_sk)
>   join date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> where
>   --ss_date between '1999-11-01' and '1999-11-30'
>   ss_sold_date_sk between 2451484 and 2451513
>   and d_moy = 11
>   and d_year = 1999
>   and i_manager_id = 7
>   and substr(ca_zip, 1, 5) <> substr(s_zip, 1, 5)
> group by
>   i_brand,
>   i_brand_id,
>   i_manufact_id,
>   i_manufact
> order by
>   ext_price desc,
>   i_brand,
>   i_brand_id,
>   i_manufact_id,
>   i_manufact
> limit 100
> {code}
> The dataset is TPC-DS at scale factor 1500. To reproduce the problem, just join store_sales with customer and make sure only one mapper reads the customer data.
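> As a workaround sketch until the fix lands, the flag named in this issue can be turned off at submit time (application jar and other arguments below are placeholders, not from this issue):
> {code}
> # Disable reducer locality preferences for this job only.
> # Only the --conf key/value comes from this issue; the rest is illustrative.
> spark-submit \
>   --conf spark.shuffle.reduceLocality.enabled=false \
>   your-application.jar
> {code}
> The same value can instead be set in spark-defaults.conf to apply it cluster-wide.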
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org