Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2015/08/19 22:44:46 UTC
[jira] [Resolved] (SPARK-10087) Disable reducer locality in 1.5
[ https://issues.apache.org/jira/browse/SPARK-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin resolved SPARK-10087.
---------------------------------
Resolution: Fixed
Assignee: Yin Huai
Fix Version/s: 1.5.0
Target Version/s: (was: 1.6.0)
> Disable reducer locality in 1.5
> -------------------------------
>
> Key: SPARK-10087
> URL: https://issues.apache.org/jira/browse/SPARK-10087
> Project: Spark
> Issue Type: Bug
> Components: Scheduler
> Affects Versions: 1.5.0
> Reporter: Yin Huai
> Assignee: Yin Huai
> Priority: Critical
> Fix For: 1.5.0
>
>
> In some cases, when spark.shuffle.reduceLocality.enabled is set to true, all reducers are scheduled on the same executor even though the cluster has plenty of resources. Setting spark.shuffle.reduceLocality.enabled to false resolves the problem.
> The comments on https://github.com/apache/spark/pull/8280 provide more detail on the symptoms of this issue.
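> As a workaround sketch until a fixed release is deployed, the flag can be disabled cluster-wide via spark-defaults.conf (assuming a standard deployment; it can equally be passed per job with --conf):
> {code}
> # spark-defaults.conf -- workaround: disable reducer locality
> spark.shuffle.reduceLocality.enabled  false
> {code}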
> The query I was using is
> {code:sql}
> select
>   i_brand_id,
>   i_brand,
>   i_manufact_id,
>   i_manufact,
>   sum(ss_ext_sales_price) ext_price
> from
>   store_sales
>   join item on (store_sales.ss_item_sk = item.i_item_sk)
>   join customer on (store_sales.ss_customer_sk = customer.c_customer_sk)
>   join customer_address on (customer.c_current_addr_sk = customer_address.ca_address_sk)
>   join store on (store_sales.ss_store_sk = store.s_store_sk)
>   join date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
> where
>   --ss_date between '1999-11-01' and '1999-11-30'
>   ss_sold_date_sk between 2451484 and 2451513
>   and d_moy = 11
>   and d_year = 1999
>   and i_manager_id = 7
>   and substr(ca_zip, 1, 5) <> substr(s_zip, 1, 5)
> group by
>   i_brand,
>   i_brand_id,
>   i_manufact_id,
>   i_manufact
> order by
>   ext_price desc,
>   i_brand,
>   i_brand_id,
>   i_manufact_id,
>   i_manufact
> limit 100
> {code}
> The dataset is TPC-DS at scale factor 1500. To reproduce the problem, you can just join store_sales with customer and make sure that only one mapper reads the data of customer.
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org