Posted to issues@spark.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2015/08/19 04:14:45 UTC

[jira] [Updated] (SPARK-10087) In some cases, all reducers are scheduled to the same executor

     [ https://issues.apache.org/jira/browse/SPARK-10087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated SPARK-10087:
-----------------------------
    Description: 
In some cases, when spark.shuffle.reduceLocality.enabled is enabled, all reducers are scheduled to the same executor even though the cluster has plenty of resources. Setting spark.shuffle.reduceLocality.enabled to false resolves the problem.
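
As a workaround until the scheduler is fixed, the flag can simply be turned off when the application starts. A minimal sketch of that workaround (the flag name comes from this report; the rest is standard Spark API):
{code:scala}
// Disable reduce-task locality preferences as a workaround for this issue.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("spark-10087-workaround")
  .set("spark.shuffle.reduceLocality.enabled", "false")
val sc = new SparkContext(conf)

// Equivalently, pass the flag on the command line:
//   spark-submit --conf spark.shuffle.reduceLocality.enabled=false ...
{code}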

The comments on https://github.com/apache/spark/pull/8280 provide more details about the symptoms of this issue.

The query I was using is:
{code:sql}
select
  i_brand_id,
  i_brand,
  i_manufact_id,
  i_manufact,
  sum(ss_ext_sales_price) ext_price
from
  store_sales
  join item on (store_sales.ss_item_sk = item.i_item_sk)
  join customer on (store_sales.ss_customer_sk = customer.c_customer_sk)
  join customer_address on (customer.c_current_addr_sk = customer_address.ca_address_sk)
  join store on (store_sales.ss_store_sk = store.s_store_sk)
  join date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
where
  --ss_date between '1999-11-01' and '1999-11-30'
  ss_sold_date_sk between 2451484 and 2451513
  and d_moy = 11
  and d_year = 1999
  and i_manager_id = 7
  and substr(ca_zip, 1, 5) <> substr(s_zip, 1, 5)
group by
  i_brand,
  i_brand_id,
  i_manufact_id,
  i_manufact
order by
  ext_price desc,
  i_brand,
  i_brand_id,
  i_manufact_id,
  i_manufact
limit 100
{code}
The dataset is TPC-DS at scale factor 1500. To reproduce the problem, you can simply join store_sales with customer and make sure that only one mapper reads the data of customer, as in the sketch below.
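
A minimal sketch of that reproduction, assuming the TPC-DS tables are already registered with the SQLContext (table and column names are those of the query above; the temp table name customer_one_split is made up for this example):
{code:scala}
// Read customer through a single mapper by collapsing it to one partition,
// then join it with store_sales.
val customerOneSplit = sqlContext.table("customer").coalesce(1)
customerOneSplit.registerTempTable("customer_one_split")

// With spark.shuffle.reduceLocality.enabled=true, all reducers of this
// join's shuffle can end up on the executor that ran that single mapper.
sqlContext.sql("""
  select count(*)
  from store_sales
  join customer_one_split
    on (store_sales.ss_customer_sk = customer_one_split.c_customer_sk)
""").collect()
{code}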
 

  was:
In some cases, when spark.shuffle.reduceLocality.enabled is enabled, all reducers are scheduled to the same executor even though the cluster has plenty of resources. Setting spark.shuffle.reduceLocality.enabled to false resolves the problem.

Here is a little more information: for one of my queries, all 200 reducers were scheduled to the same executor, and each reducer had about 800 KB of input.
 


> In some cases, all reducers are scheduled to the same executor
> --------------------------------------------------------------
>
>                 Key: SPARK-10087
>                 URL: https://issues.apache.org/jira/browse/SPARK-10087
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.5.0
>            Reporter: Yin Huai
>            Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
