Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:15:39 UTC

[jira] [Resolved] (SPARK-22639) no rowcount estimation returned if groupby clause involves substring

     [ https://issues.apache.org/jira/browse/SPARK-22639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-22639.
----------------------------------
    Resolution: Incomplete

> no rowcount estimation returned if groupby clause involves substring
> --------------------------------------------------------------------
>
>                 Key: SPARK-22639
>                 URL: https://issues.apache.org/jira/browse/SPARK-22639
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 2.2.0
>            Reporter: ey-chih chow
>            Priority: Major
>              Labels: bulk-closed
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> The CBO cannot estimate the row count when the GROUP BY clause of a query contains a substring expression. For example, no row count estimate is returned for the following query, which is adapted from the TPC-DS queries and uses the TPC-DS schema:
> SELECT item.`i_brand`, count(1), date_dim.`d_year`, item.`i_brand_id`,
>        sum(store_sales.`ss_ext_sales_price`) AS `ext_price`, item.`i_item_sk`
> FROM store_sales
>   INNER JOIN date_dim ON (date_dim.`d_date_sk` = store_sales.`ss_sold_date_sk`)
>   INNER JOIN item ON (store_sales.`ss_item_sk` = item.`i_item_sk`)
> GROUP BY item.`i_brand`, date_dim.`d_date`, substring(item.`i_item_desc`, 1, 30),
>          date_dim.`d_year`, item.`i_brand_id`, item.`i_item_sk`
>   
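
The reported behaviour can be illustrated with a toy model. The sketch below is not Spark code. It assumes an estimator that derives a group count only when every grouping expression is a plain column with a collected distinct count, which would explain why one computed expression such as substring leaves the whole aggregate without an estimate. The function name, the statistics dictionary, and all numbers are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical statistics for illustration only; these are not real TPC-DS numbers.
CHILD_ROW_COUNT = 1_000_000
DISTINCT_COUNTS: Dict[str, int] = {
    "i_brand": 50,
    "d_year": 5,
    "i_brand_id": 50,
}


def estimate_group_count(
    grouping_exprs: List[str],
    distinct_counts: Dict[str, int],
    child_row_count: Optional[int],
) -> Optional[int]:
    """Return an estimated number of groups, or None when no estimate is possible."""
    if child_row_count is None:
        return None
    estimate = 1
    for expr in grouping_exprs:
        # Assumption: only plain columns with collected statistics can be estimated.
        if expr not in distinct_counts:
            return None
        estimate *= distinct_counts[expr]
    # An aggregate cannot emit more rows than it receives.
    return min(estimate, child_row_count)


plain = estimate_group_count(
    ["i_brand", "d_year", "i_brand_id"], DISTINCT_COUNTS, CHILD_ROW_COUNT
)
with_substring = estimate_group_count(
    ["i_brand", "substring(i_item_desc, 1, 30)"], DISTINCT_COUNTS, CHILD_ROW_COUNT
)
print(plain)           # 12500
print(with_substring)  # None
```

In this model the substring expression has no entry in the statistics, so the function returns None for the whole aggregate instead of falling back to a bound such as the distinct count of the underlying column.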



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)