You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2023/06/24 04:44:00 UTC
[jira] [Commented] (IMPALA-10287) Distribution strategy is sub-optimal for certain queries

    [ https://issues.apache.org/jira/browse/IMPALA-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736663#comment-17736663 ] 

ASF subversion and git services commented on IMPALA-10287:
----------------------------------------------------------

Commit 7a94adbc3076881f6e7043a429a7294a04bd7519 in impala's branch refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=7a94adbc3 ]

IMPALA-12192: Fix scaling bug in scan fragment

IMPALA-12091 has a bug where scan fragment parallelism will always be
limited solely by the ScanNode cost. If ScanNode is colocated with other
query node operators that have higher processing costs, Planner will not
scale it up beyond what is allowed by the ScanNode cost.

This patch fixes the problem in two aspects. The first is to allow a
scan fragment to scale up higher as long as it is within the total
fragment cost and the number of effective scan ranges. The second is to
add missing Math.max() in CostingSegment.java which causes lower
fragment parallelism even when the total fragment cost is high.

IMPALA-10287 optimization is re-enabled to reduce regression in TPC-DS
Q78. Ideally, the broadcast vs partitioned costing formula during
distributed planning should not rely on numInstance. But enabling this
optimization ensures consistent query plan shape when comparing against
MT_DOP plan. This optimization can still be disabled by specifying
USE_DOP_FOR_COSTING=false.

This patch also does some cleanup including:
- Fix "max-parallelism" value in explain string.
- Make a constant in ScanNode.rowMaterializationCost() into a backend
  flag named scan_range_cost_factor for experimental purposes.
- Replace all references to ProcessingCost.isComputeCost() to
  queryOptions.isCompute_processing_cost() directly.
- Add Precondition in PlanFragment.getNumInstances() to verify that the
  fragment's num instance is not modified anymore after the costing
  algorithm finish.

Testing:
- Manually run TPCDS Q84 over tpcds10_parquet and confirm that the
  leftmost scan fragment parallelism is raised from 12 (before the
  patch) to 18 (after the patch).
- Add test in PlannerTest.testProcessingCost that reproduces the issue.
- Update compute stats test in test_executor_groups.py to maintain test
  assertion.
- Pass core tests.

Change-Id: I7010f6c3bc48ae3f74e8db98a83f645b6c157226
Reviewed-on: http://gerrit.cloudera.org:8080/20024
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Distribution strategy is sub-optimal for certain queries
> --------------------------------------------------------
>
>                 Key: IMPALA-10287
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10287
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Major
>             Fix For: Impala 4.0.0
>
>
> I ran a simplified query (extracted from q78 of TPC-DS) on a 600GB dataset on an 8 node cluster. I forced the distribution strategy for the left outer join and compared Broadcast vs Hash Partition for different values of mt_dop.  The example query and results are shown below (elapsed times are in seconds):
> {noformat}
> Query (with shuffle or broadcast hint):
> select count(*)
>    from store_sales
>    left join [shuffle] store_returns on sr_ticket_number=ss_ticket_number 
>          and ss_item_sk=sr_item_sk
>    join date_dim on ss_sold_date_sk = d_date_sk
>    where sr_ticket_number is null
>    and d_year=2002;
> {noformat}
> ||mt_dop||Broadcast||Partition||
> |1|45|15|
> |2|37|9|
> |4|33|5|
> |8|31|4|
> |12|31|4|
> Given the nearly 7.5x speedup for partition distribution at mt_dop = 12 (which is the default), it indicates that the cost formula comparing the broadcast vs partition needs to be modified to take into account the mt_dop.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org