Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/01/18 05:29:00 UTC

[jira] [Commented] (IMPALA-9302) Multithreaded scanners don't check for filter effectiveness

    [ https://issues.apache.org/jira/browse/IMPALA-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018507#comment-17018507 ] 

ASF subversion and git services commented on IMPALA-9302:
---------------------------------------------------------

Commit 023e92f5e7b9f9a722db34bc738607aeae10a07a in impala's branch refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=023e92f ]

IMPALA-9302: disable ineffective filters for mt_dop > 0

The optimisation that disables ineffective row-level
runtime filters was missing from the MT scan code
paths: it was implemented in the ProcessSplit()
functions, which are not used when mt_dop > 0.

This change adds it to HdfsScanner::GetNext(), which
is used for mt_dop > 0 but not for mt_dop = 0.

Testing:
Ran the existing runtime row filters test with mt_dop.
This reproduced the issue before the fix.

Change-Id: I8a55a9d4ac9e0d93cb3675dd2d5da086cb7d941d
Reviewed-on: http://gerrit.cloudera.org:8080/15065
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
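
For readers unfamiliar with the optimisation, the kind of check described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: FilterStats, CheckFiltersEffectiveness(), ProcessBatch(), kRowsPerCheck and kMinRejectRatio are hypothetical names and values chosen for this example. The point it shows is that the effectiveness check has to run on whichever code path actually drives the scan, here a GetNext()-style per-batch call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-filter counters; not Impala's actual data structures.
struct FilterStats {
  int64_t considered = 0;  // rows the filter was evaluated against
  int64_t rejected = 0;    // rows the filter eliminated
  bool enabled = true;     // cleared once the filter is judged ineffective
};

// Assumed tuning constants, chosen for illustration only.
constexpr int64_t kRowsPerCheck = 16 * 1024;
constexpr double kMinRejectRatio = 0.1;

// Disables every filter that has seen enough rows and rejected too few of
// them. Returns the number of filters that are still enabled.
inline int CheckFiltersEffectiveness(std::vector<FilterStats>& stats) {
  int enabled = 0;
  for (FilterStats& s : stats) {
    assert(s.rejected <= s.considered);
    if (s.enabled && s.considered >= kRowsPerCheck) {
      const double ratio = static_cast<double>(s.rejected) /
                           static_cast<double>(s.considered);
      if (ratio < kMinRejectRatio) s.enabled = false;
    }
    if (s.enabled) ++enabled;
  }
  return enabled;
}

// Stand-in for one GetNext()-style call: account for a batch of rows, then
// re-run the effectiveness check so that it happens on this code path too.
inline int ProcessBatch(std::vector<FilterStats>& stats, int64_t batch_rows,
                        const std::vector<int64_t>& rejected_per_filter) {
  for (std::size_t i = 0; i < stats.size(); ++i) {
    if (!stats[i].enabled) continue;  // disabled filters are no longer evaluated
    stats[i].considered += batch_rows;
    stats[i].rejected += rejected_per_filter[i];
  }
  return CheckFiltersEffectiveness(stats);
}
```

With two filters and a 20000-row batch in which the first filter rejects nothing and the second rejects 19000 rows, the first filter is disabled after the batch and skipped on later batches, while the second stays enabled.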


> Multithreaded scanners don't check for filter effectiveness
> -----------------------------------------------------------
>
>                 Key: IMPALA-9302
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9302
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.3.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: multithreading, performance
>
> This can be reproduced with TPC-H Q9. I saw it locally at scale factor 30, where the mt_dop=4 version of the query uses much more CPU in the scan than the mt_dop=0 version.
> The cause is that none of the runtime filters are being disabled, not even the ineffective ones.
> {noformat}
>           Filter 2 (16.00 MB):
>              - Files processed: 0 (0)
>              - Files rejected: 0 (0)
>              - Files total: 0 (0)
>              - RowGroups processed: 0 (0)
>              - RowGroups rejected: 0 (0)
>              - RowGroups total: 0 (0)
>              - Rows processed: 30.97M (30970695)
>              - Rows rejected: 0 (0)
>              - Rows total: 31.01M (31009074)
>              - Splits processed: 0 (0)
>              - Splits rejected: 0 (0)
>              - Splits total: 0 (0)
>           Filter 4 (8.00 MB):
>              - Files processed: 0 (0)
>              - Files rejected: 0 (0)
>              - Files total: 0 (0)
>              - RowGroups processed: 0 (0)
>              - RowGroups rejected: 0 (0)
>              - RowGroups total: 0 (0)
>              - Rows processed: 30.97M (30970695)
>              - Rows rejected: 0 (0)
>              - Rows total: 31.01M (31009074)
>              - Splits processed: 0 (0)
>              - Splits rejected: 0 (0)
>              - Splits total: 0 (0)
>           Filter 5 (8.00 MB):
>              - Files processed: 0 (0)
>              - Files rejected: 0 (0)
>              - Files total: 0 (0)
>              - RowGroups processed: 0 (0)
>              - RowGroups rejected: 0 (0)
>              - RowGroups total: 0 (0)
>              - Rows processed: 30.97M (30970695)
>              - Rows rejected: 0 (0)
>              - Rows total: 31.01M (31009074)
>              - Splits processed: 0 (0)
>              - Splits rejected: 0 (0)
>              - Splits total: 0 (0)
>           Filter 8 (1.00 MB):
>              - Files processed: 0 (0)
>              - Files rejected: 0 (0)
>              - Files total: 0 (0)
>              - RowGroups processed: 0 (0)
>              - RowGroups rejected: 0 (0)
>              - RowGroups total: 0 (0)
>              - Rows processed: 31.01M (31009074)
>              - Rows rejected: 0 (0)
>              - Rows total: 31.01M (31009074)
>              - Splits processed: 0 (0)
>              - Splits rejected: 0 (0)
>              - Splits total: 0 (0)
>           Filter 10 (1.00 MB):
>              - Files processed: 0 (0)
>              - Files rejected: 0 (0)
>              - Files total: 0 (0)
>              - RowGroups processed: 0 (0)
>              - RowGroups rejected: 0 (0)
>              - RowGroups total: 0 (0)
>              - Rows processed: 31.01M (31009074)
>              - Rows rejected: 29.32M (29317263)
>              - Rows total: 31.01M (31009074)
>              - Splits processed: 0 (0)
>              - Splits rejected: 0 (0)
>              - Splits total: 0 (0)
> {noformat}
> In contrast, here are the filters for mt_dop=0, where not all of the rows are processed.
> {noformat}
>           Filter 2 (16.00 MB):
>              - Files processed: 0 (0)
>              - Files rejected: 0 (0)
>              - Files total: 0 (0)
>              - RowGroups processed: 0 (0)
>              - RowGroups rejected: 0 (0)
>              - RowGroups total: 0 (0)
>              - Rows processed: 8.18M (8180257)
>              - Rows rejected: 0 (0)
>              - Rows total: 180.00M (179998372)
>              - Splits processed: 0 (0)
>              - Splits rejected: 0 (0)
>              - Splits total: 0 (0)
>           Filter 4 (8.00 MB):
>              - Files processed: 0 (0)
>              - Files rejected: 0 (0)
>              - Files total: 0 (0)
>              - RowGroups processed: 0 (0)
>              - RowGroups rejected: 0 (0)
>              - RowGroups total: 0 (0)
>              - Rows processed: 8.18M (8180257)
>              - Rows rejected: 0 (0)
>              - Rows total: 180.00M (179998372)
>              - Splits processed: 0 (0)
>              - Splits rejected: 0 (0)
>              - Splits total: 0 (0)
>           Filter 5 (8.00 MB):
>              - Files processed: 0 (0)
>              - Files rejected: 0 (0)
>              - Files total: 0 (0)
>              - RowGroups processed: 0 (0)
>              - RowGroups rejected: 0 (0)
>              - RowGroups total: 0 (0)
>              - Rows processed: 8.18M (8180257)
>              - Rows rejected: 0 (0)
>              - Rows total: 180.00M (179998372)
>              - Splits processed: 0 (0)
>              - Splits rejected: 0 (0)
>              - Splits total: 0 (0)
>           Filter 8 (1.00 MB):
>              - Files processed: 0 (0)
>              - Files rejected: 0 (0)
>              - Files total: 0 (0)
>              - RowGroups processed: 0 (0)
>              - RowGroups rejected: 0 (0)
>              - RowGroups total: 0 (0)
>              - Rows processed: 8.41M (8406914)
>              - Rows rejected: 0 (0)
>              - Rows total: 180.00M (179998372)
>              - Splits processed: 0 (0)
>              - Splits rejected: 0 (0)
>              - Splits total: 0 (0)
>           Filter 10 (1.00 MB):
>              - Files processed: 0 (0)
>              - Files rejected: 0 (0)
>              - Files total: 0 (0)
>              - RowGroups processed: 0 (0)
>              - RowGroups rejected: 0 (0)
>              - RowGroups total: 0 (0)
>              - Rows processed: 180.00M (179998372)
>              - Rows rejected: 170.18M (170177099)
>              - Rows total: 180.00M (179998372)
>              - Splits processed: 0 (0)
>              - Splits rejected: 0 (0)
>              - Splits total: 0 (0)
> {noformat}
> Perf top showed 28% of CPU time in impala::BloomFilter::BucketFindAVX2, which corroborates this.
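
To make the two profiles above easier to compare, the counters can be reduced to two ratios per filter. The helpers below are a small sketch written for this note, not part of Impala; RejectRatio() and ProcessedFraction() are hypothetical names, and the figures in the note after the code are taken from the "Rows processed", "Rows rejected" and "Rows total" counters quoted above.

```cpp
#include <cstdint>

// Fraction of the processed rows that a filter rejected.
// Returns 0 when the filter processed no rows.
inline double RejectRatio(int64_t rows_processed, int64_t rows_rejected) {
  if (rows_processed <= 0) return 0.0;
  return static_cast<double>(rows_rejected) /
         static_cast<double>(rows_processed);
}

// Fraction of all rows that the filter was actually evaluated against.
// A value well below 1 indicates the filter was disabled part-way through.
inline double ProcessedFraction(int64_t rows_processed, int64_t rows_total) {
  if (rows_total <= 0) return 0.0;
  return static_cast<double>(rows_processed) /
         static_cast<double>(rows_total);
}
```

For Filter 10, RejectRatio(31009074, 29317263) is roughly 0.945 with mt_dop=4, about the same as RejectRatio(179998372, 170177099) with mt_dop=0, so that filter is effective in both runs. For Filter 2, which rejects nothing, ProcessedFraction(30970695, 31009074) is roughly 0.999 with mt_dop=4, whereas ProcessedFraction(8180257, 179998372) is roughly 0.045 with mt_dop=0.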



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
