Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/03/17 15:37:00 UTC

[jira] [Commented] (IMPALA-6267) MT Scanners do not check runtime filters per-file before processing each split

    [ https://issues.apache.org/jira/browse/IMPALA-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061004#comment-17061004 ] 

Tim Armstrong commented on IMPALA-6267:
---------------------------------------


IMPALA-6267: MT scanners check filters per split

Refactor the code for checking each scan range
against filters so that it can be shared between
the MT and non-MT scan node implementations.
Move the check into StartNextScanRange(), so that a
range rejected by the filters is skipped before its
I/O is issued, instead of being issued and then cancelled.
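
As a rough illustration of the refactoring described above, the
following Python sketch models a shared "start next scan range" step
that checks filters before any I/O is issued. It is not Impala's C++
code: ScanRange, ScanNodeBase, the year partition key and the
issued_io/skipped bookkeeping are all hypothetical names chosen for
the example, and a runtime filter is simplified to a predicate over
partition values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ScanRange:
    """Hypothetical stand-in for one scan range (split) of a file."""
    path: str
    partition_values: Dict[str, int]


# Assumption: a runtime filter is modelled as a predicate on partition values.
RuntimeFilter = Callable[[Dict[str, int]], bool]


class ScanNodeBase:
    """Shared logic that both MT and non-MT scan nodes could call."""

    def __init__(self, ranges: List[ScanRange], filters: List[RuntimeFilter]):
        self._pending = list(ranges)
        self._filters = list(filters)
        self.issued_io: List[str] = []  # paths for which I/O was started
        self.skipped = 0                # ranges rejected before any I/O

    def _passes_filters(self, scan_range: ScanRange) -> bool:
        return all(f(scan_range.partition_values) for f in self._filters)

    def start_next_scan_range(self) -> Optional[ScanRange]:
        """Return the next range that passes all filters, or None if done."""
        while self._pending:
            scan_range = self._pending.pop(0)
            if not self._passes_filters(scan_range):
                # Rejected up front: no I/O is issued, so none is cancelled.
                self.skipped += 1
                continue
            self.issued_io.append(scan_range.path)
            return scan_range
        return None
```

Because the check lives in one place, any caller that obtains ranges
through start_next_scan_range() gets the per-range filtering without
duplicating it.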

Testing:
Added a regression test for the code path that failed
for multithreaded scans before this fix. Looped the
test for a couple of hours to flush out flakiness.

Fix some runtime filter tests where the mt_dop value
from the test dimensions was not applied.

Fix test_wait_time_cancellation() to work with mt_dop > 0.
In that mode filters are waited for in Open() instead of
GetNext(), so the query does not reach the RUNNING state
while waiting for filters. The test now uses the profile
to detect that execution has started.
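
One way to picture detecting the start of execution from the profile
instead of from the query state is a simple polling loop. This is
only a sketch under assumptions: wait_for_profile_marker, the
get_profile callable and the marker string are hypothetical and do
not correspond to helpers in Impala's test framework.

```python
import time
from typing import Callable


def wait_for_profile_marker(get_profile: Callable[[], str], marker: str,
                            timeout_s: float = 10.0,
                            interval_s: float = 0.1) -> bool:
    """Poll a profile string until `marker` appears or the timeout expires.

    `get_profile` is a hypothetical callable returning the current
    runtime profile text of the query under test.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if marker in get_profile():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A test could call this with a marker that only appears once fragments
have started, then cancel the query and check that it stops promptly.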

Ran core tests.

Change-Id: Ic40eb4cb2419393e6f7cd7bd019add9224946c4d
Reviewed-on: http://gerrit.cloudera.org:8080/15411
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> MT Scanners do not check runtime filters per-file before processing each split
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-6267
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6267
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.10.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: perf
>             Fix For: Impala 4.0
>
>
> The old implementation of HdfsScanNode re-checks partition filters per scan range in HdfsScanNode::ProcessSplit() before processing each scan range. HdfsScanNodeMt does not have similar logic.


