You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/08/09 22:31:00 UTC
[jira] [Commented] (IMPALA-6034) Add query option that limits scanned bytes at runtime

    [ https://issues.apache.org/jira/browse/IMPALA-6034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575498#comment-16575498 ] 

ASF subversion and git services commented on IMPALA-6034:
---------------------------------------------------------

Commit 3e17705ecaba0b6ab9ae929e6c7c409e0b6aea1d in impala's branch refs/heads/master from [~tarmstrong@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=3e17705 ]

IMPALA-6034: Add scanned bytes limits per query

This adds support for aggregate resource limits at runtime, specified
via query options. If a query exceeds a limit it is terminated. The
checks are periodic so the query may go somewhat over the limits.

SCAN_BYTES_LIMIT is exposed as an advanced query option.

CPU_LIMIT_S is hidden as a development query option because it is flawed
- the CPU user/sys time is only updated upon thread completion, so in
many cases the limit will not take effect until well after the resources
have been used. IMPALA-7318 tracks enabling this.

Query profile is updated to include query wide and per backend metrics
for CPU and scanned bytes. Example from "select count(*) from
tpch_parquet.lineitem":

    Per Node Peak Memory Usage: tarmstrong-box:22000(289.50 KB) tarmstrong-box:22001(249.50 KB) tarmstrong-box:22002(249.50 KB)
    Per Node Bytes Read: tarmstrong-box:22000(100.00 KB) tarmstrong-box:22001(100.00 KB) tarmstrong-box:22002(100.00 KB)
    Per Node User Time: tarmstrong-box:22000(40.000ms) tarmstrong-box:22001(32.000ms) tarmstrong-box:22002(24.000ms)
    Per Node System Time: tarmstrong-box:22000(0.000ns) tarmstrong-box:22001(0.000ns) tarmstrong-box:22002(0.000ns)
     - FiltersReceived: 0 (0)
     - FinalizationTimer: 0.000ns
     - NumBackends: 3 (3)
     - NumFragmentInstances: 4 (4)
     - NumFragments: 2 (2)
     - TotalBytesRead: 300.00 KB (307200)
     - TotalCpuTime: 96.000ms

Testing:
Added tests for various permutations for CPU_LIMIT_S and
SCAN_BYTES_LIMIT

Based on a previous patch by Mostafa Mokhtar
<mm...@cloudera.com>

Change-Id: I3e85f80b70b3fce47e637e9322ed0316ee84f6a9
Reviewed-on: http://gerrit.cloudera.org:8080/11081
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Add query option that limits scanned bytes at runtime
> -----------------------------------------------------
>
>                 Key: IMPALA-6034
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6034
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>            Reporter: Mostafa Mokhtar
>            Assignee: Tim Armstrong
>            Priority: Major
>             Fix For: Impala 3.1.0
>
>
> Reject queries that scans large data before executing the query.
> This is a mechanism to protect the cluster from potentially harmful queries.
> MAX_READ_BYTES: [0]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org