Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/10/29 16:52:00 UTC

[jira] [Commented] (IMPALA-7097) Print EC info in the query plan and profile

    [ https://issues.apache.org/jira/browse/IMPALA-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223032#comment-17223032 ] 

ASF subversion and git services commented on IMPALA-7097:
---------------------------------------------------------

Commit 1bd27a3ea87622edc869d3ff72ce9b0881451c95 in impala's branch refs/heads/master from Qifan Chen
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1bd27a3 ]

IMPALA-7097 Print EC info in the query plan and profile

This change shows the number of erasure-coded files and their total
size in the scan node of the query plan and profile. Shown below are
two examples for the HDFS file system.

Non-partitioned table:
00:SCAN HDFS [default.test_show_ec_nonpart, RANDOM]
   HDFS partitions=1/1 files=2 size=1.65KB
   erasure coded: files=2 size=1.65KB
   stored statistics:

Partitioned table:
00:SCAN HDFS [default.test_show_ec_part]
   HDFS partitions=4/4 files=4 size=2.36KB
   erasure coded: files=3 size=1.77KB
   row-size=12B cardinality=999
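
For readers who want to pick these numbers out of plan text in a script, here is a minimal sketch in Python. It is not part of this commit or of Impala: `ec_summary` and both regular expressions are hypothetical, and they assume the scan-node lines look exactly like the two examples above. Treating a missing "erasure coded" line as zero EC files is also an assumption, not documented behavior.

```python
import re

# Hypothetical helper, not part of Impala. The patterns assume the plan
# text follows the format of the two examples in this commit message.
TOTAL_LINE = re.compile(
    r"^\s*(?:HDFS )?partitions=\d+/\d+ files=(\d+) size=(\S+)\s*$",
    re.MULTILINE,
)
EC_LINE = re.compile(
    r"^\s*erasure coded: files=(\d+) size=(\S+)\s*$",
    re.MULTILINE,
)


def ec_summary(plan_text):
    """Return total and erasure-coded file counts and sizes for one scan node.

    Returns None when no "partitions=... files=... size=..." line is found.
    """
    total = TOTAL_LINE.search(plan_text)
    if total is None:
        return None
    ec = EC_LINE.search(plan_text)
    return {
        "files": int(total.group(1)),
        "size": total.group(2),
        # Assumption: no "erasure coded" line means no EC files were scanned.
        "ec_files": int(ec.group(1)) if ec else 0,
        "ec_size": ec.group(2) if ec else None,
    }


# The partitioned-table example from this commit message.
PLAN = """\
00:SCAN HDFS [default.test_show_ec_part]
   HDFS partitions=4/4 files=4 size=2.36KB
   erasure coded: files=3 size=1.77KB
   row-size=12B cardinality=999
"""

summary = ec_summary(PLAN)
print(summary)
```

Sizes are kept as the strings printed in the plan (for example "1.77KB") rather than converted to bytes, since the printed values are already rounded.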

Testing:
1. Unit testing;
2. Ran Core tests successfully.

Change-Id: I6ea378914624a714fde820d290b3b9c43325c6a1
Reviewed-on: http://gerrit.cloudera.org:8080/16587
Reviewed-by: Aman Sinha <am...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Print EC info in the query plan and profile
> -------------------------------------------
>
>                 Key: IMPALA-7097
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7097
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Tianyi Wang
>            Assignee: Qifan Chen
>            Priority: Minor
>
> Impala should print EC-related info in the query plan to help users understand the behavior and diagnose performance issues more easily. The simplest design would look like:
> {noformat}
> [localhost:21000] functional> explain select * from functional.alltypes;
> Query: explain select * from functional.alltypes
> +-------------------------------------------------------------+
> | Explain String                                              |
> +-------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=32.00KB Threads=3 |
> | Per-Host Resource Estimates: Memory=160.00MB                |
> | Codegen disabled by planner                                 |
> |                                                             |
> | PLAN-ROOT SINK                                              |
> | |                                                           |
> | 01:EXCHANGE [UNPARTITIONED]                                 |
> | |                                                           |
> | 00:SCAN HDFS [functional.alltypes]                          |
> |    partitions=24/24 files=25 size=498.41KB                  |
> |    EC files=24 size=478.45KB                                |
> +-------------------------------------------------------------+
> {noformat}
> In the query profile we should at least print "EC bytes scanned" in the scan node.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
