Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/08/02 00:43:00 UTC

[jira] [Commented] (IMPALA-5746) Remote fragments continue to hold onto memory after stopping the coordinator daemon

    [ https://issues.apache.org/jira/browse/IMPALA-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169433#comment-17169433 ] 

ASF subversion and git services commented on IMPALA-5746:
---------------------------------------------------------

Commit 9d43cfdaeeb1e0a88af3b7aefdc28aa585927a03 in impala's branch refs/heads/master from wzhou-code
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9d43cfd ]

IMPALA-5746: Cancel all queries scheduled by failed coordinators

The executor registers for cluster membership updates. When a coordinator
is absent from the active cluster membership list, the executor cancels
all running fragments of the queries scheduled by that inactive
coordinator, since it can no longer send results back to an
inactive/failed coordinator. This lets executors quickly release the
resources allocated to those fragments.

Testing:
- Added a new test case, TestProcessFailures::test_kill_coordinator,
  and ran it with the following command:
    ./bin/impala-py.test \
      tests/custom_cluster/test_process_failures.py::TestProcessFailures::test_kill_coordinator \
      --exploration_strategy=exhaustive
- Passed the core tests.

Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd
Reviewed-on: http://gerrit.cloudera.org:8080/16215
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
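The mechanism described in the commit message can be illustrated with a minimal Python sketch: an executor-side registry of running queries that, on every cluster membership update, cancels the fragments of any query whose coordinator is no longer in the active set. This is not Impala's code (impalad is written in C++); every class, method, and ID below (QueryExecManager, on_membership_update, "coord-a", etc.) is hypothetical and only models the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class FragmentInstance:
    """Hypothetical stand-in for a fragment instance running on an executor."""
    fragment_id: str
    cancelled: bool = False

    def cancel(self) -> None:
        # A real executor would stop execution and release memory here.
        self.cancelled = True


@dataclass
class QueryState:
    """Hypothetical per-query state: which coordinator scheduled it, and its fragments."""
    query_id: str
    coordinator_id: str
    fragments: List[FragmentInstance] = field(default_factory=list)


class QueryExecManager:
    """Hypothetical executor-side registry of running queries."""

    def __init__(self) -> None:
        self._queries: Dict[str, QueryState] = {}

    def register(self, query: QueryState) -> None:
        self._queries[query.query_id] = query

    def on_membership_update(self, active_backends: Iterable[str]) -> List[str]:
        """Cancel every query whose coordinator is no longer active.

        Returns the IDs of the cancelled queries, in registration order.
        """
        active: Set[str] = set(active_backends)
        cancelled: List[str] = []
        # Iterate over a snapshot so entries can be deleted during the loop.
        for query_id, query in list(self._queries.items()):
            if query.coordinator_id not in active:
                for fragment in query.fragments:
                    fragment.cancel()
                # Drop the query so its resources can be reclaimed.
                del self._queries[query_id]
                cancelled.append(query_id)
        return cancelled

    def running_query_ids(self) -> List[str]:
        return sorted(self._queries)


if __name__ == "__main__":
    mgr = QueryExecManager()
    mgr.register(QueryState("q1", "coord-a", [FragmentInstance("q1:1")]))
    mgr.register(QueryState("q2", "coord-b", [FragmentInstance("q2:1")]))
    # coord-a has disappeared from the membership list, so q1 is cancelled.
    print(mgr.on_membership_update(["coord-b", "exec-1"]))  # ['q1']
    print(mgr.running_query_ids())  # ['q2']
```

Driving cancellation from membership updates, rather than waiting for a failed RPC back to the coordinator, is what lets fragments release memory promptly instead of lingering, as in the snapshot in the issue below.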


> Remote fragments continue to hold onto memory after stopping the coordinator daemon
> -----------------------------------------------------------------------------------
>
>                 Key: IMPALA-5746
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5746
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.10.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Wenzhe Zhou
>            Priority: Critical
>             Fix For: Impala 4.0
>
>         Attachments: remote_fragments_holding_memory.txt
>
>
> Repro:
> # Start running queries.
> # Kill the coordinator node.
> # On a running Impalad, check the memz tab: remote fragments continue to run and hold on to resources.
> Remote fragments held on to memory for more than 30 minutes after the coordinator service was stopped.
> Attached is a thread dump from an Impalad running remote fragments.
> Snapshot of the memz tab 30 minutes after killing the coordinator:
> {code}
> Process: Limit=201.73 GB Total=5.32 GB Peak=179.36 GB
>   Free Disk IO Buffers: Total=1.87 GB Peak=1.87 GB
>   RequestPool=root.default: Total=1.35 GB Peak=178.51 GB
>     Query(f64169d4bb3c901c:3a21d8ae00000000): Total=2.64 MB Peak=104.73 MB
>       Fragment f64169d4bb3c901c:3a21d8ae00000051: Total=2.64 MB Peak=2.67 MB
>         AGGREGATION_NODE (id=15): Total=2.54 MB Peak=2.57 MB
>           Exprs: Total=30.12 KB Peak=30.12 KB
>         EXCHANGE_NODE (id=14): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=12.29 KB
>         DataStreamSender (dst_id=17): Total=85.31 KB Peak=85.31 KB
>         CodeGen: Total=1.53 KB Peak=374.50 KB
>       Block Manager: Limit=161.39 GB Total=512.00 KB Peak=1.54 MB
>     Query(2a4f12b3b4b1dc8c:db7e8cf200000000): Total=258.29 MB Peak=412.98 MB
>       Fragment 2a4f12b3b4b1dc8c:db7e8cf20000008c: Total=2.29 MB Peak=2.29 MB
>         SORT_NODE (id=11): Total=4.00 KB Peak=4.00 KB
>         AGGREGATION_NODE (id=20): Total=2.27 MB Peak=2.27 MB
>           Exprs: Total=25.12 KB Peak=25.12 KB
>         EXCHANGE_NODE (id=19): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=0
>         DataStreamSender (dst_id=21): Total=3.88 KB Peak=3.88 KB
>         CodeGen: Total=4.17 KB Peak=1.05 MB
>       Block Manager: Limit=161.39 GB Total=256.25 MB Peak=321.66 MB
>     Query(68421d2a5dea0775:83f5d97200000000): Total=282.77 MB Peak=443.53 MB
>       Fragment 68421d2a5dea0775:83f5d9720000004a: Total=26.77 MB Peak=26.92 MB
>         SORT_NODE (id=8): Total=8.00 KB Peak=8.00 KB
>           Exprs: Total=4.00 KB Peak=4.00 KB
>         ANALYTIC_EVAL_NODE (id=7): Total=4.00 KB Peak=4.00 KB
>           Exprs: Total=4.00 KB Peak=4.00 KB
>         SORT_NODE (id=6): Total=24.00 MB Peak=24.00 MB
>         AGGREGATION_NODE (id=12): Total=2.72 MB Peak=2.83 MB
>           Exprs: Total=85.12 KB Peak=85.12 KB
>         EXCHANGE_NODE (id=11): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=84.80 KB
>         DataStreamSender (dst_id=13): Total=1.27 KB Peak=1.27 KB
>         CodeGen: Total=24.80 KB Peak=4.13 MB
>       Block Manager: Limit=161.39 GB Total=280.50 MB Peak=286.52 MB
>     Query(e94c89fa89a74d27:82812bf900000000): Total=258.29 MB Peak=436.85 MB
>       Fragment e94c89fa89a74d27:82812bf90000008e: Total=2.29 MB Peak=2.29 MB
>         SORT_NODE (id=11): Total=4.00 KB Peak=4.00 KB
>         AGGREGATION_NODE (id=20): Total=2.27 MB Peak=2.27 MB
>           Exprs: Total=25.12 KB Peak=25.12 KB
>         EXCHANGE_NODE (id=19): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=0
>         DataStreamSender (dst_id=21): Total=3.88 KB Peak=3.88 KB
>         CodeGen: Total=4.17 KB Peak=1.05 MB
>       Block Manager: Limit=161.39 GB Total=256.25 MB Peak=321.62 MB
>     Query(4e43dad3bdc935d8:938b8b7e00000000): Total=2.65 MB Peak=105.60 MB
>       Fragment 4e43dad3bdc935d8:938b8b7e00000052: Total=2.65 MB Peak=2.68 MB
>         AGGREGATION_NODE (id=15): Total=2.55 MB Peak=2.57 MB
>           Exprs: Total=30.12 KB Peak=30.12 KB
>         EXCHANGE_NODE (id=14): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=13.68 KB
>         DataStreamSender (dst_id=17): Total=91.41 KB Peak=91.41 KB
>         CodeGen: Total=1.53 KB Peak=374.50 KB
>       Block Manager: Limit=161.39 GB Total=512.00 KB Peak=1.30 MB
>     Query(b34bdd65f1ed017e:5a0291bd00000000): Total=2.37 MB Peak=106.56 MB
>       Fragment b34bdd65f1ed017e:5a0291bd0000004b: Total=2.37 MB Peak=2.37 MB
>         SORT_NODE (id=6): Total=4.00 KB Peak=4.00 KB
>         AGGREGATION_NODE (id=10): Total=2.35 MB Peak=2.35 MB
>           Exprs: Total=34.12 KB Peak=34.12 KB
>         EXCHANGE_NODE (id=9): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=4.23 KB
>         DataStreamSender (dst_id=11): Total=3.45 KB Peak=3.45 KB
>         CodeGen: Total=4.51 KB Peak=1.11 MB
>       Block Manager: Limit=161.39 GB Total=256.00 KB Peak=912.81 KB
>     Query(b74ba58d53b6c45f:3e8228600000000): Total=190.41 MB Peak=425.09 MB
>       Fragment b74ba58d53b6c45f:3e822860000009f: Total=67.90 KB Peak=2.34 MB
>         SORT_NODE (id=14): Total=4.00 KB Peak=4.00 KB
>         HASH_JOIN_NODE (id=13): Total=42.25 KB Peak=42.25 KB
>           Exprs: Total=9.12 KB Peak=9.12 KB
>           Hash Join Builder (join_node_id=13): Total=9.12 KB Peak=9.12 KB
>             Hash Join Builder (join_node_id=13) Exprs: Total=9.12 KB Peak=9.12 KB
>         HDFS_SCAN_NODE (id=11): Total=0 Peak=0
>         EXCHANGE_NODE (id=24): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=0
>         DataStreamSender (dst_id=25): Total=1.05 KB Peak=1.05 KB
>         CodeGen: Total=12.59 KB Peak=2.29 MB
>       Block Manager: Limit=161.39 GB Total=160.75 MB Peak=160.83 MB
>       Fragment b74ba58d53b6c45f:3e8228600000085: Total=2.32 MB Peak=2.32 MB
>         AGGREGATION_NODE (id=21): Total=2.29 MB Peak=2.29 MB
>           Exprs: Total=44.12 KB Peak=44.12 KB
>         EXCHANGE_NODE (id=20): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=0
>         DataStreamSender (dst_id=23): Total=22.09 KB Peak=22.09 KB
>         CodeGen: Total=2.37 KB Peak=546.00 KB
>       Fragment b74ba58d53b6c45f:3e8228600000060: Total=188.02 MB Peak=188.34 MB
>         Runtime Filter Bank: Total=16.00 MB Peak=16.00 MB
>         AGGREGATION_NODE (id=9): Total=1.67 MB Peak=1.67 MB
>           Exprs: Total=44.12 KB Peak=44.12 KB
>         HASH_JOIN_NODE (id=8): Total=1.13 MB Peak=1.15 MB
>           Exprs: Total=9.12 KB Peak=9.12 KB
>           Hash Join Builder (join_node_id=8): Total=1.01 MB Peak=1.02 MB
>             Hash Join Builder (join_node_id=8) Exprs: Total=9.12 KB Peak=9.12 KB
>         HASH_JOIN_NODE (id=7): Total=169.14 MB Peak=169.14 MB
>           Exprs: Total=9.12 KB Peak=9.12 KB
>           Hash Join Builder (join_node_id=7): Total=169.01 MB Peak=169.02 MB
>             Hash Join Builder (join_node_id=7) Exprs: Total=9.12 KB Peak=9.12 KB
>         EXCHANGE_NODE (id=17): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=587.50 KB
>         EXCHANGE_NODE (id=18): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=316.11 KB
>         EXCHANGE_NODE (id=19): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=4.70 KB
>         DataStreamSender (dst_id=20): Total=58.39 KB Peak=58.39 KB
>         CodeGen: Total=16.80 KB Peak=2.83 MB
>     Query(cb4c14997ad6add2:c8f120100000000): Total=190.36 MB Peak=443.00 MB
>       Fragment cb4c14997ad6add2:c8f1201000000a4: Total=67.90 KB Peak=2.34 MB
>         SORT_NODE (id=14): Total=4.00 KB Peak=4.00 KB
>         HASH_JOIN_NODE (id=13): Total=42.25 KB Peak=42.25 KB
>           Exprs: Total=9.12 KB Peak=9.12 KB
>           Hash Join Builder (join_node_id=13): Total=9.12 KB Peak=9.12 KB
>             Hash Join Builder (join_node_id=13) Exprs: Total=9.12 KB Peak=9.12 KB
>         HDFS_SCAN_NODE (id=11): Total=0 Peak=0
>         EXCHANGE_NODE (id=24): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=0
>         DataStreamSender (dst_id=25): Total=1.05 KB Peak=1.05 KB
>         CodeGen: Total=12.59 KB Peak=2.29 MB
>       Block Manager: Limit=161.39 GB Total=160.75 MB Peak=160.83 MB
>       Fragment cb4c14997ad6add2:c8f120100000088: Total=2.33 MB Peak=2.33 MB
>         AGGREGATION_NODE (id=21): Total=2.29 MB Peak=2.29 MB
>           Exprs: Total=44.12 KB Peak=44.12 KB
>         EXCHANGE_NODE (id=20): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=0
>         DataStreamSender (dst_id=23): Total=26.83 KB Peak=26.83 KB
>         CodeGen: Total=2.37 KB Peak=546.00 KB
>       Fragment cb4c14997ad6add2:c8f120100000063: Total=187.97 MB Peak=188.08 MB
>         Runtime Filter Bank: Total=16.00 MB Peak=16.00 MB
>         AGGREGATION_NODE (id=9): Total=1.67 MB Peak=1.67 MB
>           Exprs: Total=44.12 KB Peak=44.12 KB
>         HASH_JOIN_NODE (id=8): Total=1.14 MB Peak=1.15 MB
>           Exprs: Total=9.12 KB Peak=9.12 KB
>           Hash Join Builder (join_node_id=8): Total=1.01 MB Peak=1.02 MB
>             Hash Join Builder (join_node_id=8) Exprs: Total=9.12 KB Peak=9.12 KB
>         HASH_JOIN_NODE (id=7): Total=169.07 MB Peak=169.14 MB
>           Exprs: Total=9.12 KB Peak=9.12 KB
>           Hash Join Builder (join_node_id=7): Total=169.01 MB Peak=169.02 MB
>             Hash Join Builder (join_node_id=7) Exprs: Total=9.12 KB Peak=9.12 KB
>         EXCHANGE_NODE (id=17): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=314.15 KB
>         EXCHANGE_NODE (id=18): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=861.18 KB
>         EXCHANGE_NODE (id=19): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=4.70 KB
>         DataStreamSender (dst_id=20): Total=58.39 KB Peak=58.39 KB
>         CodeGen: Total=16.80 KB Peak=2.83 MB
>     Query(f04a57ce97102dd7:c2a1081700000000): Total=190.31 MB Peak=419.11 MB
>       Fragment f04a57ce97102dd7:c2a1081700000085: Total=2.33 MB Peak=2.33 MB
>         AGGREGATION_NODE (id=21): Total=2.29 MB Peak=2.29 MB
>           Exprs: Total=44.12 KB Peak=44.12 KB
>         EXCHANGE_NODE (id=20): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=0
>         DataStreamSender (dst_id=23): Total=23.67 KB Peak=23.67 KB
>         CodeGen: Total=2.37 KB Peak=546.00 KB
>       Block Manager: Limit=161.39 GB Total=160.75 MB Peak=160.83 MB
>       Fragment f04a57ce97102dd7:c2a1081700000060: Total=187.99 MB Peak=188.07 MB
>         Runtime Filter Bank: Total=16.00 MB Peak=16.00 MB
>         AGGREGATION_NODE (id=9): Total=1.68 MB Peak=1.68 MB
>           Exprs: Total=44.12 KB Peak=44.12 KB
>         HASH_JOIN_NODE (id=8): Total=1.14 MB Peak=1.15 MB
>           Exprs: Total=9.12 KB Peak=9.12 KB
>           Hash Join Builder (join_node_id=8): Total=1.01 MB Peak=1.02 MB
>             Hash Join Builder (join_node_id=8) Exprs: Total=9.12 KB Peak=9.12 KB
>         HASH_JOIN_NODE (id=7): Total=169.09 MB Peak=169.14 MB
>           Exprs: Total=9.12 KB Peak=9.12 KB
>           Hash Join Builder (join_node_id=7): Total=169.01 MB Peak=169.02 MB
>             Hash Join Builder (join_node_id=7) Exprs: Total=9.12 KB Peak=9.12 KB
>         EXCHANGE_NODE (id=17): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=156.71 KB
>         EXCHANGE_NODE (id=18): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=1.32 MB
>         EXCHANGE_NODE (id=19): Total=0 Peak=0
>         DataStreamRecvr: Total=0 Peak=4.70 KB
>         DataStreamSender (dst_id=20): Total=58.39 KB Peak=58.39 KB
>         CodeGen: Total=16.80 KB Peak=2.83 MB
>   Untracked Memory: Total=2.10 GB
> {code}
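To check step 3 of the repro mechanically, the Query(...) lines of a memz snapshot like the one above can be parsed to list the queries still holding memory. The sketch below assumes only the text format shown in this snapshot and that KB/MB/GB are powers of 1024; the function names are hypothetical, and it does not fetch the page from a live Impalad.

```python
import re
from typing import Dict, Optional, Tuple

# Assumption: memz sizes use binary units (1 KB = 1024 bytes).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

# Matches lines like "Query(<id>): Total=2.64 MB Peak=104.73 MB".
# A value of 0 is printed without a unit ("Total=0 Peak=0").
_QUERY_LINE = re.compile(
    r"Query\((?P<id>[0-9a-f]+:[0-9a-f]+)\): "
    r"Total=(?P<total>[\d.]+)(?: (?P<tunit>[KMG]?B))? "
    r"Peak=(?P<peak>[\d.]+)(?: (?P<punit>[KMG]?B))?"
)


def to_bytes(value: str, unit: Optional[str]) -> int:
    """Convert a memz size such as ("2.64", "MB") to bytes."""
    return int(float(value) * _UNITS[unit or "B"])


def queries_holding_memory(memz_text: str) -> Dict[str, Tuple[int, int]]:
    """Map each query ID in a memz dump to (total_bytes, peak_bytes)."""
    result: Dict[str, Tuple[int, int]] = {}
    for line in memz_text.splitlines():
        m = _QUERY_LINE.search(line)
        if m:
            result[m.group("id")] = (
                to_bytes(m.group("total"), m.group("tunit")),
                to_bytes(m.group("peak"), m.group("punit")),
            )
    return result


if __name__ == "__main__":
    sample = """\
  RequestPool=root.default: Total=1.35 GB Peak=178.51 GB
    Query(f64169d4bb3c901c:3a21d8ae00000000): Total=2.64 MB Peak=104.73 MB
      Fragment f64169d4bb3c901c:3a21d8ae00000051: Total=2.64 MB Peak=2.67 MB
    Query(2a4f12b3b4b1dc8c:db7e8cf200000000): Total=258.29 MB Peak=412.98 MB
"""
    for qid, (total, peak) in queries_holding_memory(sample).items():
        print(qid, total, peak)
```

Any query that still appears with a non-zero Total long after its coordinator was killed is an instance of the leak this issue describes.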



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
