Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/06/11 07:29:00 UTC

[jira] [Commented] (IMPALA-6294) Concurrent hung with lots of spilling make slow progress due to blocking in DataStreamRecvr and DataStreamSender

    [ https://issues.apache.org/jira/browse/IMPALA-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17361492#comment-17361492 ] 

Quanlong Huang commented on IMPALA-6294:
----------------------------------------

FWIW, IMPALA-10578 is a similar issue, but the root cause there turned out to be a poor configuration: only one rotational disk was configured for spilling, and that same disk was also used for logging. The spilling saturated the disk, which blocked logging and ultimately blocked the RPCs.
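
For anyone hitting the same symptom, a minimal sketch of the kind of setup that avoids it (the directory paths below are just examples, not the ones from IMPALA-10578): give spilling its own scratch disks and keep them off the device that hosts the logs.

{code}
# Hypothetical impalad startup flags: spread spilling across dedicated disks
# and keep the log directory on a separate device, so heavy spill I/O cannot
# starve log writes (and, through them, the RPC threads).
impalad \
  --scratch_dirs=/data1/impala/scratch,/data2/impala/scratch \
  --log_dir=/var/log/impala
{code}

Using more than one scratch directory also lets the spill I/O be spread across disks, which addresses the other half of the IMPALA-10578 problem (a single rotational disk taking all of the spill traffic).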

> Concurrent hung with lots of spilling make slow progress due to blocking in DataStreamRecvr and DataStreamSender
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-6294
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6294
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.11.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Michael Ho
>            Priority: Critical
>         Attachments: IMPALA-6285 TPCDS Q3 slow broadcast, slow_broadcast_q3_reciever.txt, slow_broadcast_q3_sender.txt
>
>
> While running a highly concurrent spilling workload on a large cluster, queries start running slower; even lightweight queries that are not spilling are affected by this slowdown.
> {code}
>           EXCHANGE_NODE (id=9):(Total: 3m1s, non-child: 3m1s, % non-child: 100.00%)
>              - ConvertRowBatchTime: 999.990us
>              - PeakMemoryUsage: 0
>              - RowsReturned: 108.00K (108001)
>              - RowsReturnedRate: 593.00 /sec
>             DataStreamReceiver:
>               BytesReceived(4s000ms): 254.47 KB, 338.82 KB, 338.82 KB, 852.43 KB, 1.32 MB, 1.33 MB, 1.50 MB, 2.53 MB, 2.99 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.16 MB, 3.49 MB, 3.80 MB, 4.15 MB, 4.55 MB, 4.84 MB, 4.99 MB, 5.07 MB, 5.41 MB, 5.75 MB, 5.92 MB, 6.00 MB, 6.00 MB, 6.00 MB, 6.07 MB, 6.28 MB, 6.33 MB, 6.43 MB, 6.67 MB, 6.91 MB, 7.29 MB, 8.03 MB, 9.12 MB, 9.68 MB, 9.90 MB, 9.97 MB, 10.44 MB, 11.25 MB
>                - BytesReceived: 11.73 MB (12301692)
>                - DeserializeRowBatchTimer: 957.990ms
>                - FirstBatchArrivalWaitTime: 0.000ns
>                - PeakMemoryUsage: 644.44 KB (659904)
>                - SendersBlockedTimer: 0.000ns
>                - SendersBlockedTotalTimer(*): 0.000ns
> {code}
> {code}
>         DataStreamSender (dst_id=9):(Total: 1s819ms, non-child: 1s819ms, % non-child: 100.00%)
>            - BytesSent: 234.64 MB (246033840)
>            - NetworkThroughput(*): 139.58 MB/sec
>            - OverallThroughput: 128.92 MB/sec
>            - PeakMemoryUsage: 33.12 KB (33920)
>            - RowsReturned: 108.00K (108001)
>            - SerializeBatchTime: 133.998ms
>            - TransmitDataRPCTime: 1s680ms
>            - UncompressedRowBatchSize: 446.42 MB (468102200)
> {code}
> The timeouts seen in IMPALA-6285 are caused by this issue:
> {code}
> I1206 12:44:14.925405 25274 status.cc:58] RPC recv timed out: Client foo-17.domain.com:22000 timed-out during recv call.
>     @           0x957a6a  impala::Status::Status()
>     @          0x11dd5fe  impala::DataStreamSender::Channel::DoTransmitDataRpc()
>     @          0x11ddcd4  impala::DataStreamSender::Channel::TransmitDataHelper()
>     @          0x11de080  impala::DataStreamSender::Channel::TransmitData()
>     @          0x11e1004  impala::ThreadPool<>::WorkerThread()
>     @           0xd10063  impala::Thread::SuperviseThread()
>     @           0xd107a4  boost::detail::thread_data<>::run()
>     @          0x128997a  (unknown)
>     @     0x7f68c5bc7e25  start_thread
>     @     0x7f68c58f534d  __clone
> {code}
> Similar behavior was also observed with KRPC enabled (IMPALA-6048).


