Posted to issues@impala.apache.org by "Henry Robinson (JIRA)" <ji...@apache.org> on 2017/05/20 20:40:04 UTC

[jira] [Resolved] (IMPALA-5093) Rare failure to decode LZ4 batch

     [ https://issues.apache.org/jira/browse/IMPALA-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson resolved IMPALA-5093.
------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0

We tracked this down to a lifecycle problem: outgoing sidecars were destroyed by {{Close()}} before the RPC layer had a chance to finish sending them. The fix (for now, while we work on the larger issue of buffer lifetimes in KUDU-2011) is to share ownership of the buffer through the {{RpcSidecar}}.

With this fix, we were able to run a stress test on 7 nodes for over 24 hours with no crashes, where before the test would fail within a few minutes.
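
To illustrate the ownership change described above, here is a minimal, self-contained sketch. It is not the actual Impala or Kudu code: {{SharedSidecar}}, {{FakeRpcLayer}}, {{Sender}} and {{SendThenCloseBeforeFlush}} are hypothetical names invented for this example, and the real {{RpcSidecar}} interface is not reproduced here. The only point it makes is that when the sidecar holds a {{std::shared_ptr}} to the buffer, a {{Close()}} on the sender no longer frees the bytes while the RPC layer still has them queued.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sidecar. It shares ownership of the payload,
// so the bytes stay alive for as long as the sidecar itself does.
class SharedSidecar {
 public:
  explicit SharedSidecar(std::shared_ptr<const std::string> buf)
      : buf_(std::move(buf)) {}
  const std::string& data() const { return *buf_; }

 private:
  std::shared_ptr<const std::string> buf_;
};

// Hypothetical stand-in for the RPC layer: sidecars are queued first and
// only read later, which models an asynchronous send.
class FakeRpcLayer {
 public:
  void Enqueue(std::unique_ptr<SharedSidecar> sidecar) {
    pending_.push_back(std::move(sidecar));
  }

  // Reads every queued sidecar and returns the concatenated bytes.
  std::string Flush() {
    std::string wire;
    for (const auto& sidecar : pending_) wire += sidecar->data();
    pending_.clear();
    return wire;
  }

 private:
  std::vector<std::unique_ptr<SharedSidecar>> pending_;
};

// Hypothetical sender that may be closed before the RPC layer has flushed.
class Sender {
 public:
  explicit Sender(FakeRpcLayer* rpc) : rpc_(rpc) {}

  void Send(const std::string& compressed) {
    batch_ = std::make_shared<const std::string>(compressed);
    // The sidecar takes a second reference to the same buffer.
    rpc_->Enqueue(std::unique_ptr<SharedSidecar>(new SharedSidecar(batch_)));
  }

  // Drops only the sender's reference; the queued sidecar keeps its own.
  void Close() { batch_.reset(); }

  bool HoldsBatch() const { return batch_ != nullptr; }

 private:
  FakeRpcLayer* rpc_;
  std::shared_ptr<const std::string> batch_;
};

// Closes the sender before the "network" flush, the ordering that was
// unsafe when the sender was the sole owner of the buffer.
std::string SendThenCloseBeforeFlush(const std::string& payload) {
  FakeRpcLayer rpc;
  Sender sender(&rpc);
  sender.Send(payload);
  sender.Close();
  assert(!sender.HoldsBatch());
  return rpc.Flush();
}
```

In this sketch, {{SendThenCloseBeforeFlush}} returns the payload unchanged even though {{Close()}} runs before {{Flush()}}. If {{Sender}} were the sole owner of the buffer, the same ordering would leave the queued sidecar pointing at freed memory.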

> Rare failure to decode LZ4 batch
> --------------------------------
>
>                 Key: IMPALA-5093
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5093
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>    Affects Versions: Impala 2.9.0
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>            Priority: Critical
>             Fix For: Impala 2.10.0
>
>
> KRPC sometimes hits this {{DCHECK}}:
> https://github.com/henryr/Impala/blob/krpc/be/src/runtime/row-batch.cc#L108
> which indicates that {{Lz4Compress::ProcessBlock}} has failed to decompress the incoming row batch. It is not yet clear how this happens.
> Stack trace:
> {code}
> #6  0x0000000002c7598e in google::LogMessageFatal::~LogMessageFatal() ()
> #7  0x00000000017914ba in impala::RowBatch::RowBatch (this=0x3d8af3c0, row_desc=..., input_batch=..., mem_tracker=0x13ad1c80) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/row-batch.cc:108
> #8  0x000000000174c655 in impala::DataStreamRecvr::SenderQueue::AddBatch (this=0xc962800, payload=...) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/data-stream-recvr.cc:210
> #9  0x000000000174e13a in impala::DataStreamRecvr::AddBatch (this=0xcdda580, payload=...) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/data-stream-recvr.cc:352
> #10 0x000000000173f076 in impala::DataStreamMgr::AddData (this=0xe4a0b20, fragment_instance_id=..., payload=...) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/runtime/data-stream-mgr.cc:190
> #11 0x00000000018e8c63 in impala::DataStreamService::TransmitData (this=0xdb357c0, request=0x4338c3f0, response=0xd802c00, context=0x11d27b60) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/service/impala-internal-service.cc:77
> #12 0x00000000018ed74e in _ZZN6impala19DataStreamServiceIfC4ERK13scoped_refptrIN4kudu12MetricEntityEERKS1_INS2_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE0_clESG_SH_SJ_ ()
>     at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/service/impala_internal_service.service.cc:157
> #13 0x00000000018eff3b in std::_Function_handler<void(const google::protobuf::Message*, google::protobuf::Message*, kudu::rpc::RpcContext*), impala::DataStreamServiceIf::DataStreamServiceIf(const scoped_refptr<kudu::MetricEntity>&, const scoped_refptr<kudu::rpc::ResultTracker>&)::<lambda(const google::protobuf::Message*, google::protobuf::Message*, kudu::rpc::RpcContext*)> >::_M_invoke(const std::_Any_data &, const google::protobuf::Message *, google::protobuf::Message *, kudu::rpc::RpcContext *) (__functor=..., __args#0=0x4338c3f0,
>     __args#1=0xd802c00, __args#2=0x11d27b60) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/gcc-4.9.2/include/c++/4.9.2/functional:2039
> #14 0x0000000001d9fcb4 in std::function<void(const google::protobuf::Message*, google::protobuf::Message*, kudu::rpc::RpcContext*)>::operator()(const google::protobuf::Message *, google::protobuf::Message *, kudu::rpc::RpcContext *) const (this=0xeb320b8,
>     __args#0=0x4338c3f0, __args#1=0xd802c00, __args#2=0x11d27b60) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/toolchain/gcc-4.9.2/include/c++/4.9.2/functional:2439
> #15 0x0000000001d9f6b7 in kudu::rpc::GeneratedServiceIf::Handle (this=0xdb357c0, call=0xcf37480) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/kudu/rpc/service_if.cc:134
> #16 0x00000000016abfb8 in impala::ImpalaServicePool::RunThread (this=0xe85ac80) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/rpc/impala-service-pool.cc:130
> #17 0x00000000016ab5db in impala::ImpalaServicePool::<lambda()>::operator()(void) const (__closure=0x7f5e11a86be8) at /data/jenkins/workspace/impala-private-build-binaries/repos/Impala/be/src/rpc/impala-service-pool.cc:68
> {code}


