Posted to issues@kudu.apache.org by "zhangsong (JIRA)" <ji...@apache.org> on 2016/12/08 06:47:59 UTC
[jira] [Comment Edited] (KUDU-1794) kuduScanner 's problem causing impala crash.
[ https://issues.apache.org/jira/browse/KUDU-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731317#comment-15731317 ]
zhangsong edited comment on KUDU-1794 at 12/8/16 6:47 AM:
----------------------------------------------------------
Yes, this issue maps to IMPALA-4334.
I will try to write a test to reproduce it (as you know, this issue does not show up every time),
and I will pay more attention to it.
was (Author: brucesz):
Yes, this issue maps to IMPALA-4334.
> kuduScanner 's problem causing impala crash.
> --------------------------------------------
>
> Key: KUDU-1794
> URL: https://issues.apache.org/jira/browse/KUDU-1794
> Project: Kudu
> Issue Type: Bug
> Reporter: zhangsong
>
> Sometimes the impalad processes in my cluster crash. After studying the core file, I found that the crash is caused by a null pointer in the data field of ScanResponsePB.
> So I made a small change to "NextBatch" in client.cc:
> "
> if (data_->data_in_open_) {
>   // We have data from a previous scan.
>   VLOG(1) << "Extracting data from scan " << ToString();
>   data_->data_in_open_ = false;
>   auto scan_response_data_ptr = data_->last_response_.release_data();
>   if (PREDICT_FALSE(scan_response_data_ptr == nullptr)) {
>     // The scanner believes the open call returned data, but the response
>     // carries none: fail with an error instead of dereferencing null.
>     return Status::Corruption(Substitute(
>         "Kudu scanner against $0 is in open status, but scan resp has no data. "
>         "Scan query: $1. Remote: $2. Response: $3",
>         data_->table_->name(),
>         data_->configuration().spec().ToString(*data_->table_->schema().schema_),
>         data_->ts_->ToString(),
>         data_->last_response_.DebugString()));
>   }
>   // ... remainder of the data_in_open_ branch unchanged ...
> }
> "
> I also made some changes to the Impala side of the code:
> "
> if (UNLIKELY(!status.ok())) {
>   LOG(ERROR) << "KuduScanner::GetNextScannerBatch ERROR[" << status.ToString() << "]";
>   KUDU_RETURN_IF_ERROR(status, "unable to advance kudu iterator");
> }
> "
> After these modifications, I found the following errors in the impalad log:
> "E1124 11:46:50.780480 15613 kudu-scanner.cc:422] KuduScanner::GetNextScannerBatch ERROR[Timed out: Scan RPC to 172.22.99.57:7050 timed out after 180.000s]
> "
> and
> "E1124 11:49:24.171380 16127 kudu-scanner.cc:422] KuduScanner::GetNextScannerBatch ERROR[Timed out: Scan RPC to 172.22.99.57:7050 timed out after 164.164s: Remote error: Service unavailable: Scan request on kudu.tserver.TabletServerService from 172.22.99.57:64537 dropped due to backpressure. The service queue is full; it has 150 items.]
> "
> and
> "E1124 11:49:24.171378 16128 kudu-scanner.cc:422] KuduScanner::GetNextScannerBatch ERROR[Timed out: Scan RPC to 172.22.99.57:7050 timed out after 121.593s: Not found: Scanner not found]"
> There seem to be various causes for the null data field in ScanResponsePB, but impalad has no way of knowing what they are.
> Maybe last_response_.has_more_results() should return false when this happens?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)