Posted to issues@kudu.apache.org by "zhangsong (JIRA)" <ji...@apache.org> on 2016/12/08 06:47:59 UTC

[jira] [Comment Edited] (KUDU-1794) kuduScanner 's problem causing impala crash.

    [ https://issues.apache.org/jira/browse/KUDU-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15731317#comment-15731317 ] 

zhangsong edited comment on KUDU-1794 at 12/8/16 6:47 AM:
----------------------------------------------------------

Yes, this issue maps to IMPALA-4334.
I will try to write a test to reproduce it (as you know, the issue does not show up every time)
and will pay more attention to it.


was (Author: brucesz):
Yes, this issue maps to IMPALA-4334.

> kuduScanner 's problem causing impala crash.
> --------------------------------------------
>
>                 Key: KUDU-1794
>                 URL: https://issues.apache.org/jira/browse/KUDU-1794
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: zhangsong
>
> Sometimes an impalad in my cluster crashes. After studying the core file, I found that a null pointer in the data field of ScanResponsePB is what causes the impalad crash.
> So I made a small modification to "NextBatch" in client.cc:
> "
> if (data_->data_in_open_) {
>   // We have data from a previous scan.
>   VLOG(1) << "Extracting data from scan " << ToString();
>   data_->data_in_open_ = false;
>   auto scan_response_data_ptr = data_->last_response_.release_data();
>   // Return an error instead of dereferencing a null data field.
>   if (PREDICT_FALSE(scan_response_data_ptr == nullptr)) {
>     return Status::Corruption(Substitute(
>         "Kudu scanner against $0 is in open status, but scan resp has no data. "
>         "Scan query: $1. Remote: $2. Response: $3",
>         data_->table_->name(),
>         data_->configuration().spec().ToString(*data_->table_->schema().schema_),
>         data_->ts_->ToString(),
>         data_->last_response_.DebugString()));
>   }
> "
> I also made some modifications on the Impala side of the code:
> "
>   if (UNLIKELY(!status.ok())) {
>     LOG(ERROR) <<"KuduScanner::GetNextScannerBatch ERROR["<< status.ToString() << "]";
>     KUDU_RETURN_IF_ERROR(status, "unable to advance kudu iterator");
>   }
> "
> After these modifications I found the following errors in impalad's log:
> "E1124 11:46:50.780480 15613 kudu-scanner.cc:422] KuduScanner::GetNextScannerBatch ERROR[Timed out: Scan RPC to 172.22.99.57:7050 timed out after 180.000s]
> "
> and
> "E1124 11:49:24.171380 16127 kudu-scanner.cc:422] KuduScanner::GetNextScannerBatch ERROR[Timed out: Scan RPC to 172.22.99.57:7050 timed out after 164.164s: Remote error: Service unavailable: Scan request on kudu.tserver.TabletServerService from 172.22.99.57:64537 dropped due to backpressure. The service queue is full; it has 150 items.]
> "
> and
> "E1124 11:49:24.171378 16128 kudu-scanner.cc:422] KuduScanner::GetNextScannerBatch ERROR[Timed out: Scan RPC to 172.22.99.57:7050 timed out after 121.593s: Not found: Scanner not found]"
> It seems that there are various reasons why the data field of ScanResponsePB can end up as a null pointer, but impalad has no way of knowing about them.
> Maybe last_response_.has_more_results() should return false when this happens?
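
The two ideas in the description (return an error when the response carries no data, and stop reporting more results afterwards) can be sketched in isolation. This is only a minimal illustration, not Kudu or Impala code: FakeStatus, FakeScanResponse, FakeScanner and the failed_ flag are hypothetical stand-ins invented for the sketch, and the real client may need to handle this differently.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a status type; not Kudu's Status class.
struct FakeStatus {
  bool ok;
  std::string message;
  static FakeStatus OK() { return {true, ""}; }
  static FakeStatus Corruption(std::string msg) {
    return {false, "Corruption: " + std::move(msg)};
  }
};

// Hypothetical stand-in for a scan response whose data field may be unset,
// for example after a timed-out or rejected RPC.
struct FakeScanResponse {
  std::unique_ptr<std::string> data;
  bool has_more_results;
};

// Hypothetical scanner that mirrors the shape of the patched NextBatch.
class FakeScanner {
 public:
  explicit FakeScanner(FakeScanResponse resp)
      : last_response_(std::move(resp)), data_in_open_(true), failed_(false) {}

  // Once a batch has failed, stop advertising more rows so that the
  // caller does not keep iterating over a broken scanner.
  bool HasMoreRows() const {
    return !failed_ && (data_in_open_ || last_response_.has_more_results);
  }

  FakeStatus NextBatch(std::string* out) {
    assert(out != nullptr);
    if (data_in_open_) {
      data_in_open_ = false;
      // Take ownership of the data, as release_data() does in the patch.
      std::unique_ptr<std::string> data = std::move(last_response_.data);
      if (data == nullptr) {
        // Surface an error instead of dereferencing a null pointer.
        failed_ = true;
        return FakeStatus::Corruption(
            "scanner is open but the scan response has no data");
      }
      *out = std::move(*data);
      return FakeStatus::OK();
    }
    out->clear();  // No buffered data; a real client would issue a new RPC here.
    return FakeStatus::OK();
  }

 private:
  FakeScanResponse last_response_;
  bool data_in_open_;
  bool failed_;
};
```

With this shape, a caller that loops on HasMoreRows() and checks the status of each NextBatch() call sees one error and then stops, rather than crashing on the missing data.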



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)