Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/09/22 06:02:00 UTC

[jira] [Commented] (KUDU-2233) Check failure during compactions: pv_delete_redo != nullptr

    [ https://issues.apache.org/jira/browse/KUDU-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199833#comment-17199833 ] 

ASF subversion and git services commented on KUDU-2233:
-------------------------------------------------------

Commit fcceb8b1a20afff30e15b6248a56ab3e06b61e79 in kudu's branch refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=fcceb8b ]

KUDU-3191: fail replicas when KUDU-2233 is detected

Despite the longstanding fixes that stop bad KUDU-2233 compactions,
users still see the results of already corrupted data, particularly when
upgrading to newer versions that may compact more aggressively than
older versions.

Rather than crashing when hitting a KUDU-2233 failure, this patch
updates the behavior to fail the replica. Similar to disk failures or
CFile checksum corruption, this will trigger re-replication to happen,
and eviction will only happen if there is a healthy majority.

The hope is that fewer users will see this corruption cause problems, as
the corruption will henceforth not crash servers, and only tablets with
a majority corrupted will be unavailable.
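
As a rough illustration of the behavior described above, here is a minimal
sketch, assuming a much simplified model of a tablet's replicas. None of
these names (CompactionResult, CheckDeleteRedo, HasHealthyMajority) come
from the Kudu codebase; they are hypothetical and only mirror the two ideas
in this message: report corruption to the caller instead of aborting the
process, and treat a tablet as recoverable only while a majority of its
replicas is healthy.

```cpp
// Hypothetical, simplified model. These names are illustrative assumptions
// and are not part of Kudu's API.
enum class CompactionResult { kOk, kCorruption };

// Instead of aborting the whole process when the REDO delete is missing
// (the old check failure), report corruption so that the caller can fail
// only the affected replica.
CompactionResult CheckDeleteRedo(const void* pv_delete_redo) {
  return pv_delete_redo != nullptr ? CompactionResult::kOk
                                   : CompactionResult::kCorruption;
}

// Returns true if the replicas that have not failed form a strict majority
// of the configured replicas. In this model, a failed replica can only be
// evicted and re-replicated while this holds.
bool HasHealthyMajority(int num_replicas, int num_failed) {
  if (num_replicas <= 0 || num_failed < 0 || num_failed > num_replicas) {
    return false;  // Reject nonsensical inputs.
  }
  return (num_replicas - num_failed) > num_replicas / 2;
}
```

Under this model, a tablet with three replicas and one corrupted replica
still has a healthy majority and can recover through re-replication, while
a tablet with two of three replicas corrupted cannot.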

Change-Id: I43570b961dfd5eb8518328121585255d32cf2ebb
Reviewed-on: http://gerrit.cloudera.org:8080/16471
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>


> Check failure during compactions: pv_delete_redo != nullptr
> -----------------------------------------------------------
>
>                 Key: KUDU-2233
>                 URL: https://issues.apache.org/jira/browse/KUDU-2233
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet, tserver
>    Affects Versions: 1.4.0
>            Reporter: Andrew Wong
>            Assignee: Andrew Wong
>            Priority: Major
>             Fix For: 1.7.0
>
>
> There have been a couple of reports of a check failure during compactions, on versions going back to at least 1.4. One is pasted below:
> {noformat}
> F1201 14:55:37.052140 10508 compaction.cc:756] Check failed: pv_delete_redo != nullptr
>  *** Check failure stack trace: ***
>  Wrote minidump to /var/log/kudu/minidumps/kudu-tserver/215cde39-7795-0885-0b51038d-771d875e.dmp
>  *** Aborted at 1512161737 (unix time) try "date -d @1512161737" if you are using GNU date ***
>  PC: @ 0x3ec3632625 (unknown)
>  *** SIGABRT (@0x3b98eec0000028e3) received by PID 10467 (TID 0x7f8b02c58700) from PID 10467; stack trace: ***
>  @ 0x3ec3a0f7e0 (unknown)
>  @ 0x3ec3632625 (unknown)
>  @ 0x3ec3633e05 (unknown)
>  @ 0x1b53f59 (unknown)
>  @ 0x8b9f6d google::LogMessage::Fail()
>  @ 0x8bbe2d google::LogMessage::SendToLog()
>  @ 0x8b9aa9 google::LogMessage::Flush()
>  @ 0x8bc8cf google::LogMessageFatal::~LogMessageFatal()
>  @ 0x9db0fe kudu::tablet::FlushCompactionInput()
>  @ 0x9a056a kudu::tablet::Tablet::DoMergeCompactionOrFlush()
>  @ 0x9a372d kudu::tablet::Tablet::Compact()
>  @ 0x9bd8d1 kudu::tablet::CompactRowSetsOp::Perform()
>  @ 0x1b4145f kudu::MaintenanceManager::LaunchOp()
>  @ 0x1b8da06 kudu::ThreadPool::DispatchThread()
>  @ 0x1b888ea kudu::Thread::SuperviseThread()
>  @ 0x3ec3a07aa1 (unknown)
>  @ 0x3ec36e893d (unknown)
>  @ 0x0 (unknown)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)