Posted to issues@kudu.apache.org by "Alexey Serbin (Jira)" <ji...@apache.org> on 2019/10/18 18:07:00 UTC

[jira] [Commented] (KUDU-2978) NVM-based cache test scenario in cfile-test crashes on CentOS6

    [ https://issues.apache.org/jira/browse/KUDU-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954878#comment-16954878 ] 

Alexey Serbin commented on KUDU-2978:
-------------------------------------

With [ad-hoc tracing|https://github.com/alexeyserbin/kudu/commit/659a77e15675857ce5eaa2379f8f1e0cba5a4f95], the crash looks like the following:

{noformat}
E1018 10:58:24.903393 68051 nvm_cache.cc:599] BEFORE AllocateAndRetry: key_len: 16 val_len: 66027 key_len + val_len + sizeof(LRUHandle): 66107
E1018 10:58:24.903409 68051 nvm_cache.cc:606] AFTER AllocateAndRetry: non-NULL
E1018 10:58:24.903416 68051 nvm_cache.cc:609] BEFORE setting data: buf addr 7fecdec1ad80
E1018 10:58:24.903424 68051 nvm_cache.cc:612] SETTING data 1
*** Aborted at 1571421504 (unix time) try "date -d @1571421504" if you are using GNU date ***
PC: @     0x7fed20f5bc02 kudu::(anonymous namespace)::ShardedLRUCache::Allocate()
*** SIGSEGV (@0x7fecdec1adb8) received by PID 68051 (TID 0x7fed1bf48040) from PID 18446744073151819192; stack trace: ***
    @       0x3ae0e0f710 (unknown) at ??:0
    @     0x7fed20f5bc02 kudu::(anonymous namespace)::ShardedLRUCache::Allocate() at ??:0
    @     0x7fed214aec0f kudu::Cache::Allocate() at ??:0
    @     0x7fed214ae5c9 kudu::cfile::BlockCache::Allocate() at ??:0
    @     0x7fed214bd6f0 kudu::cfile::(anonymous namespace)::ScratchMemory::TryAllocateFromCache() at ??:0
    @     0x7fed214bdfde kudu::cfile::CFileReader::ReadBlock() at ??:0
    @     0x7fed214c2036 kudu::cfile::CFileIterator::ReadCurrentDataBlock() at ??:0
    @     0x7fed214c285c kudu::cfile::CFileIterator::QueueCurrentDataBlock() at ??:0
    @     0x7fed214c2efa kudu::cfile::CFileIterator::PrepareBatch() at ??:0
    @     0x7fed214c51a6 kudu::cfile::CFileIterator::CopyNextValues() at ??:0
    @           0x49765e kudu::cfile::TestCFile::TestReadWriteStrings() at /data/8/aserbin/Projects/kudu/src/kudu/cfile/cfile-test.cc:721
    @           0x498913 kudu::cfile::TestCFileBothCacheMemoryTypes_TestReadWriteLargeStrings_Test::TestBody() at /data/8/aserbin/Projects/kudu/src/kudu/cfile/cfile-test.cc:751
    @     0x7fed211aeb98 testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0
    @     0x7fed2119c1b2 testing::Test::Run() at ??:0
    @     0x7fed2119c2f8 testing::TestInfo::Run() at ??:0
    @     0x7fed2119c3d5 testing::TestCase::Run() at ??:0
    @     0x7fed211a2ed8 testing::internal::UnitTestImpl::RunAllTests() at ??:0
    @     0x7fed211af0a8 testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0
    @     0x7fed2119c4ad testing::UnitTest::Run() at ??:0
    @     0x7fed219aaf7f RUN_ALL_TESTS() at ??:0
    @     0x7fed219a8d90 main at ??:0
    @       0x3ae0a1ed5d __libc_start_main at ??:0
    @           0x493019 (unknown) at ??:0

{noformat}
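The numbers in the trace above can be cross-checked with a little arithmetic. The following is only a sketch: the constants are copied from the log, the helper names ({{ImpliedHandleSize}}, {{FaultOffset}}, {{SignExtendLow32}}) are hypothetical and not part of Kudu, and it assumes that "buf addr" is the start of the region returned by the allocation. Under that assumption the faulting address is 56 bytes past "buf addr", which is less than the 64 bytes that the first log line implies for {{sizeof(LRUHandle)}}. The "from PID" value also equals the low 32 bits of the fault address sign-extended to 64 bits, in both traces, so it is probably not a real process ID.

```cpp
#include <cstdint>

// Values copied from the ad-hoc tracing output and the SIGSEGV line above.
constexpr uint64_t kBufAddr   = 0x7fecdec1ad80ULL;  // "buf addr" (nvm_cache.cc:609)
constexpr uint64_t kFaultAddr = 0x7fecdec1adb8ULL;  // address in the SIGSEGV line
constexpr uint64_t kKeyLen    = 16;                 // key_len
constexpr uint64_t kValLen    = 66027;              // val_len
constexpr uint64_t kTotalLen  = 66107;              // key_len + val_len + sizeof(LRUHandle)

// Hypothetical helper: header size implied by the first log line.
constexpr uint64_t ImpliedHandleSize(uint64_t total, uint64_t key_len,
                                     uint64_t val_len) {
  return total - key_len - val_len;
}

// Hypothetical helper: distance of the faulting address from the buffer start.
constexpr uint64_t FaultOffset(uint64_t fault_addr, uint64_t buf_addr) {
  return fault_addr - buf_addr;
}

// Hypothetical helper: take the low 32 bits of an address and sign-extend
// them to 64 bits, as happens when a 32-bit signed field is widened.
constexpr uint64_t SignExtendLow32(uint64_t addr) {
  return (addr & 0x80000000ULL)
             ? ((addr & 0xffffffffULL) | 0xffffffff00000000ULL)
             : (addr & 0xffffffffULL);
}

// The log implies a 64-byte header.
static_assert(ImpliedHandleSize(kTotalLen, kKeyLen, kValLen) == 64,
              "implied sizeof(LRUHandle)");

// The fault is 0x38 (56) bytes past "buf addr", inside the first 64 bytes.
static_assert(FaultOffset(kFaultAddr, kBufAddr) == 56, "fault offset");
static_assert(FaultOffset(kFaultAddr, kBufAddr) <
                  ImpliedHandleSize(kTotalLen, kKeyLen, kValLen),
              "fault offset is smaller than the implied header size");

// The "from PID" values in both traces match the sign-extended low 32 bits
// of the corresponding fault addresses.
static_assert(SignExtendLow32(kFaultAddr) == 18446744073151819192ULL,
              "trace from this comment");
static_assert(SignExtendLow32(0x7f01e424a4f8ULL) == 18446744073242191096ULL,
              "trace from the issue description");
```

If the assumption about "buf addr" holds, the fault happens while touching the very beginning of the freshly allocated region rather than somewhere deep inside the 66027-byte value.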

> NVM-based cache test scenario in cfile-test crashes on CentOS6
> --------------------------------------------------------------
>
>                 Key: KUDU-2978
>                 URL: https://issues.apache.org/jira/browse/KUDU-2978
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>
> {{TestCFileBothCacheMemoryTypes.TestReadWriteLargeStrings}} has started crashing with SIGSEGV quite often when built and run on CentOS6.
> The crash does not happen on other platforms.
> To reproduce, run
> {noformat}
> ./bin/cfile-test --gtest_filter='*CacheMemoryTypes/TestCFileBothCacheMemoryTypes.TestReadWriteLargeStrings/1'
> {noformat}
> The stack trace looks like the following:
> {noformat}
> *** Aborted at 1571371818 (unix time) try "date -d @1571371818" if you are using GNU date ***
> PC: @     0x7f020c89a719 kudu::(anonymous namespace)::ShardedLRUCache::Allocate()
> *** SIGSEGV (@0x7f01e424a4f8) received by PID 59228 (TID 0x7f020787f040) from PID 18446744073242191096; stack trace: ***
>     @       0x3ae0e0f710 (unknown)
>     @     0x7f020c89a719 kudu::(anonymous namespace)::ShardedLRUCache::Allocate()
>     @     0x7f020cdf36a9 kudu::Cache::Allocate()
>     @     0x7f020cdf2f3e kudu::cfile::BlockCache::Allocate()
>     @     0x7f020ce029aa kudu::cfile::(anonymous namespace)::ScratchMemory::TryAllocateFromCache()
>     @     0x7f020ce03298 kudu::cfile::CFileReader::ReadBlock()
>     @     0x7f020ce072f0 kudu::cfile::CFileIterator::ReadCurrentDataBlock()
>     @     0x7f020ce07b16 kudu::cfile::CFileIterator::QueueCurrentDataBlock()
>     @     0x7f020ce081b4 kudu::cfile::CFileIterator::PrepareBatch()
>     @     0x7f020ce0a460 kudu::cfile::CFileIterator::CopyNextValues()
>     @           0x498b1e kudu::cfile::TestCFile::TestReadWriteStrings()
>     @           0x499dd3 kudu::cfile::TestCFileBothCacheMemoryTypes_TestReadWriteLargeStrings_Test::TestBody()
>     @     0x7f020caeeb98 testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @     0x7f020cadc1b2 testing::Test::Run()
>     @     0x7f020cadc2f8 testing::TestInfo::Run()
>     @     0x7f020cadc3d5 testing::TestCase::Run()
>     @     0x7f020cae2ed8 testing::internal::UnitTestImpl::RunAllTests()
>     @     0x7f020caef0a8 testing::internal::HandleExceptionsInMethodIfSupported<>()
>     @     0x7f020cadc4ad testing::UnitTest::Run()
>     @     0x7f020d2f0f7f RUN_ALL_TESTS()
>     @     0x7f020d2eed90 main
>     @       0x3ae0a1ed5d __libc_start_main
>     @           0x4944d9 (unknown)
> Segmentation fault (core dumped)
> {noformat}
> The suspects were a few changelists:
> * {{5e3af4e2ae45ee5f700b9c6c28d56ff84ffeb319}}: [util] modernize signature of Cache interface methods
> * {{74a1d7706d99db2d9a14ed5d7c64afbcef853b20}}: [util] change return type of Cache::Allocate()
> * {{946e2bc05419e3a552fed5a9d28e83861ff1eea1}}: KUDU-2605: replace nvml with memkind
> I reverted the first two (one by one), but the issue is still there.
> The test built from the code right before the third changelist (i.e. at the snapshot of revision {{8410f0ca44e17ef6242cc9b25da49b568ddb0955}}) doesn't crash.
> The code with the first two changelists reverted is at: https://github.com/alexeyserbin/kudu/commits/nvm-cache-crash



--
This message was sent by Atlassian Jira
(v8.3.4#803005)