Posted to issues@kudu.apache.org by "YifanZhang (Jira)" <ji...@apache.org> on 2021/05/26 11:12:00 UTC
[jira] [Commented] (KUDU-2064) Overall log cache usage doesn't respect the limit
[ https://issues.apache.org/jira/browse/KUDU-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351716#comment-17351716 ]
YifanZhang commented on KUDU-2064:
----------------------------------
I also saw the actual log cache usage exceed the log_cache_size_limit/global_log_cache_limit on a tserver's mem-tracker page (Kudu version 1.12.0):
||Id||Parent||Limit||Current Consumption||Peak Consumption||
|root|none|none|44.97G|76.44G|
|block_cache-sharded_lru_cache|root|none|40.01G|40.02G|
|server|root|none|2.50G|26.29G|
|log_cache|root|1.00G|2.46G|10.89G|
|log_cache:adbee30f32664a48bc24f80b1e53d425:cbcc9aa7ac9c4167a7ba0b540c95c83a|log_cache|128.00M|854.01M|858.10M|
|log_cache:adbee30f32664a48bc24f80b1e53d425:4b2cbe4fd0d64e7d998a8abddbc1fb47|log_cache|128.00M|793.87M|794.58M|
|log_cache:adbee30f32664a48bc24f80b1e53d425:ea0d65bc2f384757b2259a19829fab9c|log_cache|128.00M|254.86M|429.48M|
|log_cache:adbee30f32664a48bc24f80b1e53d425:65065df878a64d1bae52fcd0bf6a2e45|log_cache|128.00M|215.48M|392.56M|
But the tablet that consumes the most log cache is TOMBSTONED; I'm not sure whether that memory is actually still occupied or the MemTracker just wasn't updated.
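One plausible explanation for the numbers above is that the limits on these trackers are advisory: each charge is propagated up the hierarchy without being enforced, so the log_cache parent can end up well above its 1G limit even though every per-tablet child stays under its own 128M limit. A minimal C++ sketch of that pattern (hypothetical names and structure, not Kudu's actual MemTracker):
{code:cpp}
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: a hierarchical memory tracker whose limits are only
// advisory. Consume() charges every ancestor without checking limits.
struct MemTracker {
  std::string id;
  int64_t limit;        // -1 means "none"
  int64_t consumption;
  MemTracker* parent;

  void Consume(int64_t bytes) {
    for (MemTracker* t = this; t != nullptr; t = t->parent) {
      t->consumption += bytes;  // no enforcement against t->limit
    }
  }
  bool LimitExceeded() const { return limit >= 0 && consumption > limit; }
};

// Simulates 20 tablets each caching 100 MiB; returns the parent's final
// consumption in bytes.
int64_t SimulateLogCacheUsage() {
  const int64_t MB = 1 << 20;
  MemTracker root{"root", -1, 0, nullptr};
  MemTracker log_cache{"log_cache", 1024 * MB, 0, &root};

  std::vector<MemTracker> tablets;
  tablets.reserve(20);
  for (int i = 0; i < 20; ++i) {
    tablets.push_back({"tablet-" + std::to_string(i), 128 * MB, 0, &log_cache});
    tablets.back().Consume(100 * MB);        // under the per-tablet limit
    assert(!tablets.back().LimitExceeded()); // no child is over its limit
  }
  assert(log_cache.LimitExceeded());         // yet the parent is over 1 GiB
  return log_cache.consumption;
}

int main() {
  const int64_t MB = 1 << 20;
  assert(SimulateLogCacheUsage() == 2000 * MB);  // ~1.95 GiB vs 1 GiB limit
  return 0;
}
{code}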
I also saw some kernel_stack_watchdog traces in the log:
{code:java}
W0526 11:35:35.414122 27289 kernel_stack_watchdog.cc:198] Thread 190027 stuck at /home/zhangyifan8/work/kudu-xm/src/kudu/consensus/log.cc:405 for 118ms:
Kernel stack:
[<ffffffff810f8d36>] futex_wait_queue_me+0xc6/0x130
[<ffffffff810f9a1b>] futex_wait+0x17b/0x280
[<ffffffff810fb756>] do_futex+0x106/0x5a0
[<ffffffff810fbc70>] SyS_futex+0x80/0x180
[<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
[<ffffffffffffffff>] 0xffffffffffffffff
User stack:
@ 0x7fe923e72370 (unknown)
@ 0x2318d54 kudu::RowOperationsPB::~RowOperationsPB()
@ 0x20d0300 kudu::tserver::WriteRequestPB::SharedDtor()
@ 0x20d37a8 kudu::tserver::WriteRequestPB::~WriteRequestPB()
@ 0x2095703 kudu::consensus::ReplicateMsg::SharedDtor()
@ 0x209b038 kudu::consensus::ReplicateMsg::~ReplicateMsg()
@ 0xc3d617 kudu::consensus::LogCache::EvictSomeUnlocked()
@ 0xc3e052 _ZNSt17_Function_handlerIFvRKN4kudu6StatusEEZNS0_9consensus8LogCache16AppendOperationsERKSt6vectorI13scoped_refptrINS5_19RefCountedReplicateEESaISA_EERKSt8functionIS4_EEUlS3_E_E9_M_invokeERKSt9_Any_dataS3_
@ 0xc89ea9 kudu::log::Log::AppendThread::HandleBatches()
@ 0xc8a7ad kudu::log::Log::AppendThread::ProcessQueue()
@ 0x2295cfe kudu::ThreadPool::DispatchThread()
@ 0x228ecaf kudu::Thread::SuperviseThread()
@ 0x7fe923e6adc5 start_thread
@ 0x7fe92214c73d __clone
{code}
This often happens under a heavy write load and results in slow writes.
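The user stack above suggests that eviction work, including destroying large ReplicateMsg protobufs, runs inline on the append path while the cache lock is held, so other appenders stall behind it. A minimal C++ sketch of that pattern; the names mirror the trace but the code is hypothetical, not Kudu's actual LogCache:
{code:cpp}
#include <deque>
#include <mutex>
#include <string>

// Hypothetical sketch: eviction happens on the append path, under the same
// lock appenders need, so the appending thread pays the full cost of
// destroying evicted entries while everyone else waits.
class Cache {
 public:
  explicit Cache(size_t limit) : limit_(limit) {}

  void Append(std::string msg) {
    std::lock_guard<std::mutex> l(lock_);
    bytes_ += msg.size();
    entries_.push_back(std::move(msg));
    // Over budget: free old entries while still holding lock_. If the
    // destructors are expensive (e.g. large replicate messages), every
    // other appender blocks here -- the stall the watchdog reports.
    while (bytes_ > limit_) EvictSomeUnlocked();
  }

  size_t bytes() const { return bytes_; }

 private:
  void EvictSomeUnlocked() {
    bytes_ -= entries_.front().size();
    entries_.pop_front();  // entry destroyed inline, on the append path
  }

  std::mutex lock_;
  std::deque<std::string> entries_;
  size_t bytes_ = 0;
  const size_t limit_;
};
{code}
Moving the destruction outside the lock (or off the append path entirely) would be one way to keep appends from stalling while eviction catches up.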
> Overall log cache usage doesn't respect the limit
> -------------------------------------------------
>
> Key: KUDU-2064
> URL: https://issues.apache.org/jira/browse/KUDU-2064
> Project: Kudu
> Issue Type: Bug
> Components: log
> Affects Versions: 1.4.0
> Reporter: Jean-Daniel Cryans
> Priority: Major
> Labels: data-scalability
>
> Looking at a fairly loaded machine (10TB of data in LBM, close to 10k tablets), I can see in the mem-trackers page that the log cache is using 1.83GB, that it peaked at 2.82GB, with a 1GB limit. It's consistent on other similarly loaded tservers. It's unexpected.
> Looking at the per-tablet breakdown, they all have between 0 and a handful of MBs.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)