Posted to issues@kudu.apache.org by "YifanZhang (Jira)" <ji...@apache.org> on 2021/05/26 11:12:00 UTC

[jira] [Commented] (KUDU-2064) Overall log cache usage doesn't respect the limit

    [ https://issues.apache.org/jira/browse/KUDU-2064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351716#comment-17351716 ] 

YifanZhang commented on KUDU-2064:
----------------------------------

I also found that the actual log cache usage exceeded log_cache_size_limit/global_log_cache_limit on a tserver's mem-trackers page (Kudu version 1.12.0):
||Id||Parent||Limit||Current Consumption||Peak Consumption||
|root|none|none|44.97G|76.44G|
|block_cache-sharded_lru_cache|root|none|40.01G|40.02G|
|server|root|none|2.50G|26.29G|
|log_cache|root|1.00G|2.46G|10.89G|
|log_cache:adbee30f32664a48bc24f80b1e53d425:cbcc9aa7ac9c4167a7ba0b540c95c83a|log_cache|128.00M|854.01M|858.10M|
|log_cache:adbee30f32664a48bc24f80b1e53d425:4b2cbe4fd0d64e7d998a8abddbc1fb47|log_cache|128.00M|793.87M|794.58M|
|log_cache:adbee30f32664a48bc24f80b1e53d425:ea0d65bc2f384757b2259a19829fab9c|log_cache|128.00M|254.86M|429.48M|
|log_cache:adbee30f32664a48bc24f80b1e53d425:65065df878a64d1bae52fcd0bf6a2e45|log_cache|128.00M|215.48M|392.56M|

But the tablet that consumes the most log cache is TOMBSTONED; I'm not sure whether that cache is actually still occupied or the MemTracker just isn't being updated.
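
To make the two possibilities concrete, here is a minimal sketch of the kind of hierarchical, soft-limited accounting involved; the Tracker/Consume/Release names are hypothetical, not Kudu's actual MemTracker API. If the limit is only consulted by a later eviction pass rather than enforced when bytes are consumed, the parent log_cache tracker can sit well above its 1.00G limit, and if a per-tablet child never releases its bytes when the tablet is tombstoned, both that child and every ancestor keep reporting the stale consumption.
{code:cpp}
// A minimal, hypothetical sketch of soft-limited hierarchical memory
// accounting; the Tracker/Consume/Release names are illustrative and this
// is not Kudu's actual MemTracker implementation.
#include <atomic>
#include <cstdint>
#include <string>
#include <utility>

class Tracker {
 public:
  Tracker(std::string id, int64_t limit, Tracker* parent)
      : id_(std::move(id)), limit_(limit), parent_(parent) {}

  // A "soft" limit: the bytes are recorded unconditionally and the limit is
  // only consulted later by an eviction pass, so current consumption can
  // overshoot the limit (e.g. 2.46G reported against a 1.00G limit).
  void Consume(int64_t bytes) {
    consumption_.fetch_add(bytes);
    if (parent_ != nullptr) parent_->Consume(bytes);
  }

  // If a per-tablet child tracker never calls Release() for its remaining
  // bytes when the tablet is tombstoned, that consumption stays pinned and
  // keeps inflating every ancestor up to the root.
  void Release(int64_t bytes) {
    consumption_.fetch_sub(bytes);
    if (parent_ != nullptr) parent_->Release(bytes);
  }

  bool OverLimit() const { return limit_ > 0 && consumption_.load() > limit_; }
  int64_t consumption() const { return consumption_.load(); }

 private:
  const std::string id_;     // e.g. "log_cache" or "log_cache:<tablet id>"
  const int64_t limit_;      // -1 means "no limit"
  Tracker* const parent_;
  std::atomic<int64_t> consumption_{0};
};
{code}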

I also saw some kernel_stack_watchdog traces in the log:
{code}
W0526 11:35:35.414122 27289 kernel_stack_watchdog.cc:198] Thread 190027 stuck at /home/zhangyifan8/work/kudu-xm/src/kudu/consensus/log.cc:405 for 118ms:
Kernel stack:
[<ffffffff810f8d36>] futex_wait_queue_me+0xc6/0x130
[<ffffffff810f9a1b>] futex_wait+0x17b/0x280
[<ffffffff810fb756>] do_futex+0x106/0x5a0
[<ffffffff810fbc70>] SyS_futex+0x80/0x180
[<ffffffff816c0715>] system_call_fastpath+0x1c/0x21
[<ffffffffffffffff>] 0xffffffffffffffff

User stack:
    @     0x7fe923e72370  (unknown)
    @          0x2318d54  kudu::RowOperationsPB::~RowOperationsPB()
    @          0x20d0300  kudu::tserver::WriteRequestPB::SharedDtor()
    @          0x20d37a8  kudu::tserver::WriteRequestPB::~WriteRequestPB()
    @          0x2095703  kudu::consensus::ReplicateMsg::SharedDtor()
    @          0x209b038  kudu::consensus::ReplicateMsg::~ReplicateMsg()
    @           0xc3d617  kudu::consensus::LogCache::EvictSomeUnlocked()
    @           0xc3e052  _ZNSt17_Function_handlerIFvRKN4kudu6StatusEEZNS0_9consensus8LogCache16AppendOperationsERKSt6vectorI13scoped_refptrINS5_19RefCountedReplicateEESaISA_EERKSt8functionIS4_EEUlS3_E_E9_M_invokeERKSt9_Any_dataS3_
    @           0xc89ea9  kudu::log::Log::AppendThread::HandleBatches()
    @           0xc8a7ad  kudu::log::Log::AppendThread::ProcessQueue()
    @          0x2295cfe  kudu::ThreadPool::DispatchThread()
    @          0x228ecaf  kudu::Thread::SuperviseThread()
    @     0x7fe923e6adc5  start_thread
    @     0x7fe92214c73d  __clone
{code}
This often happens when there is a large number of write requests, and it results in slow writes.
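
The user stack shows the append thread destroying large ReplicateMsg/WriteRequestPB objects from inside LogCache::EvictSomeUnlocked(), i.e. eviction work happening inline on the write path. Below is a minimal sketch of that pattern (a hypothetical Cache class, not Kudu's LogCache implementation): when eviction drops the last reference to a cached message while the cache lock is held, the destructor of a multi-MB protobuf runs right there on the append thread, which is the kind of stall the kernel_stack_watchdog flags.
{code:cpp}
// Illustrative sketch only: ref-counted messages evicted under a lock on
// the append path. The class and method names are hypothetical; only the
// shape of the problem mirrors the stack trace above.
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <string>

struct CachedMsg {
  std::string payload;  // stands in for a large ReplicateMsg / WriteRequestPB
};

class Cache {
 public:
  void Append(std::shared_ptr<CachedMsg> msg) {
    std::lock_guard<std::mutex> l(lock_);
    bytes_ += msg->payload.size();
    entries_.push_back(std::move(msg));
    while (bytes_ > limit_ && !entries_.empty()) {
      EvictSomeUnlocked();  // eviction happens inline, on the append thread
    }
  }

 private:
  // Runs with lock_ held. If this drops the last reference to the message,
  // the destructor of a potentially multi-MB payload executes right here,
  // inside the critical section, stalling everything queued behind the
  // append thread; that is what the watchdog trace above is showing.
  void EvictSomeUnlocked() {
    bytes_ -= entries_.front()->payload.size();
    entries_.pop_front();  // last ref dropped, so destruction runs inline
  }

  std::mutex lock_;
  std::deque<std::shared_ptr<CachedMsg>> entries_;
  size_t bytes_ = 0;
  const size_t limit_ = 128 * 1024 * 1024;  // 128M, like the per-tablet limit above
};
{code}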

 

> Overall log cache usage doesn't respect the limit
> -------------------------------------------------
>
>                 Key: KUDU-2064
>                 URL: https://issues.apache.org/jira/browse/KUDU-2064
>             Project: Kudu
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 1.4.0
>            Reporter: Jean-Daniel Cryans
>            Priority: Major
>              Labels: data-scalability
>
> Looking at a fairly loaded machine (10TB of data in LBM, close to 10k tablets), I can see in the mem-trackers page that the log cache is using 1.83GB, that it peaked at 2.82GB, with a 1GB limit. It's consistent on other similarly loaded tservers. It's unexpected.
> Looking at the per-tablet breakdown, they all have between 0 and a handful of MBs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)