Posted to issues@kudu.apache.org by "Alexey Serbin (Jira)" <ji...@apache.org> on 2022/10/13 17:33:00 UTC

[jira] [Updated] (KUDU-3406) CompactRowSetsOp can allocate much more memory than specified by the hard memory limit

     [ https://issues.apache.org/jira/browse/KUDU-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-3406:
--------------------------------
    Labels: compaction stability  (was: )

> CompactRowSetsOp can allocate much more memory than specified by the hard memory limit
> --------------------------------------------------------------------------------------
>
>                 Key: KUDU-3406
>                 URL: https://issues.apache.org/jira/browse/KUDU-3406
>             Project: Kudu
>          Issue Type: Bug
>          Components: master, tserver
>    Affects Versions: 1.13.0, 1.14.0, 1.15.0, 1.16.0
>            Reporter: Alexey Serbin
>            Assignee: Ashwani Raina
>            Priority: Critical
>              Labels: compaction, stability
>         Attachments: 270.svg, 283.svg, 296.svg, 308.svg, 332.svg, 344.svg, fs_list.before
>
>
> In some scenarios, rowsets can accumulate a lot of data, so {{kudu-master}} and {{kudu-tserver}} processes grow far beyond the hard memory limit (controlled by the {{--memory_limit_hard_bytes}} flag) when running CompactRowSetsOp.  In some cases, a Kudu server process consumes all the available memory, so the OS might invoke the OOM killer.
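> For context, the hard limit is a gflag set at server startup.  A sketch of how it is typically configured (the directories and the 4 GiB value below are illustrative, not from the affected deployment):
> {noformat}
> # Cap the process at 4 GiB; paths and value are illustrative.
> kudu-master \
>   --fs_wal_dir=/data/kudu/master/wal \
>   --fs_data_dirs=/data/kudu/master/data \
>   --memory_limit_hard_bytes=4294967296
> {noformat}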
> At this point I'm not yet sure about the exact set of affected versions or what leads to accumulating so much data in flushed rowsets, but I know that 1.13, 1.14, 1.15, and 1.16 are affected.  It's also not clear whether the actual regression is in allowing the flushed rowsets to grow that big.
> There is a reproduction scenario for this bug with {{kudu-master}} using real data from the field.  With that data, {{kudu fs list}} reveals a rowset with many UNDOs: see the attached {{fs_list.before}} file.  When starting {{kudu-master}} with that data, the process memory usage eventually peaked at about 25 GBytes of RSS while running CompactRowSetsOp.
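> For anyone reproducing this, the rowset layout can be inspected with the CLI; the directories below are illustrative placeholders for the server's actual WAL and data directories:
> {noformat}
> # Dump the block layout per rowset (including UNDO deltas);
> # the output resembles the attached fs_list.before file.
> kudu fs list \
>   --fs_wal_dir=/data/kudu/master/wal \
>   --fs_data_dirs=/data/kudu/master/data
> {noformat}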
> I also attached several SVG files generated by TCMalloc's pprof from the memory profile snapshots output by {{kudu-master}} when configured to dump allocation stats every 512 MBytes.  I generated the SVG reports for the profiles attributed to the highest memory usage:
> {noformat}
> Dumping heap profile to /opt/tmp/master/nn1/profile.0270.heap (24573 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0283.heap (64594 MB allocated cumulatively, 13221 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0296.heap (77908 MB allocated cumulatively, 12110 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0308.heap (90197 MB allocated cumulatively, 12406 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0332.heap (114775 MB allocated cumulatively, 23884 MB currently in use)
> Dumping heap profile to /opt/tmp/master/nn1/profile.0344.heap (127064 MB allocated cumulatively, 12648 MB currently in use)
> {noformat}
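> The attached SVG reports were rendered from those heap snapshots with pprof in the usual way; a sketch of the invocation (the binary path is illustrative):
> {noformat}
> # Render a call-graph SVG from one heap profile snapshot;
> # the first argument is the path to the profiled binary.
> pprof --svg /path/to/kudu-master /opt/tmp/master/nn1/profile.0270.heap > 270.svg
> {noformat}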
> The report from the compaction doesn't look like anything extraordinary (except for the duration):
> {noformat}
> I20221012 10:45:49.684247 101750 maintenance_manager.cc:603] P 68dbea0ec022440d9fc282099a8656cb: CompactRowSetsOp(00000000000000000000000000000000) complete. Timing: real 522.617s     user 471.783s   sys 46.588s Metrics: {"bytes_written":1665145,"cfile_cache_hit":846,"cfile_cache_hit_bytes":14723646,"cfile_cache_miss":1786556,"cfile_cache_miss_bytes":4065589152,"cfile_init":7,"delta_iterators_relevant":1558,"dirs.queue_time_us":220086,"dirs.run_cpu_time_us":89219,"dirs.run_wall_time_us":89163,"drs_written":1,"fdatasync":15,"fdatasync_us":150709,"lbm_read_time_us":11120726,"lbm_reads_1-10_ms":1,"lbm_reads_lt_1ms":1786583,"lbm_write_time_us":14120016,"lbm_writes_1-10_ms":3,"lbm_writes_lt_1ms":894069,"mutex_wait_us":108,"num_input_rowsets":5,"rows_written":4043,"spinlock_wait_cycles":14720,"thread_start_us":741,"threads_started":9,"wal-append.queue_time_us":307}
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)