Posted to issues@kudu.apache.org by "DawnZhang (JIRA)" <ji...@apache.org> on 2016/11/29 09:56:58 UTC

[jira] [Comment Edited] (KUDU-1762) suspected tablet memory leak

    [ https://issues.apache.org/jira/browse/KUDU-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15704818#comment-15704818 ] 

DawnZhang edited comment on KUDU-1762 at 11/29/16 9:56 AM:
-----------------------------------------------------------

Adding more detail here (this is the same server that [~cfreely] posted about in the description).

About every 7 days we hit the soft memory limit (see the Cloudera Manager screenshot in the attachment).

The tablet server then logs warnings like this:

{code}
./kudu-tserver.data01.invalid-user.log.WARNING.20161121-112841.27176:W1128 20:50:31.423696 15822 consensus_peers.cc:332] T 7c07295dc21a4119814ef4f8f9b73e31 P 0c79993cd5504785a68f07c52463a4dc -> Peer 61f5bb2bca2b46c8bffc5394eba5ec03 (data03.ymtcluster.sa:7050): Couldn't send request to peer 61f5bb2bca2b46c8bffc5394eba5ec03 for tablet 7c07295dc21a4119814ef4f8f9b73e31. Status: Remote error: Service unavailable: Soft memory limit exceeded (at 70.51% of capacity). Retrying in the next heartbeat period. Already tried 1 times.
{code}

The client then gets an error like this (JDBC):

{code}
SELECT COUNT(1) AS count FROM `event_wos_p1_42`

java.sql.SQLException:
Unable to open scanner: Timed out: Client connection negotiation failed: client connection to 10.9.78.115:7050: Timeout exceeded waiting to connect: Network error: Client connection negotiation failednect: Connection refused (error 111)
{code}

Our memory configuration:

memory_limit_hard_bytes = 4GB
block_cache_capacity_mb = 512MB
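
For reference, the 70.51% figure in the warning above can be turned into an absolute number. This is only a rough sketch: it assumes that "capacity" in the log message refers to memory_limit_hard_bytes and that 4GB here means 4 GiB. The function name is made up for illustration.

```python
GIB = 1024 ** 3

def consumption_from_pct(hard_limit_bytes, pct_of_capacity):
    """Bytes in use, given a hard limit and the percentage reported in the log."""
    return hard_limit_bytes * pct_of_capacity / 100.0

# memory_limit_hard_bytes = 4GB (assumed to mean 4 GiB)
hard_limit = 4 * GIB
# 70.51 is the percentage from the WARNING line
used = consumption_from_pct(hard_limit, 70.51)
print(f"{used / GIB:.2f} GiB in use when the request was rejected")
```

Under those assumptions the server was holding about 2.82 GiB when it started rejecting requests.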

We deploy Kudu on many different clusters (all running similar workloads, all on CentOS 6.4-6.7), but only this one has the memory problem.
For now we restart Kudu on this cluster about every 7 days.

Our typical workload looks like this:

Create a new table with about 600 columns every hour.
Insert some data into it (about 100 million to 500 million rows).
Drop some old tables (leaving only about 5 tables not dropped).



> suspected tablet memory leak
> ----------------------------
>
>                 Key: KUDU-1762
>                 URL: https://issues.apache.org/jira/browse/KUDU-1762
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet
>    Affects Versions: 1.0.1
>         Environment: CentOS 6.5
> Kudu 1.0.1 (rev e60b610253f4303b24d41575f7bafbc5d69edddb)
>            Reporter: Fu Lili
>            Priority: Critical
>         Attachments: 0B2CE7BB-EF26-4EA1-B824-3584D7D79256.png
>
>
> here is the memory total info:
> {quote}
> ------------------------------------------------
> MALLOC:     1691715680 ( 1613.3 MiB) Bytes in use by application
> MALLOC: +    178733056 (  170.5 MiB) Bytes in page heap freelist
> MALLOC: +     37483104 (   35.7 MiB) Bytes in central cache freelist
> MALLOC: +      4071488 (    3.9 MiB) Bytes in transfer cache freelist
> MALLOC: +     13739264 (   13.1 MiB) Bytes in thread cache freelists
> MALLOC: +     12202144 (   11.6 MiB) Bytes in malloc metadata
> MALLOC:   ------------
> MALLOC: =   1937944736 ( 1848.2 MiB) Actual memory used (physical + swap)
> MALLOC: +       311296 (    0.3 MiB) Bytes released to OS (aka unmapped)
> MALLOC:   ------------
> MALLOC: =   1938256032 ( 1848.5 MiB) Virtual address space used
> MALLOC:
> MALLOC:         174694              Spans in use
> MALLOC:            201              Thread heaps in use
> MALLOC:           8192              Tcmalloc page size
> ------------------------------------------------
> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> Bytes released to the OS take up virtual address space but no physical memory.
> {quote}
> But in the memory detail, the sum of the Current Consumption of all the child trackers is far less than the Current Consumption of the root tracker.
> ||Id||Parent||Limit||Current Consumption||Peak consumption||
> |root|none|4.00G|1.58G|1.74G|
> |log_cache|root|1.00G|480.8K|5.32M|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:70c8d889b0314b04a240fcb02c24a012|log_cache|128.00M|160B|160B|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:16d3c8193579445f8f766da6c7abc237|log_cache|128.00M|160B|160B|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:2c69c5cb9eb04eb48323a9268afc36a7|log_cache|128.00M|160B|160B|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:2b11d9220dab4a5f952c5b1c10a68ccd|log_cache|128.00M|69.2K|139.5K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:cec045be60af4f759497234d8815238b|log_cache|128.00M|68.6K|138.7K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:cea7a54cebd242e4997da641f5b32e3a|log_cache|128.00M|68.5K|139.3K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:9625dfde17774690a888b55024ac797a|log_cache|128.00M|68.5K|140.0K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:6046b33901ca43d0975f59cf7e491186|log_cache|128.00M|0B|133.0K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:1a18ab0915f0407b922fa7ecbe7a2f46|log_cache|128.00M|0B|132.6K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:ac54d1c1813a4e39943971cb56f248ef|log_cache|128.00M|0B|130.5K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:4438580df6cc4d469393b9d6adee68d8|log_cache|128.00M|0B|131.2K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:2f1cef7d2a494575b941baa22b8a3dc9|log_cache|128.00M|0B|131.6K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:d2ad22d202c04b2d98f1c5800df1c3b5|log_cache|128.00M|0B|132.5K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:b19b21d6b4c84f9895aad9e81559d019|log_cache|128.00M|0B|131.0K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:27e9531cd5814b1c9637493f05860b19|log_cache|128.00M|0B|131.1K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:425a19940239447faa0eaab4e380d644|log_cache|128.00M|68.5K|146.9K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:178bd7bc39a941a887f393b0a7848066|log_cache|128.00M|68.5K|139.9K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:91524acd28a440318918f11292ac8fdc|log_cache|128.00M|0B|132.0K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:be6f093aabf9460b97fc35dd026820b6|log_cache|128.00M|0B|130.4K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:dd8dd794f0f44426a3c46ce8f4b54652|log_cache|128.00M|0B|131.2K|
> |log_cache:0c79993cd5504785a68f07c52463a4dc:ed128ca7b19c4e3eaa48e9e3eb341492|log_cache|128.00M|68.5K|141.5K|
> |block_cache-sharded_lru_cache|root|none|257.05M|257.05M|
> |code_cache-sharded_lru_cache|root|none|112B|113B|
> |server|root|none|2.06M|121.97M|
> |tablet-70c8d889b0314b04a240fcb02c24a012|server|none|265B|265B|
> |txn_tracker|tablet-70c8d889b0314b04a240fcb02c24a012|64.00M|0B|0B|
> |MemRowSet-0|tablet-70c8d889b0314b04a240fcb02c24a012|none|265B|265B|
> |DeltaMemStores|tablet-70c8d889b0314b04a240fcb02c24a012|none|0B|0B|
> |tablet-16d3c8193579445f8f766da6c7abc237|server|none|265B|265B|
> |txn_tracker|tablet-16d3c8193579445f8f766da6c7abc237|64.00M|0B|0B|
> |MemRowSet-0|tablet-16d3c8193579445f8f766da6c7abc237|none|265B|265B|
> |DeltaMemStores|tablet-16d3c8193579445f8f766da6c7abc237|none|0B|0B|
> |tablet-2c69c5cb9eb04eb48323a9268afc36a7|server|none|265B|265B|
> |txn_tracker|tablet-2c69c5cb9eb04eb48323a9268afc36a7|64.00M|0B|0B|
> |MemRowSet-0|tablet-2c69c5cb9eb04eb48323a9268afc36a7|none|265B|265B|
> |DeltaMemStores|tablet-2c69c5cb9eb04eb48323a9268afc36a7|none|0B|0B|
> |tablet-2b11d9220dab4a5f952c5b1c10a68ccd|server|none|25.7K|193.7K|
> |MemRowSet-5|tablet-2b11d9220dab4a5f952c5b1c10a68ccd|none|25.4K|25.4K|
> |txn_tracker|tablet-2b11d9220dab4a5f952c5b1c10a68ccd|64.00M|0B|70.2K|
> |DeltaMemStores|tablet-2b11d9220dab4a5f952c5b1c10a68ccd|none|265B|1.0K|
> |tablet-cec045be60af4f759497234d8815238b|server|none|58.6K|192.9K|
> |MemRowSet-5|tablet-cec045be60af4f759497234d8815238b|none|58.3K|58.3K|
> |txn_tracker|tablet-cec045be60af4f759497234d8815238b|64.00M|0B|70.1K|
> |DeltaMemStores|tablet-cec045be60af4f759497234d8815238b|none|265B|1.0K|
> |tablet-cea7a54cebd242e4997da641f5b32e3a|server|none|124.4K|193.5K|
> |MemRowSet-5|tablet-cea7a54cebd242e4997da641f5b32e3a|none|124.1K|124.1K|
> |txn_tracker|tablet-cea7a54cebd242e4997da641f5b32e3a|64.00M|0B|70.0K|
> |DeltaMemStores|tablet-cea7a54cebd242e4997da641f5b32e3a|none|265B|795B|
> |tablet-9625dfde17774690a888b55024ac797a|server|none|530B|326.6K|
> |MemRowSet-22|tablet-9625dfde17774690a888b55024ac797a|none|265B|265B|
> |txn_tracker|tablet-9625dfde17774690a888b55024ac797a|64.00M|0B|71.3K|
> |DeltaMemStores|tablet-9625dfde17774690a888b55024ac797a|none|265B|1.3K|
> |tablet-6046b33901ca43d0975f59cf7e491186|server|none|530B|587.7K|
> |MemRowSet-22|tablet-6046b33901ca43d0975f59cf7e491186|none|265B|265B|
> |txn_tracker|tablet-6046b33901ca43d0975f59cf7e491186|64.00M|0B|139.3K|
> |DeltaMemStores|tablet-6046b33901ca43d0975f59cf7e491186|none|265B|1.0K|
> |tablet-1a18ab0915f0407b922fa7ecbe7a2f46|server|none|530B|383.4K|
> |MemRowSet-22|tablet-1a18ab0915f0407b922fa7ecbe7a2f46|none|265B|265B|
> |txn_tracker|tablet-1a18ab0915f0407b922fa7ecbe7a2f46|64.00M|0B|70.5K|
> |DeltaMemStores|tablet-1a18ab0915f0407b922fa7ecbe7a2f46|none|265B|1.0K|
> |tablet-ac54d1c1813a4e39943971cb56f248ef|server|none|530B|324.5K|
> |MemRowSet-11|tablet-ac54d1c1813a4e39943971cb56f248ef|none|265B|265B|
> |txn_tracker|tablet-ac54d1c1813a4e39943971cb56f248ef|64.00M|0B|69.7K|
> |DeltaMemStores|tablet-ac54d1c1813a4e39943971cb56f248ef|none|265B|1.0K|
> |tablet-4438580df6cc4d469393b9d6adee68d8|server|none|530B|325.4K|
> |MemRowSet-11|tablet-4438580df6cc4d469393b9d6adee68d8|none|265B|265B|
> |txn_tracker|tablet-4438580df6cc4d469393b9d6adee68d8|64.00M|0B|69.9K|
> |DeltaMemStores|tablet-4438580df6cc4d469393b9d6adee68d8|none|265B|1.0K|
> |tablet-2f1cef7d2a494575b941baa22b8a3dc9|server|none|530B|325.3K|
> |MemRowSet-11|tablet-2f1cef7d2a494575b941baa22b8a3dc9|none|265B|265B|
> |txn_tracker|tablet-2f1cef7d2a494575b941baa22b8a3dc9|64.00M|0B|70.3K|
> |DeltaMemStores|tablet-2f1cef7d2a494575b941baa22b8a3dc9|none|265B|1.0K|
> |tablet-d2ad22d202c04b2d98f1c5800df1c3b5|server|none|530B|326.0K|
> |MemRowSet-22|tablet-d2ad22d202c04b2d98f1c5800df1c3b5|none|265B|265B|
> |txn_tracker|tablet-d2ad22d202c04b2d98f1c5800df1c3b5|64.00M|0B|136.7K|
> |DeltaMemStores|tablet-d2ad22d202c04b2d98f1c5800df1c3b5|none|265B|1.0K|
> |tablet-b19b21d6b4c84f9895aad9e81559d019|server|none|530B|326.3K|
> |MemRowSet-22|tablet-b19b21d6b4c84f9895aad9e81559d019|none|265B|265B|
> |txn_tracker|tablet-b19b21d6b4c84f9895aad9e81559d019|64.00M|0B|70.0K|
> |DeltaMemStores|tablet-b19b21d6b4c84f9895aad9e81559d019|none|265B|1.0K|
> |tablet-27e9531cd5814b1c9637493f05860b19|server|none|530B|327.8K|
> |MemRowSet-22|tablet-27e9531cd5814b1c9637493f05860b19|none|265B|265B|
> |txn_tracker|tablet-27e9531cd5814b1c9637493f05860b19|64.00M|0B|71.8K|
> |DeltaMemStores|tablet-27e9531cd5814b1c9637493f05860b19|none|265B|1.3K|
> |tablet-425a19940239447faa0eaab4e380d644|server|none|795B|332.9K|
> |MemRowSet-11|tablet-425a19940239447faa0eaab4e380d644|none|265B|265B|
> |txn_tracker|tablet-425a19940239447faa0eaab4e380d644|64.00M|0B|76.8K|
> |DeltaMemStores|tablet-425a19940239447faa0eaab4e380d644|none|530B|1.0K|
> |tablet-178bd7bc39a941a887f393b0a7848066|server|none|530B|325.8K|
> |MemRowSet-10|tablet-178bd7bc39a941a887f393b0a7848066|none|265B|265B|
> |txn_tracker|tablet-178bd7bc39a941a887f393b0a7848066|64.00M|0B|70.4K|
> |DeltaMemStores|tablet-178bd7bc39a941a887f393b0a7848066|none|265B|1.0K|
> |tablet-91524acd28a440318918f11292ac8fdc|server|none|530B|326.4K|
> |MemRowSet-11|tablet-91524acd28a440318918f11292ac8fdc|none|265B|265B|
> |txn_tracker|tablet-91524acd28a440318918f11292ac8fdc|64.00M|0B|72.0K|
> |DeltaMemStores|tablet-91524acd28a440318918f11292ac8fdc|none|265B|1.0K|
> |tablet-be6f093aabf9460b97fc35dd026820b6|server|none|530B|588.6K|
> |MemRowSet-22|tablet-be6f093aabf9460b97fc35dd026820b6|none|265B|265B|
> |txn_tracker|tablet-be6f093aabf9460b97fc35dd026820b6|64.00M|0B|72.3K|
> |DeltaMemStores|tablet-be6f093aabf9460b97fc35dd026820b6|none|265B|1.0K|
> |tablet-dd8dd794f0f44426a3c46ce8f4b54652|server|none|530B|325.7K|
> |MemRowSet-22|tablet-dd8dd794f0f44426a3c46ce8f4b54652|none|265B|265B|
> |txn_tracker|tablet-dd8dd794f0f44426a3c46ce8f4b54652|64.00M|0B|72.4K|
> |DeltaMemStores|tablet-dd8dd794f0f44426a3c46ce8f4b54652|none|265B|795B|
> |tablet-ed128ca7b19c4e3eaa48e9e3eb341492|server|none|530B|325.6K|
> |MemRowSet-22|tablet-ed128ca7b19c4e3eaa48e9e3eb341492|none|265B|265B|
> |txn_tracker|tablet-ed128ca7b19c4e3eaa48e9e3eb341492|64.00M|0B|71.7K|
> |DeltaMemStores|tablet-ed128ca7b19c4e3eaa48e9e3eb341492|none|265B|1.0K|
> |log_block_manager|server|none|1.84M|6.19M|
> |result-tracker|server|none|0B|0B|
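
The gap described above can be checked by adding up the Current Consumption of the four trackers whose parent is root and comparing the total with the root row. The sketch below is only illustrative: the values are copied by hand from the table, the helper name is hypothetical, and it assumes the K/M/G suffixes are binary units (1K = 1024 bytes).

```python
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
MIB = 1024 ** 2

def to_bytes(text):
    """Convert a size such as '257.05M' or '112B' to bytes (binary units assumed)."""
    return float(text[:-1]) * UNITS[text[-1]]

# Current Consumption of root and of its direct children, copied from the table.
root = to_bytes("1.58G")
children = {
    "log_cache": "480.8K",
    "block_cache-sharded_lru_cache": "257.05M",
    "code_cache-sharded_lru_cache": "112B",
    "server": "2.06M",
}

tracked = sum(to_bytes(value) for value in children.values())
unaccounted = root - tracked

print(f"tracked by children: {tracked / MIB:.1f} MiB")
print(f"unaccounted:         {unaccounted / MIB:.1f} MiB")
```

With these numbers the children account for only about 260 MiB, so roughly 1.3 GiB of the root consumption is not attributed to any child tracker. The root figure itself is close to the tcmalloc "Bytes in use by application" line (1613.3 MiB is about 1.58 GiB).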



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)