Posted to issues@zookeeper.apache.org by "Szabolcs Bukros (Jira)" <ji...@apache.org> on 2022/06/29 15:26:00 UTC

[jira] [Created] (ZOOKEEPER-4566) Create tool for recursive snapshot analysis

[Szabolcs Bukros](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=bszabolcs) **created** an issue
[ZooKeeper](https://issues.apache.org/jira/browse/ZOOKEEPER) / [ZOOKEEPER-4566](https://issues.apache.org/jira/browse/ZOOKEEPER-4566)
[Create tool for recursive snapshot analysis](https://issues.apache.org/jira/browse/ZOOKEEPER-4566)

Issue Type: Improvement
Assignee: Unassigned
Created: 29/Jun/22 15:25
Priority: Major
Reporter: [Szabolcs Bukros](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=bszabolcs)

I needed to analyze snapshots to determine which application caused a massive
snapshot size increase, by recursively checking the child node count and data
size for each node, but could not find a tool for the job. Loading the
snapshots one by one and querying them with a ZooKeeper client proved too
slow; SnapshotFormatter was very fast, but processing its output to extract
the data relevant to my use case was more work than writing a tool that
produces the output I need. So I wrote SnapshotSumFormatter based on
SnapshotFormatter:

    
    
    USAGE: SnapshotSumFormatter snapshot_file starting_node max_depth
    

The tool recursively traverses the child nodes under "starting_node" and, for
each node, collects the number of nodes below it and the total size of the
data stored in that subtree. This helps identify problematic
jobs/applications that either store too much data or do not clean up
properly. "max_depth" defines the depth down to which the tool still writes
to the output: 0 means there is no depth limit and every node's stats are
displayed, 1 means the output only contains the starting node's and its
children's stats, 2 adds another level, and so on. This ONLY affects the
level of detail displayed, NOT the calculation.
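
To make the idea concrete, here is a minimal, self-contained sketch of the
aggregation logic. It is not the actual SnapshotSumFormatter code: the Node
and Stats classes, the aggregate/print/summarize methods and the output
formatting are invented for illustration, and the real tool walks the
DataTree loaded from the snapshot file instead of a hand-built Node tree.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    
    public class SnapshotSumSketch {
    
        // Stand-in for a node deserialized from the snapshot.
        static class Node {
            final String path;
            byte[] data = new byte[0];
            final List<Node> children = new ArrayList<>();
    
            Node(String path) { this.path = path; }
        }
    
        // Aggregated stats for one subtree.
        static class Stats {
            long children;  // number of nodes below this one
            long dataBytes; // bytes stored in this node and everything below it
        }
    
        // First pass: recursively aggregate descendant count and data size
        // for every node, independent of the display depth limit.
        static Stats aggregate(Node node, Map<Node, Stats> out) {
            Stats stats = new Stats();
            stats.dataBytes = node.data.length;
            for (Node child : node.children) {
                Stats childStats = aggregate(child, out);
                stats.children += 1 + childStats.children;
                stats.dataBytes += childStats.dataBytes;
            }
            out.put(node, stats);
            return stats;
        }
    
        // Second pass: print the precomputed stats, parents before children,
        // but only down to maxDepth (0 means no limit).
        static void print(Node node, Map<Node, Stats> stats, int depth, int maxDepth) {
            if (maxDepth != 0 && depth > maxDepth) {
                return;
            }
            String prefix = "--".repeat(depth) + (depth > 0 ? " " : "");
            Stats s = stats.get(node);
            System.out.println(prefix + node.path);
            System.out.println(prefix + "  children: " + s.children);
            System.out.println(prefix + "  data: " + s.dataBytes);
            for (Node child : node.children) {
                print(child, stats, depth + 1, maxDepth);
            }
        }
    
        static void summarize(Node startingNode, int maxDepth) {
            Map<Node, Stats> stats = new HashMap<>();
            aggregate(startingNode, stats);
            print(startingNode, stats, 0, maxDepth);
        }
    }

With these assumptions, summarize(root, 2) prints roughly the shape of the
example output below, while the totals on every line still cover the full
subtree regardless of the depth limit.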

An example output looks like this (with "SnapshotSumFormatter <snapshot_file>
/ 2"):

    
    
    /
       children: 1250511
       data: 1952186580
    -- /zookeeper
    --   children: 1
    --   data: 0
    -- /solr
    --   children: 1773
    --   data: 8419162
    ---- /solr/configs
    ----   children: 1640
    ----   data: 8407643
    ---- /solr/overseer
    ----   children: 6
    ----   data: 0
    ---- /solr/live_nodes
    ----   children: 3
    ----   data: 0
    

I think this might prove useful for others too and would like to share it.  
  
This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)