Posted to user@cassandra.apache.org by "qihuang.zheng" <qi...@fraudmetrix.cn> on 2015/10/29 09:08:31 UTC

nodetool status Load not same with disk used

Some of our nodes report a Load that is far too large, while others look normal.
[qihuang.zheng@cass047221 forseti]$ /usr/install/cassandra/bin/nodetool status
-- Address         Load       Tokens  Owns  Host ID                               Rack
UN 192.168.47.221  2.66 TB    256     8.7%  87e100ed-85c4-44cb-9d9f-2d602d016038  RAC1
UN 192.168.47.204  614.58 GB  256     8.2%  91ad3d42-4207-46fe-8188-34c3f0b2dbd2  RAC1


I checked this node with the df command and found that only 715G of disk is used.
[qihuang.zheng@cass047221 forseti]$ df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda2    20G  8.6G    11G   47%  /
tmpfs        16G     0    16G    0%  /dev/shm
/dev/sda1   190M   58M   123M   32%  /boot
/dev/sda4   3.5T  715G   2.6T   22%  /home


And this is the disk usage on a normal node:
[qihuang.zheng@cass047204 ~]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
tmpfs                  16G     0   16G   0% /dev/shm
/dev/sda1             485M   57M  403M  13% /boot
/dev/mapper/VolGroup-lv_home
                      3.4T  659G  2.6T  21% /home


Where does the Load value in nodetool status come from? Shouldn't it be based on the SSTable file sizes, which should in turn match the disk usage?


Thanks, qihuang.zheng

Re: nodetool status Load not same with disk used

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Oct 29, 2015 at 1:08 AM, qihuang.zheng <qihuang.zheng@fraudmetrix.cn> wrote:

> *We have some nodes Load too large, but some are normal.  *
>

tl;dr - Clear the snapshots on the nodes which are too large.

Longer:

Are you sure that the nodes which are too large differ in the actual *data*
size, or do they just contain snapshots?
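One way to check this on a node is to add up the files outside and inside the snapshot directories, counting each inode only once. This is a minimal Python sketch, not a Cassandra tool; it assumes the usual layout where each table directory under the data directory has a "snapshots" subdirectory, it treats everything outside a "snapshots" directory as live, and the function name and command-line usage are hypothetical.

```python
import os
from pathlib import Path


def split_live_and_snapshot_bytes(data_dir):
    """Return (live_bytes, snapshot_only_bytes) for a data directory.

    Assumption: snapshots live somewhere under a directory named "snapshots";
    every other file is treated as live. Files are keyed by (device, inode),
    so a snapshot file that is still hard-linked to a live SSTable is not
    counted as extra space.
    """
    live_inodes = {}
    snap_inodes = {}
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        in_snapshot = "snapshots" in Path(dirpath).relative_to(data_dir).parts
        target = snap_inodes if in_snapshot else live_inodes
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            target[(st.st_dev, st.st_ino)] = st.st_size
    live_bytes = sum(live_inodes.values())
    # Only inodes that no live file points to occupy space of their own.
    snapshot_only = sum(
        size for key, size in snap_inodes.items() if key not in live_inodes
    )
    return live_bytes, snapshot_only


if __name__ == "__main__":
    import sys

    # Pass one of your data_file_directories entries as the argument.
    live, extra = split_live_and_snapshot_bytes(sys.argv[1])
    print(f"live: {live / 2**30:.1f} GiB, snapshot-only: {extra / 2**30:.1f} GiB")
```

If the snapshot-only figure on the large nodes roughly accounts for their difference from the normal ones, the snapshots are the likely explanation.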

Cassandra snapshots are hard links to SSTables, which has a number of non-obvious consequences:

1) Snapshots grow in actual disk usage over time, because a snapshot file
only consumes "extra" disk space once the SSTable it is hard-linked to has
been removed from the data directory (for example, by compaction).
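Point 1 can be reproduced with ordinary file operations. This Python sketch only simulates the effect: the file names are illustrative, and plain os.link/os.unlink stand in for what Cassandra does when it snapshots an SSTable and later compacts it away.

```python
import os
import tempfile


def snapshot_then_compact(workdir):
    """Simulate snapshotting one SSTable, then removing it by compaction.

    Returns (link count after snapshot, link count after removal, size kept).
    File names are illustrative only.
    """
    sstable = os.path.join(workdir, "ks-table-ka-1-Data.db")
    snapshot = os.path.join(workdir, "snapshot-ks-table-ka-1-Data.db")
    with open(sstable, "wb") as f:
        f.write(b"x" * 4096)  # the "live" SSTable
    os.link(sstable, snapshot)  # snapshot: a second name, no data copied
    after_snapshot = os.stat(sstable).st_nlink  # both names share one inode
    os.unlink(sstable)  # compaction deletes the live SSTable
    after_compaction = os.stat(snapshot).st_nlink  # only the snapshot is left
    return after_snapshot, after_compaction, os.path.getsize(snapshot)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(snapshot_then_compact(d))  # (2, 1, 4096)
```

Until the unlink, the snapshot costs nothing extra; afterwards its 4096 bytes are held only by the snapshot, which is how snapshot usage creeps up over time.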

2) Unless you use du --apparent-size, the order in which du sees files
determines which file is counted as using the disk, so du may report odd
results for the data directory if the snapshots are included in what it
scans.
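The "first one seen gets charged" accounting in point 2 can be modelled roughly as below. This is a simplified Python sketch, not du's actual implementation; the function name is hypothetical, and it uses the apparent size (st_size) rather than allocated blocks to keep the numbers predictable.

```python
import os


def charge_by_first_seen(paths):
    """Charge each inode's size to the first path (in visit order) naming it.

    A hard-linked file is counted once: the first directory entry visited
    gets its full size, and every later link to the same inode gets 0.
    """
    seen = set()
    charged = {}
    for path in paths:
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            charged[path] = 0
        else:
            seen.add(key)
            charged[path] = st.st_size
    return charged
```

Calling it with [data_file, snapshot_link] charges the data file; reversing the list charges the snapshot instead, although nothing on disk has changed.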

   --apparent-size
          print apparent sizes, rather than disk usage; although the
          apparent size is usually smaller, it may be larger due to
          holes in (`sparse') files, internal fragmentation, indirect
          blocks, and the like
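The difference the man page describes is visible in a file's stat fields: the apparent size is st_size, while the space actually allocated is st_blocks counted in 512-byte units (as Python's os.stat documents it on Linux). A minimal sketch with a hypothetical helper, using a sparse file to make the two diverge:

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (apparent_bytes, allocated_bytes) for one file.

    apparent  = st_size, the logical length of the file;
    allocated = st_blocks * 512, the space the filesystem has handed out.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse")
        with open(path, "wb") as f:
            f.seek(1024 * 1024)  # leave a 1 MiB hole
            f.write(b"x")
        # Allocated size depends on the filesystem's sparse-file support.
        print(apparent_and_allocated(path))
```

On filesystems that support holes, the allocated figure is typically far below the 1048577-byte apparent size.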

=Rob