Posted to user@cassandra.apache.org by Jason Kania <ja...@ymail.com> on 2016/10/08 17:42:19 UTC

Understanding cassandra data directory contents

Hello,
I am using Cassandra 3.0.9 and have encountered an issue where the nodes in my 3-node cluster hold vastly different amounts of data even though they should be roughly the same. When I looked through the data directory for my database on two of the nodes, I saw a number of directories with the same prefix, e.g.:
periodicReading-76eb7510096811e68a7421c8b9466352
periodicReading-453d55a0501d11e68623a9d2b6f96e86
...

Only one directory with a given table-name prefix has a current date; the rest are older.
In contrast, on the node with the least space used, each directory has a unique prefix (not shared).
I am wondering what the contents of a Cassandra database directory should look like. Are there supposed to be multiple entries for a given table, or just one?
If just one, what would be a procedure for determining whether the other directories for the same table are junk that can be removed?

Thanks,
Jason

Re: Understanding cassandra data directory contents

Posted by Vladimir Yudovin <vl...@winguzone.com>.
Snapshots are created inside the table folder (the one with the ID suffix):

$ nodetool snapshot music
Requested creating snapshot(s) for [music] with snapshot name [1476165047920]
Snapshot directory: 1476165047920

$ pwd
cassandra/data/data/music/songs-6060ae608dd811e68e340f08799f1f06/snapshots/1476165047920

Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.
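
The layout shown in this reply, <data_dir>/<keyspace>/<table>-<id>/snapshots/<snapshot name>, can be walked to see which tables on a node carry snapshots. A minimal Python sketch, assuming that layout; list_snapshots is a hypothetical helper, not part of nodetool, and data_dir is whatever data_file_directories in cassandra.yaml points to on the node:

```python
import os


def list_snapshots(data_dir):
    """Yield (keyspace, table_dir, snapshot_name) for each snapshot under data_dir.

    Assumes the layout <data_dir>/<keyspace>/<table>-<id>/snapshots/<name>.
    """
    for keyspace in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue  # stray file at the keyspace level
        for table_dir in sorted(os.listdir(ks_path)):
            snap_root = os.path.join(ks_path, table_dir, "snapshots")
            if not os.path.isdir(snap_root):
                continue  # this table has no snapshots
            for name in sorted(os.listdir(snap_root)):
                if os.path.isdir(os.path.join(snap_root, name)):
                    yield keyspace, table_dir, name
```

For example, list(list_snapshots("/var/lib/cassandra/data")) on a package install; that path is an assumption, and the output above suggests a tarball layout under cassandra/data/data instead.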





---- On Mon, 10 Oct 2016 17:00:03 -0400 Nicolas Douillet <nicolas.douillet@gmail.com> wrote ----

Hi Jason,

I'm not familiar enough with Cassandra 3, but it might be snapshots. Snapshots are usually hard links to SSTable files.

Try this:

    nodetool clearsnapshot

Does it change anything?

--
Nicolas

Re: Understanding cassandra data directory contents

Posted by Nicolas Douillet <ni...@gmail.com>.
Hi Jason,

I'm not familiar enough with Cassandra 3, but it might be snapshots.
Snapshots are usually hard links to SSTable files.

Try this:
    nodetool clearsnapshot

Does it change anything?

--
Nicolas
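
If snapshot files are hard links, they cost no extra disk space while the live SSTables they point to still exist; the space stays allocated only after compaction deletes the live copy. A minimal Python sketch of measuring that, assuming live SSTable components sit directly in the table directory and that snapshots/ and backups/ are the only hard-link subdirectories; snapshot_only_bytes is a hypothetical helper:

```python
import os


def _inode(path):
    """Return ((device, inode), size) for path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino), st.st_size


def snapshot_only_bytes(table_dir):
    """Bytes under table_dir/snapshots whose inodes are not shared with live files.

    Live files are taken to be everything outside snapshots/ and backups/.
    Each inode is counted once, so a file linked into several snapshots is
    not double-counted.
    """
    live = set()
    for root, dirs, files in os.walk(table_dir):
        # Skip the hard-link trees so only live SSTable components are collected.
        dirs[:] = [d for d in dirs if d not in ("snapshots", "backups")]
        for f in files:
            live.add(_inode(os.path.join(root, f))[0])

    seen, total = set(), 0
    for root, _dirs, files in os.walk(os.path.join(table_dir, "snapshots")):
        for f in files:
            key, size = _inode(os.path.join(root, f))
            if key in live or key in seen:
                continue  # still shared with a live SSTable, or already counted
            seen.add(key)
            total += size
    return total
```

A value near zero means clearing the snapshots would free little space for that table; a large value means the snapshots are pinning SSTables that compaction has already replaced.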

On Sat, Oct 8, 2016 at 21:26, Jason Kania <ja...@ymail.com> wrote:

> Hi Vladimir,
>
> Thanks for the response. I assume, then, that it is safe to remove the
> directories that are not current according to the system_schema.tables
> table. I have dozens of directories for the same table and haven't dropped
> and re-created it nearly that many times. Do any nodetool or other commands
> clean up these unused directories?
>
> Thanks,
>
> Jason Kania

Re: Understanding cassandra data directory contents

Posted by Jason Kania <ja...@ymail.com>.
Hi Vladimir,
Thanks for the response. I assume, then, that it is safe to remove the directories that are not current according to the system_schema.tables table. I have dozens of directories for the same table and haven't dropped and re-created it nearly that many times. Do any nodetool or other commands clean up these unused directories?

Thanks,
Jason Kania

From: Vladimir Yudovin <vl...@winguzone.com>
To: user@cassandra.apache.org; Jason Kania <ja...@ymail.com>
Sent: Saturday, October 8, 2016 2:05 PM
Subject: Re: Understanding cassandra data directory contents

Each table has a unique ID (the directory suffix). If you drop and then re-create a table with the same name, it gets a new ID.

Try
SELECT keyspace_name, table_name, id FROM system_schema.tables ;
to determine the current ID.

You can limit the query to a specific keyspace or table.

Best regards, Vladimir Yudovin,


Re: Understanding cassandra data directory contents

Posted by Vladimir Yudovin <vl...@winguzone.com>.
Each table has a unique ID (the directory suffix). If you drop and then re-create a table with the same name, it gets a new ID.

Try
SELECT keyspace_name, table_name, id FROM system_schema.tables ;
to determine the current ID.

You can limit the query to a specific keyspace or table.


Best regards, Vladimir Yudovin, 
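
Since the suffix on each table directory is the table ID with its dashes removed, the directories on a node can be checked against the id column returned by the query above. A minimal Python sketch, assuming the current IDs for one keyspace have already been fetched (for example with SELECT table_name, id FROM system_schema.tables WHERE keyspace_name = '...'); split_table_dir and stale_table_dirs are hypothetical helpers, and the "current" ID in the example is invented for illustration. It only compares names; it does not look inside the directories, which may still hold snapshots.

```python
import uuid


def split_table_dir(dirname):
    """Split '<table>-<32 hex digits>' into (table, UUID), or return None."""
    table, sep, suffix = dirname.rpartition("-")
    if not sep or len(suffix) != 32:
        return None
    try:
        return table, uuid.UUID(hex=suffix)
    except ValueError:
        return None  # suffix is not hexadecimal


def stale_table_dirs(dir_names, current_ids):
    """Return directory names whose ID suffix differs from the table's current ID.

    current_ids maps table_name -> id (uuid.UUID or its string form), as read
    from system_schema.tables for one keyspace. Tables missing from the map
    are skipped rather than guessed at.
    """
    stale = []
    for name in sorted(dir_names):
        parsed = split_table_dir(name)
        if parsed is None:
            continue  # not a '<table>-<id>' directory
        table, dir_id = parsed
        current = current_ids.get(table)
        if current is not None and uuid.UUID(str(current)) != dir_id:
            stale.append(name)
    return stale


# Hypothetical example: pretend system_schema.tables reports the second ID.
dirs = [
    "periodicReading-76eb7510096811e68a7421c8b9466352",
    "periodicReading-453d55a0501d11e68623a9d2b6f96e86",
]
current = {"periodicReading": "453d55a0-501d-11e6-8623-a9d2b6f96e86"}
print(stale_table_dirs(dirs, current))
# ['periodicReading-76eb7510096811e68a7421c8b9466352']
```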