Posted to commits@cassandra.apache.org by "Stefan Miklosovic (Jira)" <ji...@apache.org> on 2020/12/11 15:18:00 UTC

[jira] [Updated] (CASSANDRA-16335) Expose data dirs in ColumnFamilyStoreMBean

     [ https://issues.apache.org/jira/browse/CASSANDRA-16335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Miklosovic updated CASSANDRA-16335:
------------------------------------------
    Description: 
I am not currently aware of any way to find out where a CF stores its data. While this might look like a detail, it is important for backup and restore purposes. Let's consider this workflow:


1) There is a keyspace "abc" with a table "def"; on disk, it looks like /my/data/abc/def-12345/...

2) I take a backup; all SSTables are stored somewhere under the same relative path, /backups/abc/def-12345/...

3) I drop this table via CQL; its data ends up in a "dropped" snapshot.

4) I create this table again, but it now gets a different ID, e.g. /my/data/abc/def-6789/...

5) I want to restore /my/data/abc/def-12345/... but there are now two directory structures:
{code:java}
├── data
│   ├── abc
│   │   ├── def-12345...
│   │   │   ├── backups
│   │   │   └── snapshots
│   │   │       └── dropped-1607699318139-ghi
│   │   │           ├── manifest.json
│   │   │           ├── na-1-big-CompressionInfo.db
│   │   │           ├── na-1-big-Data.db
│   │   │           ├── na-1-big-Digest.crc32
│   │   │           ├── na-1-big-Filter.db
│   │   │           ├── na-1-big-Index.db
│   │   │           ├── na-1-big-Statistics.db
│   │   │           ├── na-1-big-Summary.db
│   │   │           ├── na-1-big-TOC.txt
│   │   │           └── schema.cql
│   │   └── def-6789...
│   │       ├── backups
│   │       ├── na-1-big-CompressionInfo.db
│   │       ├── na-1-big-Data.db
│   │       ├── na-1-big-Digest.crc32
│   │       ├── na-1-big-Filter.db
│   │       ├── na-1-big-Index.db
│   │       ├── na-1-big-Statistics.db
│   │       ├── na-1-big-Summary.db
│   │       └── na-1-big-TOC.txt
{code}

The question now is: which directory should I restore this to? The "active" one, of course, but I cannot tell which one that is, because one of them is no longer used.
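
To illustrate the ambiguity, here is a minimal sketch that lists every directory under a data root that could belong to a given table. The class and method names are hypothetical and exist only for this example; it relies solely on the "<table>-<id>" directory naming shown in the listing above. When it returns more than one entry, nothing on disk says which one the node is actually using.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Cassandra.
final class TableDirCandidates {

    // Returns all directories named "<table>-<something>" under <dataDir>/<keyspace>,
    // sorted by path. Assumes table names contain no glob metacharacters.
    static List<Path> candidates(Path dataDir, String keyspace, String table) {
        Path keyspaceDir = dataDir.resolve(keyspace);
        List<Path> result = new ArrayList<>();
        if (!Files.isDirectory(keyspaceDir)) {
            return result;
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(keyspaceDir, table + "-*")) {
            for (Path p : stream) {
                if (Files.isDirectory(p)) {
                    result.add(p);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(result);
        return result;
    }
}
```

For the layout above, candidates(Paths.get("/my/data"), "abc", "def") would return both def-... directories, which is exactly the problem.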

I tried to get this information from ColumnFamilyStoreMBean (CFSMB), but it is not exposed there.

Is there any way to retrieve, via JMX, where a table actually stores its data?
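
For illustration, a client-side lookup of this kind might look like the sketch below. It assumes the table's MBean is registered under org.apache.cassandra.db:type=Tables,keyspace=<ks>,table=<table> and that it exposes a list-valued attribute holding the data directories. The attribute name "DataPaths" and the class name are assumptions made for this example, not something confirmed by this ticket.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical client, not part of Cassandra.
final class TableDataPaths {

    // ASSUMPTION: name of the attribute that would expose the data directories.
    static final String ATTRIBUTE = "DataPaths";

    // Builds the ObjectName of the per-table MBean (assumed naming scheme).
    static ObjectName tableMBean(String keyspace, String table) {
        try {
            return new ObjectName("org.apache.cassandra.db:type=Tables,keyspace="
                                  + keyspace + ",table=" + table);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Reads the attribute and converts whatever collection comes back to strings.
    static List<String> dataPaths(MBeanServerConnection mbs, String keyspace, String table)
            throws Exception {
        Object value = mbs.getAttribute(tableMBean(keyspace, table), ATTRIBUTE);
        List<String> paths = new ArrayList<>();
        if (value instanceof Collection) {
            for (Object o : (Collection<?>) value) {
                paths.add(String.valueOf(o));
            }
        }
        return paths;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: TableDataPaths <host:port> <keyspace> <table>");
            return;
        }
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + args[0] + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            for (String path : dataPaths(connector.getMBeanServerConnection(), args[1], args[2])) {
                System.out.println(path);
            }
        }
    }
}
```

A restore tool could then copy the backed-up SSTables into whichever directory the node reports, instead of guessing between def-12345... and def-6789....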

I have put this together: https://github.com/apache/cassandra/pull/850/files


> Expose data dirs in ColumnFamilyStoreMBean 
> -------------------------------------------
>
>                 Key: CASSANDRA-16335
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16335
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Normal


