Posted to commits@cassandra.apache.org by "Marcus Eriksson (Jira)" <ji...@apache.org> on 2021/03/03 08:10:00 UTC
[jira] [Commented] (CASSANDRA-16335) Expose data dirs in ColumnFamilyStoreMBean
[ https://issues.apache.org/jira/browse/CASSANDRA-16335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17294362#comment-17294362 ]
Marcus Eriksson commented on CASSANDRA-16335:
---------------------------------------------
It seems this broke a few unit tests: https://app.circleci.com/pipelines/github/krummas/cassandra/636/workflows/ad435235-d61c-4bfa-ac1c-be197c1e241d/jobs/4402/tests
> Expose data dirs in ColumnFamilyStoreMBean
> -------------------------------------------
>
> Key: CASSANDRA-16335
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16335
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Config
> Reporter: Stefan Miklosovic
> Assignee: Stefan Miklosovic
> Priority: Low
> Time Spent: 10m
> Remaining Estimate: 0h
>
> As of now, I am not aware of any way to find out where a CF stores its data. While this might look like a detail, it is important for backup and restore purposes. Let's consider this workflow:
> 1) There is a keyspace "abc" with a table "def"; on disk, it looks like /my/data/abc/def-12345/...
> 2) I take a backup; all SSTables are put somewhere under the path /backups/abc/def-12345/...
> 3) I drop this table via CQL; its data ends up in a "dropped" snapshot.
> 4) I create this table again, but now it gets a different ID, e.g. /my/data/abc/def-6789/...
> 5) I want to restore /backups/abc/def-12345/... but right now there are two directory structures:
> {code:java}
> ├── data
> │   ├── abc
> │   │   ├── def-12345...
> │   │   │   ├── backups
> │   │   │   └── snapshots
> │   │   │       └── dropped-1607699318139-ghi
> │   │   │           ├── manifest.json
> │   │   │           ├── na-1-big-CompressionInfo.db
> │   │   │           ├── na-1-big-Data.db
> │   │   │           ├── na-1-big-Digest.crc32
> │   │   │           ├── na-1-big-Filter.db
> │   │   │           ├── na-1-big-Index.db
> │   │   │           ├── na-1-big-Statistics.db
> │   │   │           ├── na-1-big-Summary.db
> │   │   │           ├── na-1-big-TOC.txt
> │   │   │           └── schema.cql
> │   │   └── def-6789...
> │   │       ├── backups
> │   │       ├── na-1-big-CompressionInfo.db
> │   │       ├── na-1-big-Data.db
> │   │       ├── na-1-big-Digest.crc32
> │   │       ├── na-1-big-Filter.db
> │   │       ├── na-1-big-Index.db
> │   │       ├── na-1-big-Statistics.db
> │   │       ├── na-1-big-Summary.db
> │   │       └── na-1-big-TOC.txt
> {code}
> The question now is: which directory should I restore this to? Into the "active" one, of course, but I cannot know which one that is, because one of them is no longer used, and I do not want to resort to something as fragile as listing directories on disk and checking which one does not contain a "dropped" directory. Yes, one might import SSTables, which was introduced in Cassandra 4, but in Cassandra 3 one can only copy the files over or hard-link them and refresh.
> The second scenario is like this:
> There is just one "active" table directory and no structure with a "dropped" directory exists, but its ID (the part after the table name) differs from the one in the backup. If I want to copy files over and refresh, I need to resolve this discrepancy and copy the SSTables into a directory whose ID suffix differs from the ID in the backup.
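To make the discrepancy concrete, here is a minimal sketch of the name-based matching a restore tool would otherwise have to do. It assumes table directories are named `<table>-<id>` and that the table name itself contains no dash; the class and method names (`TableDirMatcher`, `tableName`, `matchLiveDir`) are hypothetical and not part of Cassandra. Note that it deliberately gives up when two candidate directories exist, which is exactly the first scenario described above.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class TableDirMatcher {
    // Strips the trailing "-<id>" from a directory name such as "def-12345".
    // Assumption: the table name itself never contains a dash.
    public static String tableName(String dirName) {
        int dash = dirName.lastIndexOf('-');
        return dash < 0 ? dirName : dirName.substring(0, dash);
    }

    // Picks the live directory that belongs to the same table as the backup directory.
    // Returns empty when zero or several candidates match, because guessing would be unsafe.
    public static Optional<String> matchLiveDir(String backupDirName, List<String> liveDirNames) {
        String table = tableName(backupDirName);
        List<String> candidates = liveDirNames.stream()
                .filter(dir -> tableName(dir).equals(table))
                .collect(Collectors.toList());
        return candidates.size() == 1 ? Optional.of(candidates.get(0)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Second scenario: one live directory whose ID differs from the backup.
        System.out.println(matchLiveDir("def-12345", List.of("def-6789", "xyz-1111")));
        // First scenario: the old and the new directory both exist, so the match is ambiguous.
        System.out.println(matchLiveDir("def-12345", List.of("def-12345", "def-6789")));
    }
}
```

The second call returning an empty result is the reason a name-based heuristic is not enough and the node itself has to report which directory it uses.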
> I tried to get this information from ColumnFamilyStoreMBean (CFSMB), but it is not exposed there.
> Is there any way to retrieve via JMX where a table actually stores its data?
> I have put this together: https://github.com/apache/cassandra/pull/850/files
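For illustration, the following is a rough sketch of how a client might read such an attribute over JMX once it is exposed. The attribute name `DataPaths`, the getter `getDataPaths()`, and the object name pattern used here are assumptions for the sake of the example and should be checked against the linked pull request. `FakeTableMBean` and `FakeTable` are hypothetical stand-ins registered in the local platform MBean server, so the sketch runs without a Cassandra node.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class DataPathsJmxSketch {
    // Hypothetical stand-in for the real MBean; only one getter is modelled.
    public interface FakeTableMBean {
        List<String> getDataPaths();
    }

    public static class FakeTable implements FakeTableMBean {
        @Override
        public List<String> getDataPaths() {
            return List.of("/my/data/abc/def-6789");
        }
    }

    // Reads the (assumed) "DataPaths" attribute for the given keyspace and table.
    @SuppressWarnings("unchecked")
    public static List<String> dataPaths(MBeanServerConnection conn, String keyspace, String table)
            throws Exception {
        // Assumed object name pattern for per-table MBeans.
        ObjectName name = new ObjectName(
                "org.apache.cassandra.db:type=Tables,keyspace=" + keyspace + ",table=" + table);
        return (List<String>) conn.getAttribute(name, "DataPaths");
    }

    public static void main(String[] args) throws Exception {
        // Register the stand-in locally; a real client would connect to the node's JMX port instead.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(
                new StandardMBean(new FakeTable(), FakeTableMBean.class),
                new ObjectName("org.apache.cassandra.db:type=Tables,keyspace=abc,table=def"));
        System.out.println(dataPaths(server, "abc", "def"));
    }
}
```

Against a real node, the `MBeanServerConnection` would come from `JMXConnectorFactory.connect(...)` rather than the platform MBean server, and the returned paths would tell a restore tool which directory is the active one.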
--
This message was sent by Atlassian Jira
(v8.3.4#803005)