Posted to user@cassandra.apache.org by Daniel Hölbling-Inzko <da...@bitmovin.com> on 2017/05/11 09:37:41 UTC

Cassandra Snapshots and directories

Hi,
I am going through this guide to back up and restore Cassandra data to a
new cluster:
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_backup_snapshot_restore_t.html#task_ds_cmf_11r_gk

When creating a snapshot, I get the snapshot files mixed in with the normal
data files and backup files, so they are all over the place, and it is very hard
(especially with lots of tables per keyspace) to transfer ONLY the snapshot
(mostly because there is a snapshot directory per table).

Am I missing something, or is there some arcane shell command that filters
out only the snapshots?
As it stands, it's much easier to just back up the whole data directory.

greetings Daniel
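A minimal sketch of such a filter, assuming the default on-disk layout <data_dir>/<keyspace>/<table>/snapshots/<tag>/ (since 2.1 the table directory name carries a table id suffix, which does not affect the match). list_snapshot_dirs is a hypothetical helper name; it prints only the per-table directories of one snapshot tag, so live SSTables and incremental backups/ directories never show up:

```shell
#!/bin/sh
# Hypothetical helper: print every per-table snapshot directory for one tag.
# Assumes the layout <data_dir>/<keyspace>/<table>/snapshots/<tag>/.
#   $1 = Cassandra data directory (see data_file_directories in cassandra.yaml)
#   $2 = snapshot tag (the name given to nodetool snapshot -t)
list_snapshot_dirs() {
  # -mindepth/-maxdepth 4 pin matches to <keyspace>/<table>/snapshots/<tag>,
  # so live SSTables, backups/ (incremental backups) and other tags are skipped.
  find "$1" -mindepth 4 -maxdepth 4 -type d -path "*/snapshots/$2"
}
```

Usage, with placeholder values: list_snapshot_dirs /var/lib/cassandra/data my_snapshot, after nodetool snapshot -t my_snapshot. /var/lib/cassandra/data is only a common package default; the real location is whatever data_file_directories points to in cassandra.yaml.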

Re: Cassandra Snapshots and directories

Posted by Anthony Grasso <an...@gmail.com>.
Hi Daniel,

Yes, you are right: it does require some additional work to rsync just the
snapshots.

What about doing something like this to make the rsync syntax for the backup
easier?

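# run this from the Cassandra data directory; set BACKUP_DIR to your backup location
BACKUP_DIR=/path/to/YOUR_BACKUP_DIR

# iterate through the per-table snapshot directories (./<keyspace>/<table>/snapshots)
for snap_dir in $(find . -mindepth 3 -maxdepth 3 -type d -name snapshots)
do
  # iterate through each snapshot (one subdirectory per snapshot name)
  for snap in $(ls "${snap_dir}")
  do
    # get <keyspace>/<table> without the 'snapshots' path component
    out_dir=$(echo "${snap_dir}" | cut -d'/' -f2,3)

    # make the backup directory and perform the rsync
    mkdir -p "${BACKUP_DIR}/${out_dir}/${snap}"
    rsync -azP "${snap_dir}/${snap}/" "${BACKUP_DIR}/${out_dir}/${snap}"
  done
done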

Regards,
Anthony

On 12 May 2017 at 18:00, Daniel Hölbling-Inzko <
daniel.hoelbling-inzko@bitmovin.com> wrote:


Re: Cassandra Snapshots and directories

Posted by Daniel Hölbling-Inzko <da...@bitmovin.com>.
Hi Varun,
yes, you are right, that's the structure that gets created. But if I want
to back up ALL column families at once, this requires quite a complex rsync, as
Vladimir mentioned.
I can't just copy over the /data/keyspace directory, as that contains all
the data AND all the snapshots. I really have to go through this
column family by column family, which is annoying.

greetings Daniel
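The "quite complex rsync" mentioned above can be kept to a single call with rsync filter rules. A sketch only, assuming the same <keyspace>/<table>/snapshots/<tag>/ layout and rsync's rule semantics (rules are checked in order, first match wins, and an unanchored pattern containing a slash is matched against the end of each path). copy_snapshot_tag and its arguments are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: copy one snapshot tag from every table in a single rsync run.
#   $1 = Cassandra data directory, $2 = snapshot tag, $3 = destination
#        (a local path or an rsync remote target such as host:/path)
copy_snapshot_tag() {
  # Filter rules are checked in order and the first match wins:
  #   --include='*/'                  descend into every directory
  #   --include='snapshots/<tag>/**'  keep every file below a matching snapshot dir
  #   --exclude='*'                   drop everything else (live SSTables,
  #                                   backups/, other snapshot tags)
  # -m (--prune-empty-dirs) skips directories that end up with no files,
  # so the result keeps only <keyspace>/<table>/snapshots/<tag>/ paths.
  rsync -a -m \
    --include='*/' \
    --include="snapshots/$2/**" \
    --exclude='*' \
    "$1/" "$3/"
}
```

For example: copy_snapshot_tag /var/lib/cassandra/data my_snapshot backuphost:/backups/node1 (all three values are placeholders). Including every directory and pruning the empty ones afterwards is the usual rsync idiom for copying a subset of files while keeping their directory structure.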

On Thu, 11 May 2017 at 22:48 Varun Gupta <va...@uber.com> wrote:

>
> I did not get your question completely, with "snapshot files are mixed
> with files and backup files".
>
> When you call nodetool snapshot, it will create a directory with snapshot
> name if specified or current timestamp at
> /data/<keyspace>/<columnfamily>/backup/<snapshotname>. This directory will
> have all sstables, metadata files and schema.cql (if using 3.0.9 or higher).
>
>
> On Thu, May 11, 2017 at 2:37 AM, Daniel Hölbling-Inzko <
> daniel.hoelbling-inzko@bitmovin.com> wrote:
>
>> Hi,
>> I am going through this guide to do backup/restore of cassandra data to a
>> new cluster:
>>
>> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_backup_snapshot_restore_t.html#task_ds_cmf_11r_gk
>>
>> When creating a snapshot I get the snapshot files mixed in with the
>> normal data files and backup files, so it's all over the place and very
>> hard (especially with lots of tables per keyspace) to transfer ONLY the
>> snapshot.
>> (Mostly since there is a snapshot directory per table..)
>>
>> Am I missing something or is there some arcane shell command that filters
>> out only the snapshots?
>> Because this way it's much easier to just backup the whole data directory.
>>
>> greetings Daniel
>>
>
>

Re: Cassandra Snapshots and directories

Posted by Varun Gupta <va...@uber.com>.
I did not completely understand your question, specifically "snapshot files
are mixed with files and backup files".

When you call nodetool snapshot, it creates a directory named after the
snapshot (if a name is specified) or after the current timestamp, at
/data/<keyspace>/<columnfamily>/snapshots/<snapshotname>. This directory
contains all SSTables, metadata files and schema.cql (if using 3.0.9 or higher).
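When the copied snapshots are later loaded into a new cluster, the table directories there carry different table ids, so a restore script usually needs the bare keyspace, table and tag. A minimal sketch, assuming the layout <data_dir>/<keyspace>/<table>-<table_id>/snapshots/<tag>; parse_snapshot_path is a hypothetical helper, and it relies on Cassandra table names containing only letters, digits and underscores:

```shell
#!/bin/sh
# Hypothetical helper: split <data_dir>/<keyspace>/<table>-<table_id>/snapshots/<tag>
# into "<keyspace> <table> <tag>". Returns 1 if the path is not a snapshot dir.
parse_snapshot_path() {
  snap_path="${1%/}"                  # drop one trailing slash, if present
  snap_tag="${snap_path##*/}"         # last component: the snapshot tag
  snap_path="${snap_path%/*}"         # strip /<tag>
  [ "${snap_path##*/}" = "snapshots" ] || return 1
  snap_path="${snap_path%/*}"         # strip /snapshots
  table_dir="${snap_path##*/}"        # <table>-<table_id> (or just <table> before 2.1)
  snap_path="${snap_path%/*}"         # strip /<table dir>
  keyspace="${snap_path##*/}"
  # Table names may only contain letters, digits and '_', so everything from
  # the last '-' on is the table id; without a '-' the name is left unchanged.
  table="${table_dir%-*}"
  printf '%s %s %s\n' "$keyspace" "$table" "$snap_tag"
}
```

For example, parse_snapshot_path ./ks1/users-0123456789abcdef0123456789abcdef/snapshots/my_snapshot prints "ks1 users my_snapshot".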


On Thu, May 11, 2017 at 2:37 AM, Daniel Hölbling-Inzko <
daniel.hoelbling-inzko@bitmovin.com> wrote:
