
Posted to user@cassandra.apache.org by Hannu Kröger <hk...@gmail.com> on 2019/02/19 13:19:32 UTC

Restore a table with dropped columns to a new cluster fails

Hi,

I would like to bring this issue to your attention.

Link to the ticket:
https://issues.apache.org/jira/browse/CASSANDRA-14336

Basically, if a table contains dropped columns and you try to restore a snapshot to a new cluster, the restore fails with an error like "java.lang.RuntimeException: Unknown column XXX during deserialization".

I feel this is quite a serious problem for the backup and restore functionality of Cassandra. You cannot restore a backup to a new cluster if columns have been dropped.

There have been other similar tickets that have apparently been closed, but based on my test with 3.11.4, the issue still persists.

Best Regards,
Hannu Kröger

Re: Restore a table with dropped columns to a new cluster fails

Posted by Mitch Gitman <mg...@gmail.com>.

Fabulous tip. Thanks, Sean. I will definitely check out dsbulk.

Great to see it's a Cassandra-general tool and not just limited to DataStax
Enterprise.


RE: Restore a table with dropped columns to a new cluster fails

Posted by "Durity, Sean R" <SE...@homedepot.com>.

I would use dsbulk to unload and load. Then the schemas don’t really matter. You define which fields in the resulting file are loaded into which columns. You also won’t have the limitations and slowness of COPY TO/FROM.
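As a rough illustration of the unload/load approach, the sketch below only builds the two dsbulk command lines. The keyspace, table, host, and directory names are hypothetical, and the short options (-h, -k, -t, -url, -m) should be checked against the dsbulk version you have installed.

```python
from typing import List, Optional


def dsbulk_unload_cmd(keyspace: str, table: str, out_dir: str, host: str) -> List[str]:
    """Build the argv that exports one table to CSV files under out_dir."""
    return ["dsbulk", "unload", "-h", host, "-k", keyspace, "-t", table, "-url", out_dir]


def dsbulk_load_cmd(
    keyspace: str,
    table: str,
    in_dir: str,
    host: str,
    mapping: Optional[str] = None,
) -> List[str]:
    """Build the argv that loads CSV files from in_dir into one table.

    mapping is an optional "field = column, ..." string that says which
    field of the exported file goes into which column of the target table.
    """
    cmd = ["dsbulk", "load", "-h", host, "-k", keyspace, "-t", table, "-url", in_dir]
    if mapping:
        cmd += ["-m", mapping]
    return cmd
```

Each list can be passed to subprocess.run(cmd, check=True). Because the mapping names only the columns that exist in the target table, the dropped columns never have to appear in the new schema.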

Sean Durity


Re: Restore a table with dropped columns to a new cluster fails

Posted by Mitch Gitman <mg...@gmail.com>.

I'm reviving this thread because I'm looking for a non-hacky way to migrate
data from one cluster to another using nodetool snapshot and sstableloader
without having to preserve dropped columns in the new schema. In my view,
that's just cruft and confusion that keeps building.

The best idea I can come up with is to do the following in the source
cluster:

   1. Use the cqlsh COPY TO command to export the data in the table.
   2. Drop the table.
   3. Re-create the table.
   4. Use the cqlsh COPY FROM command to import the data into the new
   incarnation of the table.
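
The four steps can be sketched as a sequence of cqlsh invocations. In cqlsh, COPY ... TO writes a table out to a CSV file and COPY ... FROM reads a CSV file back in. This is only a sketch: the helper name, keyspace, table, column list, and file path are hypothetical, and the CREATE TABLE statement has to be supplied by the caller.

```python
from typing import List, Sequence


def copy_roundtrip_commands(
    keyspace: str,
    table: str,
    columns: Sequence[str],
    create_stmt: str,
    csv_path: str,
    host: str = "127.0.0.1",
) -> List[List[str]]:
    """Return one cqlsh argv per step: export, drop, re-create, import."""
    target = f"{keyspace}.{table}"
    cols = ", ".join(columns)
    statements = [
        # Step 1: export only the live columns to a CSV file.
        f"COPY {target} ({cols}) TO '{csv_path}' WITH HEADER = TRUE",
        # Step 2: drop the table, and with it the dropped-column history.
        f"DROP TABLE {target}",
        # Step 3: re-create the table from a caller-supplied statement.
        create_stmt,
        # Step 4: import the CSV file into the new incarnation.
        f"COPY {target} ({cols}) FROM '{csv_path}' WITH HEADER = TRUE",
    ]
    return [["cqlsh", host, "-e", stmt] for stmt in statements]
```

Running each argv in order with subprocess.run(argv, check=True) stops at the first failing step, which matters here because step 2 is destructive.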

This approach is predicated on two assumptions:

   - The re-created table has no knowledge of the history of the old table
   by the same name.
   - The amount of data in the table doesn't exceed what the COPY command
   can handle.

If the dropped columns exist in the table in an environment where there's a
lot of data, then we'd have to use some other mechanism to capture and
reload the data.

If you see something wrong about this approach or you have a better way to
do it, I'd be glad to hear from you.


Re: Restore a table with dropped columns to a new cluster fails

Posted by Jeff Jirsa <jj...@gmail.com>.

You can also manually add the dropped column to the appropriate table to
eliminate the issue. Has to be done by a human, a new cluster would have no
way of learning about a dropped column, and the missing metadata cannot be
inferred.
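
One possible reading of this advice, sketched below, is to add the column back to the table on the new cluster before loading the SSTables and, if desired, drop it again afterwards. This is an assumption about what "manually add" means, not a confirmed procedure. The helper name and the example keyspace, table, and column are hypothetical, and the column type has to match the type the column had in the source cluster.

```python
from typing import List


def dropped_column_workaround_cql(
    keyspace: str, table: str, column: str, cql_type: str
) -> List[str]:
    """Return the CQL to run before and after loading the snapshot.

    The first statement is run on the new cluster before the load so the
    column is known there; the second is optional and is run after the
    load to drop the column again.
    """
    target = f"{keyspace}.{table}"
    return [
        f"ALTER TABLE {target} ADD {column} {cql_type};",
        f"ALTER TABLE {target} DROP {column};",
    ]
```

For example, dropped_column_workaround_cql("ks", "users", "nickname", "text") yields the ADD statement to run before the load and the DROP statement to run once it has finished.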



Re: Restore a table with dropped columns to a new cluster fails

Posted by Elliott Sims <el...@backblaze.com>.

When a snapshot is taken, it includes a "schema.cql" file.  That should be
sufficient to restore whatever you need to restore.  I'd argue that neither
automatically resurrecting a dropped table nor silently failing to restore
it is a good behavior, so it's not unreasonable to have the user re-create
the table then choose if they want to re-drop it.
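
To locate those files, a small sketch like the following can list every schema.cql belonging to one snapshot. It assumes the usual on-disk layout of <data_dir>/<keyspace>/<table directory>/snapshots/<tag>/schema.cql; the function name is hypothetical and the data directory varies by installation.

```python
from pathlib import Path
from typing import List


def find_snapshot_schemas(data_dir: str, snapshot_tag: str) -> List[Path]:
    """Return the schema.cql file of every table captured in one snapshot."""
    # Path components: keyspace / table directory / snapshots / tag / schema.cql
    pattern = f"*/*/snapshots/{snapshot_tag}/schema.cql"
    return sorted(Path(data_dir).glob(pattern))
```

Reading each file with Path.read_text() shows the schema that was recorded when the snapshot was taken, which is the starting point for re-creating the table on the new cluster.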
