You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Philip Ó Condúin <ph...@gmail.com> on 2018/10/24 10:46:10 UTC

Insert from Select - CQL

Hi All,

I have a problem that I'm trying to work out and can't find anything online
that may help me.

I have been asked to delete 4K records from a Column Family that has a
total of 1.8 million rows.  I have been given an excel spreadsheet with a
list of the 4K PRIMARY KEY numbers to be deleted.  Great, the delete will
be easy anyway.

But before I delete them I want to take a backup of what I'm deleting
before I do, so that if the customer comes along and says they got the
wrong numbers then I can quickly restore one or all of them.
I have been trying to figure out how I can generate inserts from a select
but it looks like this is not possible.

I'm using centos and Cassandra 2.11

Does anyone have any ideas of what I can do to generate inserts based on
primary key numbers in an excel spreadsheet?

Kind Regards,
Phil

Re: Insert from Select - CQL

Posted by "Max C." <mc...@core43.com>.
I’ve never been a big fan of the “COPY” statement.

My preference for stuff like this (though I am definitely in the minority I think!) — particularly for the amount of data you’re talking about — is to use the open source tool “cassandradump” — which is similar to mysqldump but for cassandra.

Original:
https://github.com/gianlucaborello/cassandradump <https://github.com/gianlucaborello/cassandradump>

My version which changes a number of things to make it a little easier to work with (and more mysqldump-like):
https://github.com/terwilliger42/cassandradump/commits/develop-tw42 <https://github.com/terwilliger42/cassandradump/commits/develop-tw42>

- Updated for Python 3 (only works on Python 3; tested on Python 3.5) <https://github.com/terwilliger42/cassandradump/commit/3aab25d74eb373fdf279d8eec89df7d57b3e07df>
- Added progress bar
- Added default reading/writing to stdin/stdout
- Added ability to prompt for password
- --protocol-version no longer required for C* 2.x+
- Added capability to export/import into a different keyspace
- Split commandline args into two subparsers "import" and "export"
- Added --truncate option
- Added shorter CLI params, similar to mysqldump (-h for --host, -u for --username, -p for --password, etc)

- Max

> On Oct 25, 2018, at 5:07 am, Philip Ó Condúin <ph...@gmail.com> wrote:
> 
> Hi Alain,
> 
> That is exactly what I did yesterday in the end.  I ran the selects and output the results to a file, I ran some greps on that file to leave myself with just the data rows removing any white space and headers.
> I then copied this data into a notepad on my local machine and saved it as a csv.  Luckily the results of the selects were delimited by pipe "|" so I imported the csv into a spreadsheet and was able to separate the values into columns.
> 
> From here I was able to build up the insert statements and now have 4K insert statements as a backup.
> 
> Thanks a lot for your reply.
> 
> Kind regards,
> Phil
> 
> On Thu, 25 Oct 2018 at 11:59, Alain RODRIGUEZ <arodrime@gmail.com <ma...@gmail.com>> wrote:
> 
> Does anyone have any ideas of what I can do to generate inserts based on primary key numbers in an excel spreadsheet?
> 
> A quick thought:
> 
> What about using a column of the spreadsheet to actually store the SELECT result and generate the INSERT statement (and I would probably do the DELETE too) corresponding to each row using the power of the spreadsheet to write the query once and have it for all the partitions with the proper values?
> 
> The spreadsheet would then be your backup somehow.
> 
> We are a bit far from any Cassandra advice, but that's my first thought on your problem, use the spreadsheet :).
> Another option is probably to SELECT these rows and INSERT them into some other Cassandra table (same cluster or not). Here you would have to code it I think (client app of any kind)
> This might not a good fit, but just in case, you might want to check at the 'COPY' statement: https://stackoverflow.com/questions/21363046/how-to-select-data-from-a-table-and-insert-into-another-table <https://stackoverflow.com/questions/21363046/how-to-select-data-from-a-table-and-insert-into-another-table>
> I'm not too sure what suits you the best.
> 
> C*heers,
> -----------------------
> Alain Rodriguez - alain@thelastpickle.com <ma...@thelastpickle.com>
> France / Spain
> 
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com <http://www.thelastpickle.com/>
> Le mer. 24 oct. 2018 à 12:46, Philip Ó Condúin <philipoconduin@gmail.com <ma...@gmail.com>> a écrit :
> Hi All,
> 
> I have a problem that I'm trying to work out and can't find anything online that may help me.
> 
> I have been asked to delete 4K records from a Column Family that has a total of 1.8 million rows.  I have been given an excel spreadsheet with a list of the 4K PRIMARY KEY numbers to be deleted.  Great, the delete will be easy anyway.
> 
> But before I delete them I want to take a backup of what I'm deleting before I do, so that if the customer comes along and says they got the wrong numbers then I can quickly restore one or all of them.
> I have been trying to figure out how I can generate inserts from a select but it looks like this is not possible.
> 
> I'm using centos and Cassandra 2.11
> 
> Does anyone have any ideas of what I can do to generate inserts based on primary key numbers in an excel spreadsheet?
> 
> Kind Regards,
> Phil
> 
> 
> 
> 
> -- 
> Regards,
> Phil


Re: Insert from Select - CQL

Posted by Philip Ó Condúin <ph...@gmail.com>.
Hi Alain,

That is exactly what I did yesterday in the end.  I ran the selects and
output the results to a file, I ran some greps on that file to leave myself
with just the data rows removing any white space and headers.
I then copied this data into a notepad on my local machine and saved it as
a csv.  Luckily the results of the selects were delimited by pipe "|" so I
imported the csv into a spreadsheet and was able to separate the values
into columns.

From here I was able to build up the insert statements and now have 4K
insert statements as a backup.

Thanks a lot for your reply.

Kind regards,
Phil

On Thu, 25 Oct 2018 at 11:59, Alain RODRIGUEZ <ar...@gmail.com> wrote:

>
> Does anyone have any ideas of what I can do to generate inserts based on
>> primary key numbers in an excel spreadsheet?
>
>
> A quick thought:
>
> What about using a column of the spreadsheet to actually store the SELECT
> result and generate the INSERT statement (and I would probably do the
> DELETE too) corresponding to each row using the power of the spreadsheet to
> write the query once and have it for all the partitions with the proper
> values?
>
> The spreadsheet would then be your backup somehow.
>
> We are a bit far from any Cassandra advice, but that's my first thought on
> your problem, use the spreadsheet :).
> Another option is probably to SELECT these rows and INSERT them into some
> other Cassandra table (same cluster or not). Here you would have to code it
> I think (client app of any kind)
> This might not a good fit, but just in case, you might want to check at
> the 'COPY' statement:
> https://stackoverflow.com/questions/21363046/how-to-select-data-from-a-table-and-insert-into-another-table
> I'm not too sure what suits you the best.
>
> C*heers,
> -----------------------
> Alain Rodriguez - alain@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> Le mer. 24 oct. 2018 à 12:46, Philip Ó Condúin <ph...@gmail.com>
> a écrit :
>
>> Hi All,
>>
>> I have a problem that I'm trying to work out and can't find anything
>> online that may help me.
>>
>> I have been asked to delete 4K records from a Column Family that has a
>> total of 1.8 million rows.  I have been given an excel spreadsheet with a
>> list of the 4K PRIMARY KEY numbers to be deleted.  Great, the delete will
>> be easy anyway.
>>
>> But before I delete them I want to take a backup of what I'm deleting
>> before I do, so that if the customer comes along and says they got the
>> wrong numbers then I can quickly restore one or all of them.
>> I have been trying to figure out how I can generate inserts from a select
>> but it looks like this is not possible.
>>
>> I'm using centos and Cassandra 2.11
>>
>> Does anyone have any ideas of what I can do to generate inserts based on
>> primary key numbers in an excel spreadsheet?
>>
>> Kind Regards,
>> Phil
>>
>>
>>

-- 
Regards,
Phil

Re: Insert from Select - CQL

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
> Does anyone have any ideas of what I can do to generate inserts based on
> primary key numbers in an excel spreadsheet?


A quick thought:

What about using a column of the spreadsheet to actually store the SELECT
result and generate the INSERT statement (and I would probably do the
DELETE too) corresponding to each row using the power of the spreadsheet to
write the query once and have it for all the partitions with the proper
values?

The spreadsheet would then be your backup somehow.

We are a bit far from any Cassandra advice, but that's my first thought on
your problem, use the spreadsheet :).
Another option is probably to SELECT these rows and INSERT them into some
other Cassandra table (same cluster or not). Here you would have to code it
I think (client app of any kind)
This might not a good fit, but just in case, you might want to check at the
'COPY' statement:
https://stackoverflow.com/questions/21363046/how-to-select-data-from-a-table-and-insert-into-another-table
I'm not too sure what suits you the best.

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

Le mer. 24 oct. 2018 à 12:46, Philip Ó Condúin <ph...@gmail.com> a
écrit :

> Hi All,
>
> I have a problem that I'm trying to work out and can't find anything
> online that may help me.
>
> I have been asked to delete 4K records from a Column Family that has a
> total of 1.8 million rows.  I have been given an excel spreadsheet with a
> list of the 4K PRIMARY KEY numbers to be deleted.  Great, the delete will
> be easy anyway.
>
> But before I delete them I want to take a backup of what I'm deleting
> before I do, so that if the customer comes along and says they got the
> wrong numbers then I can quickly restore one or all of them.
> I have been trying to figure out how I can generate inserts from a select
> but it looks like this is not possible.
>
> I'm using centos and Cassandra 2.11
>
> Does anyone have any ideas of what I can do to generate inserts based on
> primary key numbers in an excel spreadsheet?
>
> Kind Regards,
> Phil
>
>
>