Posted to user@cassandra.apache.org by adrien ruffie <ad...@hotmail.fr> on 2020/01/17 07:57:28 UTC

COPY command with where condition

Hello all,

In my company we want to export a big dataset from our Cassandra ring.
We looked at the COPY command, but I can't find whether and how a WHERE condition can be used.

We only need to export a subset of the data, selected by a WHERE clause, unfortunately
with ALLOW FILTERING because of several old tables that were poorly modelled...

Do you know a way to do that, please?

Thanks to all and best regards

Adrian

Re: COPY command with where condition

Posted by Michael Shuler <mi...@pbandjelly.org>.
On 1/17/20 9:50 AM, adrien ruffie wrote:
> Thank you very much,
> 
>   so for example I run this request -->
> 
> ./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * 
> FROM probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" -url 
> /home/dump
> 
> 
> But I get the following error
> com.datastax.dsbulk.executor.api.exception.BulkExecutionException: 
> Statement execution failed: SELECT * FROM crt_sensors WHERE site_id = 
> 208812 ALLOW FILTERING (Cassandra timeout during read query at 
> consistency LOCAL_ONE (1 responses were required but only 0 replica 
> responded))
> 
> I configured my driver with the following driver.conf, but nothing works 
> correctly. Do you know what the problem is?
> 
> datastax-java-driver {
>      basic {
> 
> 
>          contact-points = ["data1com:9042","data2.com:9042"]

typo?

mshuler@hana:~$ echo "QUIT" | nc -w 10 data2.com 9042
data2.com [35.208.148.117] 9042 (?) : Connection timed out
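If those are meant to be your own nodes, that line presumably needs your real
hostnames, something like (hostnames below are placeholders only):

         contact-points = ["node1.example.com:9042","node2.example.com:9042"]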

> 
>          request {
>              timeout = "2000000"
>              consistency = "LOCAL_ONE"
> 
>          }
>      }
>      advanced {
> 
>          auth-provider {
>              class = PlainTextAuthProvider
>              username = "superuser"
>              password = "mypass"
> 
>          }
>      }
> }

Kind regards,
Michael



Re: COPY command with where condition

Posted by Alex Ott <al...@gmail.com>.
I think you may avoid the timeout if you specify a token condition inside the
WHERE clause, like:

-query "SELECT * FROM probe_sensors WHERE token(...) > :start and
token(...) <= :end AND localisation_id = 208812 ALLOW FILTERING"

replace ... with the list of partition key columns
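
For instance, if the partition key were a single column named id (just an assumption
here, substitute the real key columns), the unload would look roughly like:

./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM probe_sensors WHERE token(id) > :start AND token(id) <= :end AND localisation_id = 208812 ALLOW FILTERING" -url /home/dump

DSBulk then splits the token ring and binds :start / :end per range, so each request
only scans a slice of the data instead of the whole table at once.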



-- 
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)

Re: [EXTERNAL] Re: COPY command with where condition

Posted by Jean Carlo <je...@gmail.com>.
Hello

Nobody has mentioned it yet, but you can also use the Spark Cassandra connector,
preferably if your data set is so big that a simple copy to CSV cannot
handle it.
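
A minimal sketch of that approach from spark-shell (connector version, host and
column names are placeholders, not taken from this thread):

spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.2 \
  --conf spark.cassandra.connection.host=node1.example.com

// inside the shell:
val df = spark.read.format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "dev_keyspace", "table" -> "probe_sensors")).load()
df.filter($"localisation_id" === 208812).write.option("header", "true").csv("/home/dump/probe_sensors_csv")

The connector reads the table token range by token range in parallel and applies the
filter in Spark, so it copes better with very large tables than a single client-side export.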

Saludos

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay



RE: [EXTERNAL] Re: COPY command with where condition

Posted by "Durity, Sean R" <SE...@homedepot.com>.
sstablekeys (in the tools directory?) can extract the actual keys from your sstables. You have to run it on each node and then combine and de-dupe the final results, but I have used this technique with a query generator to extract data more efficiently.
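
Roughly like this (data paths are placeholders; depending on the Cassandra version the
keys are printed in hex and may need decoding before they go into a query generator):

for f in /var/lib/cassandra/data/dev_keyspace/probe_sensors-*/*-Data.db; do tools/bin/sstablekeys "$f"; done > keys_node1.txt   # run on each node
sort -u keys_node*.txt > all_keys.txt   # combine and de-dupe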


Sean Durity


Re: COPY command with where condition

Posted by Chris Splinter <ch...@gmail.com>.
Do you know your partition keys?

One option could be to enumerate that list of partition keys in separate
cmds to make the individual operations less expensive for the cluster.

For example:
Say your partition key column is called id and the ids in your database are
[1,2,3]

You could do
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT *
FROM probe_sensors WHERE id = 1 AND localisation_id = 208812" -url
/home/dump
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT *
FROM probe_sensors WHERE id = 2 AND localisation_id = 208812" -url
/home/dump
./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT *
FROM probe_sensors WHERE id = 3 AND localisation_id = 208812" -url
/home/dump
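
If the id list is long, the same thing can be scripted, e.g. (sketch only; ids.txt and
the column names are assumptions):

while read id; do
  ./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM probe_sensors WHERE id = $id AND localisation_id = 208812" -url /home/dump/$id
done < ids.txt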


Does that option work for you?




RE: COPY command with where condition

Posted by adrien ruffie <ad...@hotmail.fr>.
I don't really know yet for the production environment, but in the development environment the table contains more than 10,000,000 rows.
But we only need a subset of this table, not the whole thing...


Re: COPY command with where condition

Posted by Chris Splinter <ch...@gmail.com>.
What you are seeing there is a standard read timeout; how many rows do you
expect back from that query?
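
(For reference, the limits that produce that error are the server-side read timeouts in
cassandra.yaml, not just the driver setting; the defaults are:

read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000

so a huge ALLOW FILTERING scan can exceed them no matter how high the client-side
basic.request.timeout is set.)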


RE: COPY command with where condition

Posted by adrien ruffie <ad...@hotmail.fr>.
Thank you very much,

 so for example I run this request -->

./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" -url /home/dump


But I get the following error
com.datastax.dsbulk.executor.api.exception.BulkExecutionException: Statement execution failed: SELECT * FROM crt_sensors WHERE site_id = 208812 ALLOW FILTERING (Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded))

I configured my driver with the following driver.conf, but nothing works correctly. Do you know what the problem is?

datastax-java-driver {
    basic {


        contact-points = ["data1com:9042","data2.com:9042"]

        request {
            timeout = "2000000"
            consistency = "LOCAL_ONE"

        }
    }
    advanced {

        auth-provider {
            class = PlainTextAuthProvider
            username = "superuser"
            password = "mypass"

        }
    }
}


Re: COPY command with where condition

Posted by Chris Splinter <ch...@gmail.com>.
DSBulk has an option that lets you specify the query (including a WHERE
clause)

See Example 19 in this blog post for details:
https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading


Re: COPY command with where condition

Posted by Jean Tremblay <je...@zen-innovations.com>.
Did you think about using a Materialised View to generate what you want to keep, and then use DSBulk to extract the data?
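
A sketch of that idea, assuming the base table's partition key is a single column id
(adjust to the real schema):

CREATE MATERIALIZED VIEW dev_keyspace.sensors_by_localisation AS
  SELECT * FROM dev_keyspace.probe_sensors
  WHERE localisation_id IS NOT NULL AND id IS NOT NULL
  PRIMARY KEY (localisation_id, id);

./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM sensors_by_localisation WHERE localisation_id = 208812" -url /home/dump

The view has to finish building before the unload, and materialized views have their own
operational caveats, but the export itself then needs no ALLOW FILTERING.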



RE: COPY command with where condition

Posted by adrien ruffie <ad...@hotmail.fr>.
Sorry, I'm coming back with a quick question about the bulk loader ...

https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader

I read this: "Operations such as converting strings to lowercase, arithmetic on input columns, or filtering out rows based on some criteria, are not supported."

Consequently, it's still not possible to use a WHERE clause with DSBulk, right?

I don't really see how else to do it, so that we don't keep the whole set of business data that is already stored but doesn't need to be exported...




RE: COPY command with where condition

Posted by adrien ruffie <ad...@hotmail.fr>.
Thanks a lot!
That's good news for DSBulk! I will take a look at this solution.

best regards,
Adrian

Re: COPY command with where condition

Posted by Erick Ramirez <fl...@gmail.com>.
The COPY command doesn't support filtering and it doesn't perform well for
large tables.

Have you considered the DSBulk tool from DataStax? Previously, it only
worked with DataStax Enterprise but a few weeks ago, it was made free and
works with open-source Apache Cassandra. For details, see this blogpost
<https://www.datastax.com/blog/2019/12/tools-for-apache-cassandra>. Cheers!
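
For comparison, a plain cqlsh export looks like the line below and accepts no WHERE
clause at all (table and path are placeholders):

COPY dev_keyspace.probe_sensors TO '/home/dump/probe_sensors.csv' WITH HEADER = TRUE;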
