Posted to user@cassandra.apache.org by Carl Mueller <ca...@smartthings.com.INVALID> on 2019/05/23 20:25:28 UTC

schema for testing that has a lot of edge cases

Does anyone have any schema / schema generation that can be used for
general testing that has lots of complicated aspects and data?

For example, it has a bunch of different rk/ck variations, column data
types, and altered/added columns and data (which can impact sstables and
compaction).

Mischievous data to prepopulate (such as
https://github.com/minimaxir/big-list-of-naughty-strings for strings, ugly
keys in maps, semi-evil column names) of sufficient size to get on most
nodes of a 3-5 node cluster
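
A minimal sketch, in Python, of what such prepopulation data could look like. The handful of strings is hand-picked for illustration and is not the contents of the big-list-of-naughty-strings repo; the function names are hypothetical. The quoting helpers follow the CQL rules that a single quote inside a string literal, and a double quote inside a quoted identifier, are escaped by doubling them.

```python
import random

# Hand-picked awkward values for illustration only; the real
# big-list-of-naughty-strings is far larger.
NAUGHTY_STRINGS = [
    "",                         # empty string
    " ",                        # whitespace only
    "O'Brien",                  # embedded single quote
    'say "hi"',                 # embedded double quotes
    "'; DROP TABLE users; --",  # injection-style text
    "null",                     # looks like a keyword
    "\t\n",                     # control characters
    "\u202etxet desrever",      # right-to-left override
    "\U0001f4a9",               # character outside the BMP
    "a" * 1024,                 # long value
]


def cql_quote(value: str) -> str:
    """Render a Python string as a CQL string literal ('' escapes ')."""
    return "'" + value.replace("'", "''") + "'"


def cql_identifier(name: str) -> str:
    """Render a column name as a quoted CQL identifier ("" escapes ")."""
    return '"' + name.replace('"', '""') + '"'


def naughty_map(rng: random.Random, size: int) -> dict:
    """Build a map whose keys and values are both drawn from the list."""
    keys = rng.sample(NAUGHTY_STRINGS, size)
    return {key: rng.choice(NAUGHTY_STRINGS) for key in keys}
```

Passing a seeded `random.Random` keeps the generated maps reproducible between test runs.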

superwide rows
large key values

version-specific stuff for 2.1, 2.2, 3.x, 4.x

I'd be happy to centralize this in a GitHub repo if this doesn't exist
anywhere yet.
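
To make the idea of schema generation concrete, here is a rough Python sketch that emits CQL `CREATE TABLE` statements for a few key layouts crossed with a few column types, plus an `ALTER TABLE ... ADD` helper for the added-column case. The keyspace name, table names, column names, and the chosen layouts and types are all assumptions made for illustration; a real suite would need many more variations and version-specific branches.

```python
from itertools import product

# Illustrative key layouts: (partition key columns, clustering columns).
KEY_LAYOUTS = [
    (["pk1"], []),
    (["pk1"], ["ck1"]),
    (["pk1", "pk2"], ["ck1", "ck2"]),
]

# A few CQL data types to cycle through for the regular column.
VALUE_TYPES = ["text", "bigint", "map<text, text>", "frozen<list<int>>"]


def create_table(keyspace, name, partition_keys, clustering_keys, value_type):
    """Build one CREATE TABLE statement for the given key layout."""
    columns = [f"{col} text" for col in partition_keys + clustering_keys]
    columns.append(f"val {value_type}")
    pk = "(" + ", ".join(partition_keys) + ")"
    primary = ", ".join([pk] + clustering_keys)
    body = ", ".join(columns + [f"PRIMARY KEY ({primary})"])
    return f"CREATE TABLE {keyspace}.{name} ({body});"


def generate_schema(keyspace="edge_cases"):
    """Yield one table per (key layout, value type) combination."""
    combos = product(enumerate(KEY_LAYOUTS), enumerate(VALUE_TYPES))
    for (i, (pks, cks)), (j, vtype) in combos:
        yield create_table(keyspace, f"t_{i}_{j}", pks, cks, vtype)


def add_column(keyspace, name, column, cql_type):
    """Build the ALTER statement for the altered/added column case."""
    return f"ALTER TABLE {keyspace}.{name} ADD {column} {cql_type};"
```

Running `add_column` against a table that already holds flushed data is one way to get sstables written under both the old and the new column set.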

Re: schema for testing that has a lot of edge cases

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.

Thanks.

On Mon, May 27, 2019 at 12:24 PM Alain RODRIGUEZ <ar...@gmail.com> wrote:

> Hello Carl,
>
> What you are trying to do sounds like a good match for one of the tools we
> open-sourced and actively maintain:
> https://github.com/thelastpickle/tlp-stress.
>
> TLP Stress allows you to use defined profiles (see
> https://github.com/thelastpickle/tlp-stress/tree/master/src/main/kotlin/com/thelastpickle/tlpstress/profiles)
> or create your own profiles and/or schemas. Contributions are welcome. You
> can tune workloads, the read/write ratio, the number of distinct
> partitions, number of operations to run...
>
> You might need multiple clients to maximize the throughput, depending on
> the instances in use and your own testing goals.
>
>> version specific stuff to 2.1, 2.2, 3.x, 4.x
>
>
> In case it is of use as well, we like to combine it with another of our
> tools: TLP Cluster (https://github.com/thelastpickle/tlp-cluster). We can
> then easily create and destroy Cassandra environments (on AWS), including
> Cassandra servers, clients and monitoring (Prometheus).
>
> Have a look anyway; I think both projects could be of interest in
> reaching your goal.
>
> C*heers,
> -----------------------
> Alain Rodriguez - alain@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com

Re: schema for testing that has a lot of edge cases

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

Hello Carl,

What you are trying to do sounds like a good match for one of the tools we
open-sourced and actively maintain:
https://github.com/thelastpickle/tlp-stress.

TLP Stress allows you to use defined profiles (see
https://github.com/thelastpickle/tlp-stress/tree/master/src/main/kotlin/com/thelastpickle/tlpstress/profiles)
or create your own profiles and/or schemas. Contributions are welcome. You
can tune workloads, the read/write ratio, the number of distinct
partitions, number of operations to run...
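
To illustrate how those knobs interact, here is a small Python sketch of a read/write mix over a fixed number of partitions. It is only a toy model of the idea, not tlp-stress code or its API; the function and parameter names are hypothetical.

```python
import random
from collections import Counter


def workload(operations, read_ratio, partitions, seed=0):
    """Yield (op, partition_id) pairs for a simple read/write mix."""
    rng = random.Random(seed)
    for _ in range(operations):
        # Each operation is a read with probability read_ratio.
        op = "read" if rng.random() < read_ratio else "write"
        # Partitions are picked uniformly; real workloads may be skewed.
        yield op, rng.randrange(partitions)


# Tally how many reads and writes a 20% read workload produces.
counts = Counter(
    op for op, _ in workload(10_000, read_ratio=0.2, partitions=100)
)
```

A smaller `partitions` value concentrates the same number of operations on fewer keys, which is one way to push toward wide or hot partitions.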

You might need multiple clients to maximize the throughput, depending on
the instances in use and your own testing goals.

> version specific stuff to 2.1, 2.2, 3.x, 4.x


In case it is of use as well, we like to combine it with another of our
tools: TLP Cluster (https://github.com/thelastpickle/tlp-cluster). We can
then easily create and destroy Cassandra environments (on AWS), including
Cassandra servers, clients and monitoring (Prometheus).

Have a look anyway; I think both projects could be of interest in
reaching your goal.

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

