
Posted to user@cassandra.apache.org by Lucas Benevides <lu...@maurobenevides.com.br> on 2017/11/01 16:38:21 UTC

Cassandra stress tool - data generation

Dear community,

I am using the Cassandra stress tool to simulate IoT-generated data, so I
created a column family with device_id as the partition key.

However, every operation (counted by the -n option) generates the same
values. For instance, I have a column called observation_time, which is
supposed to hold the time measured by the sensor, but its values are
identical in every partition.

Is there a way to make these values randomly generated with different
seeds? I need this so that, if the same device_id occurs again, the
operation inserts new rows instead of upserting (overwriting) the existing
ones.

To clarify, this is what is happening now (fictional data):

operation 1
device 1
ts1: 01/01/1970
ts2: 02/01/1980
ts3: 03/01/1990

operation 2
device 2
ts1: 01/01/1970
ts2: 02/01/1980
ts3: 03/01/1990

What I want:
operation 1
device 1
ts1: 01/01/1970
ts2: 02/01/1980
ts3: 03/01/1990

operation 2
device 2
ts1: 02/01/1971  # Different values here.
ts2: 05/01/1982
ts3: 08/01/1993
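The difference between the two listings can be modelled with a short Python sketch. It is not how cassandra-stress is implemented; it only assumes, for illustration, that each operation draws its timestamps from a random generator. Reusing one constant seed for every operation (the hypothetical fixed_seed) gives every device identical values, while deriving the seed from the device id (the hypothetical per_device_seed) gives each device its own values.

```python
import random
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def timestamps(seed: int, count: int = 3) -> list[date]:
    """Return `count` sorted dates drawn from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator: same seed -> same values
    days = sorted(rng.randrange(0, 365 * 30) for _ in range(count))
    return [EPOCH + timedelta(days=d) for d in days]


def fixed_seed(device_id: int) -> list[date]:
    # Hypothetical "what is happening now": every operation reuses one seed,
    # so device_id has no influence on the generated values.
    return timestamps(seed=42)


def per_device_seed(device_id: int) -> list[date]:
    # Hypothetical "what I want": the seed is derived from the device id.
    return timestamps(seed=device_id)


for label, generate in (("fixed seed", fixed_seed), ("per-device seed", per_device_seed)):
    print(label)
    for device in (1, 2):
        print(f"  device {device}:", [d.isoformat() for d in generate(device)])
```

Because the per-device seed is deterministic, the same device id always reproduces the same timestamps, which keeps runs repeatable.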

Thanks in advance,
Lucas Benevides

Re: Cassandra stress tool - data generation

Posted by Lucas Benevides <lu...@maurobenevides.com.br>.

Hi Varun,

I appreciate your answer, but this is not what is causing my problem.
Even with seq(), as the excellent article by Ben Slater explains, it will
always repeat the same sequence at each new operation (in my case, one
operation equals one partition).
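The repetition described above can be illustrated with a minimal Python sketch, assuming a seq()-style generator that restarts at its lower bound in every partition. Both functions are hypothetical models, not cassandra-stress code; seq_offset only shows how a per-partition offset would keep the ranges apart.

```python
from itertools import count, islice


def seq_restart(partition: int, rows: int, start: int = 1) -> list[int]:
    # Hypothetical model of the reported behaviour: the sequence restarts at
    # `start` for every partition, so `partition` is ignored.
    return list(islice(count(start), rows))


def seq_offset(partition: int, rows: int, start: int = 1) -> list[int]:
    # Hypothetical alternative: shift the start by partition * rows so that
    # consecutive partitions receive non-overlapping values.
    return list(islice(count(start + partition * rows), rows))


for p in range(3):
    print(p, seq_restart(p, 3), seq_offset(p, 3))
# 0 [1, 2, 3] [1, 2, 3]
# 1 [1, 2, 3] [4, 5, 6]
# 2 [1, 2, 3] [7, 8, 9]
```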

But in that issue I noticed another one,
https://issues.apache.org/jira/browse/CASSANDRA-11138, that may be causing
the problem. I will apply its patch, test it, and report back later.

Thank you
Lucas Benevides

2017-11-01 14:59 GMT-02:00 Varun Barala <va...@gmail.com>:

> https://www.instaclustr.com/deep-diving-cassandra-stress-part-3-using-yaml-profiles/
> In this particular blog, they mentioned your case.
>
> Changed uniform() distribution to seq() distribution
> https://issues.apache.org/jira/browse/CASSANDRA-12490
>
> Thanks!!

Re: Cassandra stress tool - data generation

Posted by Varun Barala <va...@gmail.com>.

https://www.instaclustr.com/deep-diving-cassandra-stress-part-3-using-yaml-profiles/
This blog post covers your case.

Change the uniform() distribution to a seq() distribution:
https://issues.apache.org/jira/browse/CASSANDRA-12490
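For readers unfamiliar with the two distributions, a rough Python illustration follows. The function names only mirror the uniform() and seq() distribution names used in cassandra-stress profiles; the code is a hypothetical sketch of the idea (independent random draws versus values taken in order), and the wrap-around after the upper bound is an assumption of the sketch, not documented cassandra-stress behaviour.

```python
import random


def uniform_values(lo: int, hi: int, n: int, rng: random.Random) -> list[int]:
    # Illustrates uniform(lo..hi): independent random draws, so values may
    # repeat and arrive in any order.
    return [rng.randint(lo, hi) for _ in range(n)]


def seq_values(lo: int, hi: int, n: int) -> list[int]:
    # Illustrates seq(lo..hi): values are taken in order; wrapping back to lo
    # after hi is an assumption of this sketch.
    span = hi - lo + 1
    return [lo + i % span for i in range(n)]


rng = random.Random(0)
print("uniform:", uniform_values(1, 5, 8, rng))
print("seq:", seq_values(1, 5, 8))  # prints: seq: [1, 2, 3, 4, 5, 1, 2, 3]
```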

Thanks!!


On Thu, Nov 2, 2017 at 12:54 AM, Varun Barala <va...@gmail.com>
wrote:

> Hi,
>
> https://www.instaclustr.com/deep-diving-into-cassandra-stress-part-1/
>
> In the blog, they cover many things in detail.
>
> Thanks!!

Re: Cassandra stress tool - data generation

Posted by Varun Barala <va...@gmail.com>.

Hi,

https://www.instaclustr.com/deep-diving-into-cassandra-stress-part-1/

This blog post covers many of these things in detail.

Thanks!!
