Posted to user@cassandra.apache.org by Lucas Benevides <lu...@maurobenevides.com.br> on 2017/11/01 16:38:21 UTC
Cassandra stress tool - data generation
Dear community,
I am using the Cassandra Stress Tool and trying to simulate IoT-generated data.
So I created a column family with the device_id as the partition key.
But in every operation (the number of operations is given by the -n option)
the generated values are the same. For instance, I have a column called
observation_time, which is supposed to be the time measured by the sensor.
But in every partition the values are equal.
Is there a way to have those values generated randomly with
different seeds? I need this so that if the same device_id occurs
again, it writes a new row instead of overwriting (upserting) the existing one.
To clarify: What is happening now (fictional data):
operation 1
device 1
ts1: 01/01/1970
ts2: 02/01/1980
ts3: 03/01/1990
operation 2
device 2
ts1: 01/01/1970
ts2: 02/01/1980
ts3: 03/01/1990
What I want:
operation 1
device 1
ts1: 01/01/1970
ts2: 02/01/1980
ts3: 03/01/1990
operation 2
device 2
ts1: 02/01/1971 #Different values here.
ts2: 05/01/1982
ts3: 08/01/1993
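
The difference between the two listings above can be sketched in Python. This is only an illustration of the seeding idea, not how cassandra-stress generates values internally; the function names, the base seed of 42, and the 30-year time range are all hypothetical choices made for the example.

```python
import random
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
THIRTY_YEARS_S = 30 * 365 * 24 * 3600  # arbitrary range for the illustration


def observation_times(seed, count=3):
    """Return `count` sorted pseudo-random timestamps from a generator seeded with `seed`."""
    rng = random.Random(seed)
    offsets = sorted(rng.randrange(THIRTY_YEARS_S) for _ in range(count))
    return [EPOCH + timedelta(seconds=s) for s in offsets]


def fixed_seed(device_ids, seed=42):
    # "What is happening now": every partition reuses the same seed,
    # so every device gets identical timestamps.
    return {d: observation_times(seed) for d in device_ids}


def per_partition_seed(device_ids, base_seed=42):
    # "What I want": the partition key is mixed into the seed, so each device
    # gets its own sequence while the run as a whole stays reproducible.
    return {d: observation_times(base_seed * 1_000_003 + d) for d in device_ids}


if __name__ == "__main__":
    for label, gen in (("fixed seed", fixed_seed), ("per-partition seed", per_partition_seed)):
        print(label)
        for device, times in gen([1, 2]).items():
            print(f"  device {device}:", [t.date().isoformat() for t in times])
```

With a fixed seed both devices print the same three dates; with the per-partition seed they differ, but repeating the run reproduces the same values for each device.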
Thanks in advance,
Lucas Benevides
Re: Cassandra stress tool - data generation
Posted by Lucas Benevides <lu...@maurobenevides.com.br>.
Hi Varun,
I appreciate your answer, but this is not what is causing my problem.
Even if it is SEQ, as the excellent article by Ben Slater says, it will
always repeat the same sequence at each new operation (in my case one
operation equals one partition).
But in that issue, I saw another one:
https://issues.apache.org/jira/browse/CASSANDRA-11138
that may be causing the problem. I will apply this patch, test it, and
report back later.
Thank you
Lucas Benevides
2017-11-01 14:59 GMT-02:00 Varun Barala <va...@gmail.com>:
> https://www.instaclustr.com/deep-diving-cassandra-stress-part-3-using-yaml-profiles/
> In this particular blog, they mentioned your case.
>
> Changed uniform() distribution to seq() distribution
> https://issues.apache.org/jira/browse/CASSANDRA-12490
>
> Thanks!!
Re: Cassandra stress tool - data generation
Posted by Varun Barala <va...@gmail.com>.
https://www.instaclustr.com/deep-diving-cassandra-stress-part-3-using-yaml-profiles/
In this particular blog, they mentioned your case.
Changed uniform() distribution to seq() distribution
https://issues.apache.org/jira/browse/CASSANDRA-12490
Thanks!!
On Thu, Nov 2, 2017 at 12:54 AM, Varun Barala <va...@gmail.com>
wrote:
> Hi,
>
> https://www.instaclustr.com/deep-diving-into-cassandra-stress-part-1/
>
> In the blog, they covered many things in detail.
>
> Thanks!!
Re: Cassandra stress tool - data generation
Posted by Varun Barala <va...@gmail.com>.
Hi,
https://www.instaclustr.com/deep-diving-into-cassandra-stress-part-1/
In the blog, they covered many things in detail.
Thanks!!