Posted to user@cassandra.apache.org by Sameer Farooqui <ca...@gmail.com> on 2011/02/04 02:02:45 UTC

Problems with Python Stress Test

Hi guys,

I was playing around with the stress.py test this week and noticed a few
things.

1) Progress-interval does not always work correctly. I set it to 5 in the
example below, but am instead getting varying intervals:

techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ python
stress.py --num-keys=100000 --columns=5 --column-size=32 --operation=insert
--progress-interval=5 --threads=4 --nodes=170.252.179.222
Keyspace already exists.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
6662,1332,1335,0.00307796342135,5
11607,989,988,0.00476862022199,12
20297,1738,1736,0.00273238550807,18
30631,2066,2068,0.00202261635614,24
37291,1332,1331,0.00325975901372,29
47514,2044,2044,0.00193106963725,35
56618,1820,1821,0.00276346638249,41
68652,2406,2406,0.00179436958884,47
77745,1818,1820,0.00220694060007,52
87351,1921,1918,0.00236015612201,58
97167,1963,1963,0.00230505042379,64
100000,566,566,0.00223569174853,66


2) The key_rate and op_rate don't seem to be calculated correctly. Also,
what is the difference between the interval_key_rate and the
interval_op_rate? In the example above, the first row shows 6662
keys inserted in 5 seconds, and 6662 / 5 = 1332, which matches the
interval_op_rate.

The second row took 7 seconds to update instead of the requested 5. However,
the interval_op_rate and interval_key_rate are being calculated based on my
requested 5 seconds instead of the actual observed 7 seconds.

(11607 - 6662) / 5 = 989
(11607 - 6662) / 7 = ~706

Shouldn't it be basing the calculations off the 7 seconds?
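
For comparison, this is the kind of calculation I had in mind (just a sketch
with made-up names, not the actual stress.py code):

def interval_rate(prev_total, curr_total, prev_time, curr_time):
    # Divide by the measured elapsed time, not the requested interval,
    # so a 7-second gap reports (11607 - 6662) / 7 = ~706 instead of 989.
    elapsed = curr_time - prev_time
    return (curr_total - prev_total) / elapsed

print(interval_rate(6662, 11607, 0.0, 7.0))   # ~706.4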


3) If I write x KB to Cassandra with py_stress, the used disk space doesn't
grow by x after the test. In the example below I tried to write 500,000 keys
* 32 bytes * 5 columns = 78,125 kilobytes of data to the database. When I
checked the disk space used after the test, it had actually grown by
2,684,920 - 2,515,864 = 169,056 kilobytes. Is this perhaps because the
commit log holds a duplicate copy of the same data as the SSTables?
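
For reference, the 78,125 figure above only counts the raw column values:

# Raw column-value payload only -- no keys, column names, timestamps or indexes.
keys = 500000
columns_per_key = 5
column_size_bytes = 32
print(keys * columns_per_key * column_size_bytes / 1024.0)   # 78125.0 KB (~76 MB)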

Also, notice how the progress interval got thrown off after 40 seconds.


techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/cassandra7rc4-root
                       7583436   2515864   4682344  35% /
none                    633244       208    633036   1% /dev
none                    640368         0    640368   0% /dev/shm
none                    640368        56    640312   1% /var/run
none                    640368         0    640368   0% /var/lock
/dev/sda1               233191     20601    200149  10% /boot

techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ python
stress.py --num-keys=500000 --columns=5 --operation=insert
--progress-interval=5 --threads=1 --nodes=170.252.179.222
Keyspace already exists.
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
15562,3112,3112,0.000300011955333,5
31643,3216,3216,0.000290757187504,10
42968,2265,2265,0.000423845265875,15
54071,2220,2220,0.000430288759747,20
66491,2484,2484,0.000382423304897,25
79891,2680,2680,0.000351728307667,30
91758,2373,2373,0.000402696775367,35
102179,2084,2084,0.000461982612291,40
114003,2364,2364,0.000403893998092,46
126509,2501,2501,0.000379724634489,51
138047,2307,2307,0.000414365229356,56
150261,2442,2442,0.000390332772296,61
164019,2751,2751,0.000343320345113,66
175390,2274,2274,0.000421584286756,71
186564,2234,2234,0.000429319251473,76
198292,2345,2345,0.00040838057315,81
210186,2378,2378,0.000400560030882,87
225144,2991,2991,0.000314564943345,92
236474,2266,2266,0.000422214746265,97
249940,2693,2693,0.000349487200297,102
264410,2894,2894,0.00030166366303,107
275429,2203,2203,0.000464002475276,112
286430,2200,2200,0.00043832517821,117
299217,2557,2557,0.000371891478764,122
313800,2916,2916,0.000322412596002,128
325252,2290,2290,0.000417413284343,133
336031,2155,2155,0.000445155976201,138
347257,2245,2245,0.000426658924816,143
357493,2047,2047,0.000472509730556,148
372151,2931,2931,0.000321278794594,153
384655,2500,2500,0.000381667455343,158
395604,2189,2189,0.000439286896144,163
409713,2821,2821,0.000334938358759,168
423162,2689,2689,0.000351835071877,174
434276,2222,2222,0.000432009316829,179
444809,2106,2106,0.00045844612893,184
458190,2676,2676,0.000353130326037,189
470852,2532,2532,0.000374360740552,194
481333,2096,2096,0.000462788910416,199
492458,2225,2225,0.000431290422932,204
500000,1508,1508,0.000353647808408,207


techlabs@cassandraN1:~/apache-cassandra-0.7.0-src/contrib/py_stress$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/cassandra7rc4-root
                       7583436   2684920   4513288  38% /
none                    633244       208    633036   1% /dev
none                    640368         0    640368   0% /dev/shm
none                    640368        56    640312   1% /var/run
none                    640368         0    640368   0% /var/lock
/dev/sda1               233191     20601    200149  10% /boot



- Sameer

Re: Problems with Python Stress Test

Posted by Brandon Williams <dr...@gmail.com>.
On Fri, Feb 4, 2011 at 5:23 PM, Sameer Farooqui <ca...@gmail.com> wrote:

> Brandon,
>
> Thanks for the response. I have also noticed that stress.py's progress
> interval gets thrown off in low memory situations.
>
> What did you mean by "contrib/stress on 0.7 instead"? I don't see that dir
> in the src version of 0.7.


Looks like it didn't make it into 0.7.0. It will be in 0.7.1, or you can get
it from svn.

-Brandon

Re: Problems with Python Stress Test

Posted by Sameer Farooqui <ca...@gmail.com>.
Brandon,

Thanks for the response. I have also noticed that stress.py's progress
interval gets thrown off in low memory situations.

What did you mean by "contrib/stress on 0.7 instead"? I don't see that dir
in the src version of 0.7.

- Sameer


On Thu, Feb 3, 2011 at 5:22 PM, Brandon Williams <dr...@gmail.com> wrote:

> On Thu, Feb 3, 2011 at 7:02 PM, Sameer Farooqui <ca...@gmail.com> wrote:
>
>> Hi guys,
>>
>> I was playing around with the stress.py test this week and noticed a few
>> things.
>>
>> 1) Progress-interval does not always work correctly. I set it to 5 in the
>> example below, but am instead getting varying intervals:
>>
>
> Generally indicates that the client machine is being overloaded in my
> experience.
>
>> 2) The key_rate and op_rate don't seem to be calculated correctly. Also,
>> what is the difference between the interval_key_rate and the
>> interval_op_rate? In the example above, the first row shows 6662
>> keys inserted in 5 seconds, and 6662 / 5 = 1332, which matches the
>> interval_op_rate.
>>
>
> There should be no difference unless you're doing range slices, but IPC
> timing makes them vary somewhat.
>
>> 3) If I write x KB to Cassandra with py_stress, the used disk space doesn't
>> grow by x after the test. In the example below I tried to write 500,000 keys
>> * 32 bytes * 5 columns = 78,125 kilobytes of data to the database. When I
>> checked the disk space used after the test, it had actually grown by
>> 2,684,920 - 2,515,864 = 169,056 kilobytes. Is this perhaps because the
>> commit log holds a duplicate copy of the same data as the SSTables?
>>
>
> Commitlogs could be part of it; you're also not factoring in the column names,
> and then there's index and bloom filter overhead.
>
> Use contrib/stress on 0.7 instead.
>
> -Brandon
>

Re: Problems with Python Stress Test

Posted by Brandon Williams <dr...@gmail.com>.
On Thu, Feb 3, 2011 at 7:02 PM, Sameer Farooqui <ca...@gmail.com> wrote:

> Hi guys,
>
> I was playing around with the stress.py test this week and noticed a few
> things.
>
> 1) Progress-interval does not always work correctly. I set it to 5 in the
> example below, but am instead getting varying intervals:
>

Generally indicates that the client machine is being overloaded in my
experience.

> 2) The key_rate and op_rate don't seem to be calculated correctly. Also,
> what is the difference between the interval_key_rate and the
> interval_op_rate? In the example above, the first row shows 6662
> keys inserted in 5 seconds, and 6662 / 5 = 1332, which matches the
> interval_op_rate.
>

There should be no difference unless you're doing range slices, but IPC
timing makes them vary somewhat.
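
To illustrate: a range slice is a single operation that can return many keys,
so the two counters would diverge there, roughly like this (made-up tally, not
the actual stress.py accounting):

ops, keys = 0, 0

# an insert touches exactly one key, so both counters move together
ops += 1
keys += 1

# one get_range_slices call returning 10 rows: one op, ten keys
ops += 1
keys += 10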

> 3) If I write x KB to Cassandra with py_stress, the used disk space doesn't
> grow by x after the test. In the example below I tried to write 500,000 keys
> * 32 bytes * 5 columns = 78,125 kilobytes of data to the database. When I
> checked the disk space used after the test, it had actually grown by
> 2,684,920 - 2,515,864 = 169,056 kilobytes. Is this perhaps because the
> commit log holds a duplicate copy of the same data as the SSTables?
>

Commitlogs could be part of it; you're also not factoring in the column names,
and then there's index and bloom filter overhead.
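
As a very rough ballpark (the overhead numbers below are guesses, not the
exact 0.7 serialization format):

# Rough ballpark only -- per-column/per-row overheads are guesses.
keys = 500000
cols = 5
value_size = 32        # bytes per column value
col_name_size = 2      # stress.py uses short column names like "C0".."C4"
per_col_overhead = 15  # timestamp, length fields, flags
per_row_overhead = 30  # row key, row header, column index entry

row_bytes = per_row_overhead + cols * (value_size + col_name_size + per_col_overhead)
print(keys * row_bytes / 1024)   # ~134,000 KB before commitlog and bloom filters

That already gets you much closer to the 169,056 KB you measured, before
counting commitlog segments that haven't been recycled yet.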

Use contrib/stress on 0.7 instead.

-Brandon