Posted to user@cassandra.apache.org by Pradeep Kumar Mantha <pr...@gmail.com> on 2013/01/17 20:10:44 UTC

Cassandra Performance Benchmarking.

Hi,

I am trying to maximize the number of read queries executed per second.

Here is my cluster configuration.

Replication - Default
12 Data Nodes.
16 Client Nodes - used for querying.

Each client node runs 32 threads, and each thread executes 76896 read
queries using the cassandra-cli tool. That is, all the read queries are
stored in a file, and that file is passed to cassandra-cli (using the
-f option), which is run by a thread.
So the total number of queries for the 16 client nodes is 16 * 32 * 76896.

The read queries on each client node are submitted at the same time. The
time taken for 16 * 32 * 76896 read queries is nearly 742 seconds,
which is nearly 53k transactions/second.
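
The 53k figure follows directly from the numbers quoted above. A quick check in Python (the variable names are only illustrative):

```python
# Throughput check using the figures quoted in this message.
client_nodes = 16
threads_per_node = 32
queries_per_thread = 76896
elapsed_seconds = 742

total_queries = client_nodes * threads_per_node * queries_per_thread
throughput = total_queries / elapsed_seconds

print(f"total queries: {total_queries}")
print(f"throughput: {throughput:.0f} queries/second")
```

This is aggregate throughput across all 512 client threads, not what a single thread achieves.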

I would like to know if there is any other way or tool through which I
can improve the number of transactions/second.
Is the performance affected by the cassandra-cli tool?

thanks
pradeep

Re: Cassandra Performance Benchmarking.

Posted by aaron morton <aa...@thelastpickle.com>.
> My application took nearly 529 seconds for querying 76896 keys.
> 
> Please find the statistic information below for 32 threads ( where
> each thread query 76896 keys ) obtained just after the experiment.
So each read, from the client's perspective, takes 0.00687942155639 seconds, or about 6879 microseconds (529 seconds / 76896 reads per thread).
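
The per-read figure can be reproduced as follows. This is only a sketch, and it assumes each thread issues its reads one after another, so that wall-clock time divided by reads per thread approximates the latency a single thread sees:

```python
# Client-side latency per read, from the figures quoted above.
elapsed_seconds = 529      # wall-clock time reported by the application
reads_per_thread = 76896   # keys queried by each thread

seconds_per_read = elapsed_seconds / reads_per_thread
microseconds_per_read = seconds_per_read * 1_000_000

print(f"{seconds_per_read:.6f} s per read")
print(f"{microseconds_per_read:.0f} us per read")
```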

> (mypython_repo)-bash-3.2$ nodetool -host XX.XX.XX.XX -p 7199 proxyhistograms
> Offset          Read Latency     Write Latency     Range Latency
> ...
> 372                      233                 0                 0
> 446                     7291                 0                 0
> 535                     9669                 0                 0
> 642                    34917                 0                 0
> 770                    73709                 0                 0
> 924                    45270                 0                 0
> 1109                   18186                 0                 0
> 1331                    6931                 0                 0
> 1597                    2111                 0                 0
> 1916                     661                 0                 0
> 2299                     285                 0                 0
> 2759                     123                 0                 0

Most of your reads are taking around 642 to 924 microseconds, as recorded by the coordinator.
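
One way to read the histogram is to total the counts per bucket. The sketch below uses only the Read Latency rows excerpted above, so it ignores the long tail in the full output, and it assumes each offset is a bucket boundary in microseconds:

```python
# Read Latency rows excerpted from the proxyhistograms output above:
# bucket offset (assumed to be microseconds) and number of reads in the bucket.
ROWS = """\
372      233
446     7291
535     9669
642    34917
770    73709
924    45270
1109   18186
1331    6931
1597    2111
1916     661
2299     285
2759     123
"""

buckets = [tuple(int(field) for field in line.split()) for line in ROWS.splitlines()]
total = sum(count for _, count in buckets)
busiest_offset, busiest_count = max(buckets, key=lambda bucket: bucket[1])
in_range = sum(count for offset, count in buckets if 642 <= offset <= 924)

print(f"reads in excerpt: {total}")
print(f"busiest bucket: {busiest_offset} us ({busiest_count} reads)")
print(f"share in the 642-924 us buckets: {in_range / total:.1%}")
```

On this excerpt the three buckets from 642 to 924 hold roughly three quarters of the reads, which is where the "642 to 924 microseconds" summary comes from.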

That is well below the roughly 6879 microseconds per read that your application sees, so the extra latency is on the client side.

Simplify your application until the client and server latencies match up, and then add complexity back.
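
To see where client time goes, it can help to time each call individually. The sketch below is one possible approach and not anything from the application in question: fake_read is a hypothetical stand-in for a real client call such as cf.get(key), and time_calls is an illustrative helper name.

```python
import statistics
import time


def time_calls(read_fn, keys):
    """Return the latency of each read_fn(key) call in microseconds."""
    latencies = []
    for key in keys:
        start = time.perf_counter()
        read_fn(key)
        latencies.append((time.perf_counter() - start) * 1_000_000)
    return latencies


def fake_read(key):
    # Hypothetical stand-in for a real client call such as cf.get(key).
    return key.upper()


latencies_us = time_calls(fake_read, ["key%d" % i for i in range(1000)])
print("median: %.1f us" % statistics.median(latencies_us))
print("max:    %.1f us" % max(latencies_us))
```

Swapping fake_read for the real read call and comparing the median with the busiest proxyhistograms bucket shows how much time is spent outside the server.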

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/01/2013, at 12:05 PM, Pradeep Kumar Mantha <pr...@gmail.com> wrote:

> Hi,
> 
> Thanks for the information..
> 
> I upgraded my cassandra version to 1.2.0 and tried running the
> experiment again to find the statistics.
> 
> My application took nearly 529 seconds to query 76896 keys.
> 
> Please find the statistics below for 32 threads (where each thread
> queries 76896 keys), obtained just after the experiment.
> 
> (mypython_repo)-bash-3.2$ nodetool -host XX.XX.XX.XX -p 7199 proxyhistograms
> proxy histograms
> Offset          Read Latency     Write Latency     Range Latency
> 1                          0                 0                 0
> 2                          0                 0                 0
> 3                          0                 0                 0
> 4                          0                 0                 0
> 5                          0                 0                 0
> 6                          0                 0                 0
> 7                          0                 0                 0
> 8                          0                 0                 0
> 10                         0                 0                 0
> 12                         0                 0                 0
> 14                         0                 0                 0
> 17                         0                 0                 0
> 20                         0                 0                 0
> 24                         0                 0                 0
> 29                         0                 0                 0
> 35                         0                 0                 0
> 42                         0                 0                 0
> 50                         0                 0                 0
> 60                         0                 0                 0
> 72                         0                 0                 0
> 86                         0                 0                 0
> 103                        0                 0                 0
> 124                        0                 0                 0
> 149                        0                 0                 0
> 179                        0                 0                 0
> 215                        0                 0                 0
> 258                        0                 0                 0
> 310                        2                 0                 0
> 372                      233                 0                 0
> 446                     7291                 0                 0
> 535                     9669                 0                 0
> 642                    34917                 0                 0
> 770                    73709                 0                 0
> 924                    45270                 0                 0
> 1109                   18186                 0                 0
> 1331                    6931                 0                 0
> 1597                    2111                 0                 0
> 1916                     661                 0                 0
> 2299                     285                 0                 0
> 2759                     123                 0                 0
> 3311                      56                 0                 0
> 3973                      47                 0                 0
> 4768                      45                 0                 0
> 5722                      42                 0                 0
> 6866                      43                 0                 0
> 8239                      60                 0                 0
> 9887                      41                 0                 0
> 11864                     42                 0                 0
> 14237                     32                 0                 0
> 17084                     50                 0                 0
> 20501                     51                 0                 0
> 24601                     55                 0                 0
> 29521                     43                 0                 0
> 35425                     26                 0                 0
> 42510                     30                 0                 0
> 51012                     37                 0                 0
> 61214                     46                 0                 0
> 73457                     60                 0                 0
> 88148                    106                 0                 0
> 105778                   127                 0                 0
> 126934                   168                 0                 0
> 152321                   110                 0                 0
> 182785                    71                 0                 0
> 219342                    22                 0                 0
> 263210                    10                 0                 0
> 315852                     2                 0                 0
> 379022                     2                 0                 0
> 454826                     5                 0                 0
> 545791                     0                 0                 0
> 654949                     8                 0                 0
> 785939                     0                 0                 0
> 943127                     0                 0                 0
> 1131752                    2                 0                 0
> 1358102                    1                 0                 0
> 1629722                    3                 0                 0
> 1955666                    2                 0                 0
> 2346799                    0                 0                 0
> 2816159                    0                 0                 0
> 3379391                    3                 0                 0
> 4055269                    0                 0                 0
> 4866323                    0                 0                 0
> 5839588                    0                 0                 0
> 7007506                    0                 0                 0
> 8409007                    0                 0                 0
> 10090808                   0                 0                 0
> 12108970                   0                 0                 0
> 14530764                   0                 0                 0
> 17436917                   0                 0                 0
> 20924300                   0                 0                 0
> 25109160                   0                 0                 0
> 
> 
> (mypython_repo)-bash-3.2$ nodetool -host XX.XX.XX.XX -p 7199 cfhistograms Blast Blast_NR
> Blast/Blast_NR histograms
> Offset      SSTables     Write Latency      Read Latency      Row Size      Column Count
> 1             215220                 0                 0             0                 0
> 2                  0                 0                 0             0           1281975
> 3                  0                 0                 0             0                 0
> 4                  0                 0                 0             0                 0
> 5                  0                 0                 0             0                 0
> 6                  0                 0                 0             0                 0
> 7                  0                 0                 0             0                 0
> 8                  0                 0                 0             0                 0
> 10                 0                 0                 0             0                 0
> 12                 0                 0                 0             0                 0
> 14                 0                 0                 0             0                 0
> 17                 0                 0                 0             0                 0
> 20                 0                 0                 0             0                 0
> 24                 0                 0                 0             0                 0
> 29                 0                 0                 0             0                 0
> 35                 0                 0                 0             0                 0
> 42                 0                 0                 0             0                 0
> 50                 0                 0                 0             0                 0
> 60                 0                 0                 0             0                 0
> 72                 0                 0                 0             0                 0
> 86                 0                 0                 0             0                 0
> 103                0                 0                 0             0                 0
> 124                0                 0                 0             0                 0
> 149                0                 0                 0            47                 0
> 179                0                 0                 0         42067                 0
> 215                0                 0                 0        108857                 0
> 258                0                 0                 4        126799                 0
> 310                0                 0                27        150803                 0
> 372                0                 0            154314        159451                 0
> 446                0                 0             47355        184528                 0
> 535                0                 0             11445        175136                 0
> 642                0                 0              1220        141025                 0
> 770                0                 0               686         81703                 0
> 924                0                 0                15         49067                 0
> 1109               0                 0                 1         29531                 0
> 1331               0                 0                 1         16366                 0
> 1597               0                 0                 1          7974                 0
> 1916               0                 0                 3          3816                 0
> 2299               0                 0                 2          2063                 0
> 2759               0                 0                 1          1142                 0
> 3311               0                 0                 4           646                 0
> 3973               0                 0                 4           382                 0
> 4768               0                 0                 2           295                 0
> 5722               0                 0                12           150                 0
> 6866               0                 0                15            56                 0
> 8239               0                 0                17            38                 0
> 9887               0                 0                11            18                 0
> 11864              0                 0                11            10                 0
> 14237              0                 0                 6             0                 0
> 17084              0                 0                 3             4                 0
> 20501              0                 0                 1             0                 0
> 24601              0                 0                37             0                 0
> 29521              0                 0                 9             0                 0
> 35425              0                 0                 6             1                 0
> 42510              0                 0                 4             0                 0
> 51012              0                 0                 1             0                 0
> 61214              0                 0                 0             0                 0
> 73457              0                 0                 0             0                 0
> 88148              0                 0                 0             0                 0
> 105778             0                 0                 0             0                 0
> 126934             0                 0                 0             0                 0
> 152321             0                 0                 0             0                 0
> 182785             0                 0                 0             0                 0
> 219342             0                 0                 2             0                 0
> 263210             0                 0                 0             0                 0
> 315852             0                 0                 0             0                 0
> 379022             0                 0                 0             0                 0
> 454826             0                 0                 0             0                 0
> 545791             0                 0                 0             0                 0
> 654949             0                 0                 0             0                 0
> 785939             0                 0                 0             0                 0
> 943127             0                 0                 0             0                 0
> 1131752            0                 0                 0             0                 0
> 1358102            0                 0                 0             0                 0
> 1629722            0                 0                 0             0                 0
> 1955666            0                 0                 0             0                 0
> 2346799            0                 0                 0             0                 0
> 2816159            0                 0                 0             0                 0
> 3379391            0                 0                 0             0                 0
> 4055269            0                 0                 0             0                 0
> 4866323            0                 0                 0             0                 0
> 5839588            0                 0                 0             0                 0
> 7007506            0                 0                 0             0                 0
> 8409007            0                 0                 0             0                 0
> 10090808           0                 0                 0             0                 0
> 12108970           0                 0                 0             0                 0
> 14530764           0                 0                 0             0                 0
> 17436917           0                 0                 0             0                 0
> 20924300           0                 0                 0             0                 0
> 25109160           0                 0                 0             0                 0
> 
> Could you please let me know how I can analyze this information.
> 
> 
> thanks
> pradeep
> 
> On Mon, Jan 21, 2013 at 12:02 AM, aaron morton <aa...@thelastpickle.com> wrote:
>> You can also see what it looks like from the server side.
>> 
>> nodetool proxyhistograms will show you the full request latency recorded by the
>> coordinator.
>> nodetool cfhistograms will show you the local read latency; this is just the
>> time it takes to read data on a replica and does not include network or wait
>> times.
>> 
>> If proxyhistograms shows most requests running faster than your app
>> says, the problem is in your app.
>> 
>> Cheers
>> 
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 19/01/2013, at 8:16 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>> 
>> The fact that it's still exactly 521 seconds is very suspicious.  I can't
>> debug your script over the mailing list, but do some sanity checks to make
>> sure there's not a bottleneck somewhere you don't expect.
>> 
>> 
>> On Fri, Jan 18, 2013 at 12:44 PM, Pradeep Kumar Mantha <pr...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> Thanks Tyler.
>>> 
>>> Below is the *global* connection pool I am trying to use, where
>>> server_list contains the IPs of all 12 data nodes I am using and
>>> pool_size is the number of threads. I just set the timeout to 60 to
>>> avoid connection retry errors.
>>> 
>>> pool = pycassa.ConnectionPool('Blast', server_list=server_list,
>>>                               pool_size=32, timeout=60)
>>> 
>>> 
>>> It seems the performance is still stuck at 521 seconds, versus 177
>>> seconds for cassandra-cli.
>>> 
>>> Am I still missing something?
>>> 
>>> thanks
>>> Pradeep
>>> 
>>> 
>>> 
>>> On Fri, Jan 18, 2013 at 7:12 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>>>> You just need to increase the ConnectionPool size to handle the number of
>>>> threads you have using it concurrently.  Set the pool_size kwarg to at least
>>>> the number of threads you're using.
>>>> 
>>>> 
>>>> On Thu, Jan 17, 2013 at 6:46 PM, Pradeep Kumar Mantha <pr...@gmail.com> wrote:
>>>>> 
>>>>> Thanks Tyler.
>>>>> 
>>>>> I just moved pool and cf, which hold the connection pool and column
>>>>> family, to global scope.
>>>>> 
>>>>> I increased the number of server_list entries from 1 to 4. (I think I can
>>>>> increase it to at most 12, since I have 12 data nodes.)
>>>>> 
>>>>> When I created 8 threads using the Python threading package, I saw the
>>>>> error below.
>>>>> 
>>>>> Exception in thread Thread-3:
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/common/usg/python/2.7.1-20110310/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
>>>>>     self.run()
>>>>>   File "my_cc.py", line 20, in run
>>>>>     start_cassandra_client(self.name)
>>>>>   File "my_cc.py", line 33, in start_cassandra_client
>>>>>     cf.get(key)
>>>>>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/columnfamily.py", line 652, in get
>>>>>     read_consistency_level or self.read_consistency_level)
>>>>>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py", line 553, in execute
>>>>>     conn = self.get()
>>>>>   File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py", line 536, in get
>>>>>     raise NoConnectionAvailable(message)
>>>>> NoConnectionAvailable: ConnectionPool limit of size 5 reached, unable to obtain connection after 30 seconds
>>>>> 
>>>>> 
>>>>> Please have a look at the attached script and let me know if I need
>>>>> to change something. Please bear with me if I am doing something terribly
>>>>> wrong.
>>>>> 
>>>>> I am running the script on an 8-processor node.
>>>>> 
>>>>> thanks
>>>>> pradeep
>>>>> 
>>>>> On Thu, Jan 17, 2013 at 4:18 PM, Tyler Hobbs <ty...@datastax.com> wrote:
>>>>>> ConnectionPools and ColumnFamilies are thread-safe in pycassa, and it's best
>>>>>> to share them across multiple threads.  Of course, when you do that, make
>>>>>> sure to make the ConnectionPool large enough to support all of the threads
>>>>>> making queries concurrently.  I'm also not sure if you're just omitting
>>>>>> this, but pycassa's ConnectionPool will only open connections to servers you
>>>>>> explicitly include in server_list; there's no autodiscovery of other nodes
>>>>>> going on.
>>>>>> 
>>>>>> Depending on your network latency, you'll top out on python performance with
>>>>>> a fairly low number of threads due to the GIL.  It's best to use multiple
>>>>>> processes if you really want to benchmark something.
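
The multiple-process approach suggested above might be sketched as follows, using only the standard library. The names split_keys, run_worker, and run_benchmark are hypothetical, and run_worker merely counts keys; in a real benchmark each worker process would presumably open its own ConnectionPool and ColumnFamily and call cf.get(key), since sockets should not be shared across processes.

```python
import multiprocessing


def split_keys(keys, n_workers):
    """Split keys into n_workers interleaved, roughly equal slices."""
    return [keys[i::n_workers] for i in range(n_workers)]


def run_worker(keys):
    # Hypothetical stand-in for the real work: a real worker would create
    # its own connection pool here and issue one read per key.
    done = 0
    for _key in keys:
        done += 1
    return done


def run_benchmark(keys, n_workers):
    """Run one worker process per slice and return the total reads done."""
    with multiprocessing.Pool(processes=n_workers) as pool:
        return sum(pool.map(run_worker, split_keys(keys, n_workers)))


if __name__ == "__main__":
    all_keys = ["key%d" % i for i in range(10000)]
    print(run_benchmark(all_keys, n_workers=4))
```

Each process has its own interpreter and therefore its own GIL, which is why this scales where threads in a single process top out.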
>>>>>> 
>>>>>> 
>>>>>> On Thu, Jan 17, 2013 at 6:05 PM, Pradeep Kumar Mantha <pr...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Thanks. I would like to benchmark Cassandra with our application so
>>>>>>> that we understand the details of how the actual benchmarking is done.
>>>>>>> I am not sure how easy it would be to integrate YCSB with our
>>>>>>> application.
>>>>>>> 
>>>>>>> So I am trying different client interfaces to Cassandra.
>>>>>>> 
>>>>>>> Here is what I found for a Cassandra cluster of 12 data nodes and 1
>>>>>>> client node running 32 threads (each issuing X queries):
>>>>>>> 
>>>>>>> cassandra-cli took 133 seconds
>>>>>>> pycassa       took 521 seconds
>>>>>>> 
>>>>>>> Here is the Python pycassa code used for the queries; it is passed to
>>>>>>> each thread:
>>>>>>> 
>>>>>>> import pycassa
>>>>>>> 
>>>>>>> def start_cassandra_client(Threadname):
>>>>>>>     # Each thread builds its own pool and column family handle.
>>>>>>>     pool = pycassa.ConnectionPool('Blast', server_list=['xxx.xx.xx.xx'])
>>>>>>>     cf = pycassa.ColumnFamily(pool, 'Blast_NR')
>>>>>>>     # The input file holds one row key per line.
>>>>>>>     with open("pycassa_100%_query") as inp_file:
>>>>>>>         for key in inp_file:
>>>>>>>             cf.get(key.strip())
>>>>>>> 
>>>>>>> Do Java clients like Hector/Astyanax help here? I am more comfortable
>>>>>>> with Python than Java, and our existing application is also in Python.
>>>>>>> 
>>>>>>> thanks
>>>>>>> pradeep
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Jan 17, 2013 at 2:08 PM, Edward Capriolo <ed...@gmail.com> wrote:
>>>>>>>> Wow, you managed to do a load test through the cassandra-cli. There
>>>>>>>> should be a merit badge for that.
>>>>>>>> 
>>>>>>>> You should use the built-in stress tool or YCSB.
>>>>>>>> 
>>>>>>>> The CLI has to do much more string conversion than a normal client
>>>>>>>> would, and it is not built for performance. You will definitely get
>>>>>>>> better numbers through other means.
>>>>>>>> 
>>>>>>>> On Thu, Jan 17, 2013 at 2:10 PM, Pradeep Kumar Mantha <pr...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I am trying to maximize the number of read queries executed per second.
>>>>>>>>> 
>>>>>>>>> Here is my cluster configuration.
>>>>>>>>> 
>>>>>>>>> Replication - Default
>>>>>>>>> 12 Data Nodes.
>>>>>>>>> 16 Client Nodes - used for querying.
>>>>>>>>> 
>>>>>>>>> Each client node runs 32 threads, and each thread executes 76896 read
>>>>>>>>> queries using the cassandra-cli tool. That is, all the read queries are
>>>>>>>>> stored in a file, and that file is passed to cassandra-cli (using the
>>>>>>>>> -f option), which is run by a thread.
>>>>>>>>> So the total number of queries for the 16 client nodes is 16 * 32 * 76896.
>>>>>>>>> 
>>>>>>>>> The read queries on each client node are submitted at the same time. The
>>>>>>>>> time taken for 16 * 32 * 76896 read queries is nearly 742 seconds,
>>>>>>>>> which is nearly 53k transactions/second.
>>>>>>>>> 
>>>>>>>>> I would like to know if there is any other way or tool through which I
>>>>>>>>> can improve the number of transactions/second.
>>>>>>>>> Is the performance affected by the cassandra-cli tool?
>>>>>>>>> 
>>>>>>>>> thanks
>>>>>>>>> pradeep
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Tyler Hobbs
>>>>>> DataStax
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Tyler Hobbs
>>>> DataStax
>> 
>> 
>> 
>> 
>> --
>> Tyler Hobbs
>> DataStax
>> 
>> 


Re: Cassandra Performance Benchmarking.

Posted by Pradeep Kumar Mantha <pr...@gmail.com>.
Hi,

Thanks for the information..

I upgraded my cassandra version to 1.2.0 and tried running the
experiment again to find the statistics.

My application took nearly 529 seconds to query 76896 keys.

Please find the statistics below for 32 threads (where each thread
queries 76896 keys), obtained just after the experiment.

(mypython_repo)-bash-3.2$ nodetool -host XX.XX.XX.XX -p 7199 proxyhistograms
proxy histograms
Offset          Read Latency     Write Latency     Range Latency
1                          0                 0                 0
2                          0                 0                 0
3                          0                 0                 0
4                          0                 0                 0
5                          0                 0                 0
6                          0                 0                 0
7                          0                 0                 0
8                          0                 0                 0
10                         0                 0                 0
12                         0                 0                 0
14                         0                 0                 0
17                         0                 0                 0
20                         0                 0                 0
24                         0                 0                 0
29                         0                 0                 0
35                         0                 0                 0
42                         0                 0                 0
50                         0                 0                 0
60                         0                 0                 0
72                         0                 0                 0
86                         0                 0                 0
103                        0                 0                 0
124                        0                 0                 0
149                        0                 0                 0
179                        0                 0                 0
215                        0                 0                 0
258                        0                 0                 0
310                        2                 0                 0
372                      233                 0                 0
446                     7291                 0                 0
535                     9669                 0                 0
642                    34917                 0                 0
770                    73709                 0                 0
924                    45270                 0                 0
1109                   18186                 0                 0
1331                    6931                 0                 0
1597                    2111                 0                 0
1916                     661                 0                 0
2299                     285                 0                 0
2759                     123                 0                 0
3311                      56                 0                 0
3973                      47                 0                 0
4768                      45                 0                 0
5722                      42                 0                 0
6866                      43                 0                 0
8239                      60                 0                 0
9887                      41                 0                 0
11864                     42                 0                 0
14237                     32                 0                 0
17084                     50                 0                 0
20501                     51                 0                 0
24601                     55                 0                 0
29521                     43                 0                 0
35425                     26                 0                 0
42510                     30                 0                 0
51012                     37                 0                 0
61214                     46                 0                 0
73457                     60                 0                 0
88148                    106                 0                 0
105778                   127                 0                 0
126934                   168                 0                 0
152321                   110                 0                 0
182785                    71                 0                 0
219342                    22                 0                 0
263210                    10                 0                 0
315852                     2                 0                 0
379022                     2                 0                 0
454826                     5                 0                 0
545791                     0                 0                 0
654949                     8                 0                 0
785939                     0                 0                 0
943127                     0                 0                 0
1131752                    2                 0                 0
1358102                    1                 0                 0
1629722                    3                 0                 0
1955666                    2                 0                 0
2346799                    0                 0                 0
2816159                    0                 0                 0
3379391                    3                 0                 0
4055269                    0                 0                 0
4866323                    0                 0                 0
5839588                    0                 0                 0
7007506                    0                 0                 0
8409007                    0                 0                 0
10090808                   0                 0                 0
12108970                   0                 0                 0
14530764                   0                 0                 0
17436917                   0                 0                 0
20924300                   0                 0                 0
25109160                   0                 0                 0


(mypython_repo)-bash-3.2$ nodetool -host XX.XX.XX.XX -p 7199 cfhistograms Blast Blast_NR
Blast/Blast_NR histograms
Offset      SSTables     Write Latency      Read Latency      Row Size      Column Count
1             215220                 0                 0             0                 0
2                  0                 0                 0             0           1281975
3                  0                 0                 0             0                 0
4                  0                 0                 0             0                 0
5                  0                 0                 0             0                 0
6                  0                 0                 0             0                 0
7                  0                 0                 0             0                 0
8                  0                 0                 0             0                 0
10                 0                 0                 0             0                 0
12                 0                 0                 0             0                 0
14                 0                 0                 0             0                 0
17                 0                 0                 0             0                 0
20                 0                 0                 0             0                 0
24                 0                 0                 0             0                 0
29                 0                 0                 0             0                 0
35                 0                 0                 0             0                 0
42                 0                 0                 0             0                 0
50                 0                 0                 0             0                 0
60                 0                 0                 0             0                 0
72                 0                 0                 0             0                 0
86                 0                 0                 0             0                 0
103                0                 0                 0             0                 0
124                0                 0                 0
  0                 0
149                0                 0                 0
 47                 0
179                0                 0                 0
42067                 0
215                0                 0                 0
108857                 0
258                0                 0                 4
126799                 0
310                0                 0                27
150803                 0
372                0                 0            154314
159451                 0
446                0                 0             47355
184528                 0
535                0                 0             11445
175136                 0
642                0                 0              1220
141025                 0
770                0                 0               686
81703                 0
924                0                 0                15
49067                 0
1109               0                 0                 1
29531                 0
1331               0                 0                 1
16366                 0
1597               0                 0                 1
7974                 0
1916               0                 0                 3
3816                 0
2299               0                 0                 2
2063                 0
2759               0                 0                 1
1142                 0
3311               0                 0                 4
646                 0
3973               0                 0                 4
382                 0
4768               0                 0                 2
295                 0
5722               0                 0                12
150                 0
6866               0                 0                15
 56                 0
8239               0                 0                17
 38                 0
9887               0                 0                11
 18                 0
11864              0                 0                11
 10                 0
14237              0                 0                 6
  0                 0
17084              0                 0                 3
  4                 0
20501              0                 0                 1
  0                 0
24601              0                 0                37
  0                 0
29521              0                 0                 9
  0                 0
35425              0                 0                 6
  1                 0
42510              0                 0                 4
  0                 0
51012              0                 0                 1
  0                 0
61214              0                 0                 0
  0                 0
73457              0                 0                 0
  0                 0
88148              0                 0                 0
  0                 0
105778             0                 0                 0
  0                 0
126934             0                 0                 0
  0                 0
152321             0                 0                 0
  0                 0
182785             0                 0                 0
  0                 0
219342             0                 0                 2
  0                 0
263210             0                 0                 0
  0                 0
315852             0                 0                 0
  0                 0
379022             0                 0                 0
  0                 0
454826             0                 0                 0
  0                 0
545791             0                 0                 0
  0                 0
654949             0                 0                 0
  0                 0
785939             0                 0                 0
  0                 0
943127             0                 0                 0
  0                 0
1131752            0                 0                 0
  0                 0
1358102            0                 0                 0
  0                 0
1629722            0                 0                 0
  0                 0
1955666            0                 0                 0
  0                 0
2346799            0                 0                 0
  0                 0
2816159            0                 0                 0
  0                 0
3379391            0                 0                 0
  0                 0
4055269            0                 0                 0
  0                 0
4866323            0                 0                 0
  0                 0
5839588            0                 0                 0
  0                 0
7007506            0                 0                 0
  0                 0
8409007            0                 0                 0
  0                 0
10090808           0                 0                 0
  0                 0
12108970           0                 0                 0
  0                 0
14530764           0                 0                 0
  0                 0
17436917           0                 0                 0
  0                 0
20924300           0                 0                 0
  0                 0
25109160           0                 0                 0
  0                 0

Could you please let me know how I can analyze this information?


thanks
pradeep

On Mon, Jan 21, 2013 at 12:02 AM, aaron morton <aa...@thelastpickle.com> wrote:
> You can also see what it looks like from the server side.
>
> nodetool proxyhistograms will show you full request latency recorded by the
> coordinator.
> nodetool cfhistograms will show you the local read latency, this is just the
> time it takes to read data on a replica and does not include network or wait
> times.
>
> If the proxyhistograms is showing most requests running faster than your app
> says it's your app.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 19/01/2013, at 8:16 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>
> The fact that it's still exactly 521 seconds is very suspicious.  I can't
> debug your script over the mailing list, but do some sanity checks to make
> sure there's not a bottleneck somewhere you don't expect.
>
>
> On Fri, Jan 18, 2013 at 12:44 PM, Pradeep Kumar Mantha
> <pr...@gmail.com> wrote:
>>
>> Hi,
>>
>> Thanks Tyler.
>>
>> Below is the *global* connection pool I am trying to use, where the
>> server_list contains all the ips of 12 DataNodes I am using and
>> pool_size is the number of threads  and I just set to timeout to 60 to
>> avoid connection retry errors.
>>
>> pool = pycassa.ConnectionPool('Blast',
>> server_list=server_list,pool_size=32,timeout=60)
>>
>>
>> It seems the performance is still stuck at 521 seconds.. which is 177
>> seconds for cassandra-cli.
>>
>> Am I still missing something?
>>
>> thanks
>> Pradeep
>>
>>
>>
>> On Fri, Jan 18, 2013 at 7:12 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>> > You just need to increase the ConnectionPool size to handle the number
>> > of
>> > threads you have using it concurrently.  Set the pool_size kwarg to at
>> > least
>> > the number of threads you're using.
>> >
>> >
>> > On Thu, Jan 17, 2013 at 6:46 PM, Pradeep Kumar Mantha
>> > <pr...@gmail.com>
>> > wrote:
>> >>
>> >> Thanks Tyler.
>> >>
>> >> I just moved the pool and cf which store the connection pool and CF
>> >> information to have global scope.
>> >>
>> >> Increased the server_list values from 1 to 4. ( i think i can increase
>> >> them max to 12 since I have 12 data nodes )
>> >>
>> >> when I created 8 threads  using python threading package , I see the
>> >> below error.
>> >>
>> >> Exception in thread Thread-3:
>> >> Traceback (most recent call last):
>> >>   File
>> >> "/usr/common/usg/python/2.7.1-20110310/lib64/python2.7/threading.py",
>> >> line 530, in __bootstrap_inner
>> >>     self.run()
>> >>   File "my_cc.py", line 20, in run
>> >>     start_cassandra_client(self.name)
>> >>   File "my_cc.py", line 33, in start_cassandra_client
>> >>     cf.get(key)
>> >>   File
>> >>
>> >> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/columnfamily.py",
>> >> line 652, in get
>> >>     read_consistency_level or self.read_consistency_level)
>> >>   File
>> >>
>> >> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py",
>> >> line 553, in execute
>> >>     conn = self.get()
>> >>   File
>> >>
>> >> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py",
>> >> line 536, in get
>> >>     raise NoConnectionAvailable(message)
>> >> NoConnectionAvailable: ConnectionPool limit of size 5 reached, unable
>> >> to obtain connection after 30 seconds
>> >>
>> >>
>> >> Please have a look at the script attached.. and let me know if I need
>> >> to change something.. Please bear with me, if I do something terribly
>> >> wrong..
>> >>
>> >> I am running the script on a 8 processor node.
>> >>
>> >> thanks
>> >> pradeep
>> >>
>> >> On Thu, Jan 17, 2013 at 4:18 PM, Tyler Hobbs <ty...@datastax.com>
>> >> wrote:
>> >> > ConnectionPools and ColumnFamilies are thread-safe in pycassa, and
>> >> > it's
>> >> > best
>> >> > to share them across multiple threads.  Of course, when you do that,
>> >> > make
>> >> > sure to make the ConnectionPool large enough to support all of the
>> >> > threads
>> >> > making queries concurrently.  I'm also not sure if you're just
>> >> > omitting
>> >> > this, but pycassa's ConnectionPool will only open connections to
>> >> > servers
>> >> > you
>> >> > explicitly include in server_list; there's no autodiscovery of other
>> >> > nodes
>> >> > going on.
>> >> >
>> >> > Depending on your network latency, you'll top out on python
>> >> > performance
>> >> > with
>> >> > a fairly low number of threads due to the GIL.  It's best to use
>> >> > multiple
>> >> > processes if you really want to benchmark something.
>> >> >
>> >> >
>> >> > On Thu, Jan 17, 2013 at 6:05 PM, Pradeep Kumar Mantha
>> >> > <pr...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> Thanks. I would like to benchmark cassandra with our application so
>> >> >> that we understand the details of how the actual benchmarking is
>> >> >> done.
>> >> >> Not sure, how easy it would be to integrate YCSB with our
>> >> >> application.
>> >> >>
>> >> >> So, i am trying different client interfaces to cassandra.
>> >> >>
>> >> >> I found
>> >> >>
>> >> >> for 12 Data Nodes Cassandra cluster and 1 Client Node which run 32
>> >> >> threads ( each querying X number of queries ).
>> >> >>
>> >> >> cassandra-cli     took 133 seconds
>> >> >> pycassa took 521 seconds.
>> >> >>
>> >> >> Here is the python pycassa code used to query and passed to each
>> >> >> thread....
>> >> >>
>> >> >> def start_cassandra_client(Threadname):
>> >> >>         pool = pycassa.ConnectionPool('Blast',
>> >> >> server_list=['xxx.xx.xx.xx'])
>> >> >>         cf = pycassa.ColumnFamily(pool, 'Blast_NR')
>> >> >>         inp_file=open("pycassa_100%_query")
>> >> >>         for key in inp_file:
>> >> >>                 key=key.strip()
>> >> >>                 cf.get(key)
>> >> >>
>> >> >> Does Java clients like Hector/Astynax help here.. I am more
>> >> >> comfortable with Python than Java and our existing application is
>> >> >> also
>> >> >> in Python.
>> >> >>
>> >> >> thanks
>> >> >> pradeep
>> >> >>
>> >> >>
>> >> >> On Thu, Jan 17, 2013 at 2:08 PM, Edward Capriolo
>> >> >> <ed...@gmail.com>
>> >> >> wrote:
>> >> >> > Wow you managed to do a load test through the cassandra-cli. There
>> >> >> > should be
>> >> >> > a merit badge for that.
>> >> >> >
>> >> >> > You should use the built in stress tool or YCSB.
>> >> >> >
>> >> >> > The CLI has to do much more string conversion then a normal client
>> >> >> > would
>> >> >> > and
>> >> >> > it is not built for performance. You will definitely get better
>> >> >> > numbers
>> >> >> > through other means.
>> >> >> >
>> >> >> > On Thu, Jan 17, 2013 at 2:10 PM, Pradeep Kumar Mantha
>> >> >> > <pr...@gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> Hi,
>> >> >> >>
>> >> >> >> I am trying to maximize execution of the number of read
>> >> >> >> queries/second.
>> >> >> >>
>> >> >> >> Here is my cluster configuration.
>> >> >> >>
>> >> >> >> Replication - Default
>> >> >> >> 12 Data Nodes.
>> >> >> >> 16 Client Nodes - used for querying.
>> >> >> >>
>> >> >> >> Each client node executes 32 threads - each thread executes 76896
>> >> >> >> read
>> >> >> >> queries using  cassandra-cli tool.
>> >> >> >>        i.e all the read queries are stored in a file and that
>> >> >> >> file
>> >> >> >> is
>> >> >> >> given to cassandra-cli tool ( using -f option ) which is executed
>> >> >> >> by
>> >> >> >> a
>> >> >> >> thread.
>> >> >> >> so, total number of queries for 16 client Nodes is 16 * 32 *
>> >> >> >> 76896.
>> >> >> >>
>> >> >> >> The read queries on each client node submitted at the same time.
>> >> >> >> The
>> >> >> >> time taken for 16 * 32 * 76896 read queries is nearly 742 seconds
>> >> >> >> -
>> >> >> >> which is nearly 53k transactions/second.
>> >> >> >>
>> >> >> >> I would like to know if there is any other way/tool through which
>> >> >> >> I
>> >> >> >> can improve the number of transactions/second.
>> >> >> >> Is the performance affected by cassandra-cli tool?
>> >> >> >>
>> >> >> >> thanks
>> >> >> >> pradeep
>> >> >> >
>> >> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Tyler Hobbs
>> >> > DataStax
>> >
>> >
>> >
>> >
>> > --
>> > Tyler Hobbs
>> > DataStax
>
>
>
>
> --
> Tyler Hobbs
> DataStax
>
>

Re: Cassandra Performance Benchmarking.

Posted by aaron morton <aa...@thelastpickle.com>.
You can also see what it looks like from the server side.

nodetool proxyhistograms shows the full request latency recorded by the coordinator.
nodetool cfhistograms shows the local read latency; this is just the time it takes to read data on a replica and does not include network or wait times.

If proxyhistograms shows most requests completing faster than your app reports, the bottleneck is in your app.
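
One way to read the histogram output is to turn the bucket counts into percentiles and compare those with the per-request time the client measures. The following is only a rough sketch, not part of nodetool: parse_histogram and percentile are hypothetical helper names, it assumes one histogram row per line, and it assumes the latency offsets are in microseconds. The sample rows are a short excerpt of the proxyhistograms output posted in this thread.

```python
def parse_histogram(text, column=1):
    """Parse histogram rows into (offset, count) pairs.

    `column` is the 0-based index of the count column to read; for
    proxyhistograms output, column 1 is Read Latency. Lines that do not
    start with an integer (headers, shell prompts) are skipped.
    """
    buckets = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) <= column or not fields[0].isdigit():
            continue
        buckets.append((int(fields[0]), int(fields[column])))
    return buckets


def percentile(buckets, fraction):
    """Return the bucket offset at or below which `fraction` of samples fall."""
    total = sum(count for _, count in buckets)
    if total == 0:
        return None
    threshold = fraction * total
    running = 0
    for offset, count in buckets:
        running += count
        if running >= threshold:
            return offset
    return buckets[-1][0]


# Excerpt of the proxyhistograms output from this thread (not the full table).
sample = """\
Offset          Read Latency     Write Latency     Range Latency
372                      233                 0                 0
446                     7291                 0                 0
535                     9669                 0                 0
642                    34917                 0                 0
770                    73709                 0                 0
924                    45270                 0                 0
"""

buckets = parse_histogram(sample)
for fraction in (0.50, 0.95, 0.99):
    print(fraction, percentile(buckets, fraction))
```

If the percentiles computed from the full server-side table are far below the average time per request seen by the client, the difference is being spent outside Cassandra.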

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/01/2013, at 8:16 AM, Tyler Hobbs <ty...@datastax.com> wrote:

> The fact that it's still exactly 521 seconds is very suspicious.  I can't debug your script over the mailing list, but do some sanity checks to make sure there's not a bottleneck somewhere you don't expect.
> 
> 
> On Fri, Jan 18, 2013 at 12:44 PM, Pradeep Kumar Mantha <pr...@gmail.com> wrote:
> Hi,
> 
> Thanks Tyler.
> 
> Below is the *global* connection pool I am trying to use, where the
> server_list contains all the ips of 12 DataNodes I am using and
> pool_size is the number of threads  and I just set to timeout to 60 to
> avoid connection retry errors.
> 
> pool = pycassa.ConnectionPool('Blast',
> server_list=server_list,pool_size=32,timeout=60)
> 
> 
> It seems the performance is still stuck at 521 seconds.. which is 177
> seconds for cassandra-cli.
> 
> Am I still missing something?
> 
> thanks
> Pradeep
> 
> 
> 
> On Fri, Jan 18, 2013 at 7:12 AM, Tyler Hobbs <ty...@datastax.com> wrote:
> > You just need to increase the ConnectionPool size to handle the number of
> > threads you have using it concurrently.  Set the pool_size kwarg to at least
> > the number of threads you're using.
> >
> >
> > On Thu, Jan 17, 2013 at 6:46 PM, Pradeep Kumar Mantha <pr...@gmail.com>
> > wrote:
> >>
> >> Thanks Tyler.
> >>
> >> I just moved the pool and cf which store the connection pool and CF
> >> information to have global scope.
> >>
> >> Increased the server_list values from 1 to 4. ( i think i can increase
> >> them max to 12 since I have 12 data nodes )
> >>
> >> when I created 8 threads  using python threading package , I see the
> >> below error.
> >>
> >> Exception in thread Thread-3:
> >> Traceback (most recent call last):
> >>   File
> >> "/usr/common/usg/python/2.7.1-20110310/lib64/python2.7/threading.py",
> >> line 530, in __bootstrap_inner
> >>     self.run()
> >>   File "my_cc.py", line 20, in run
> >>     start_cassandra_client(self.name)
> >>   File "my_cc.py", line 33, in start_cassandra_client
> >>     cf.get(key)
> >>   File
> >> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/columnfamily.py",
> >> line 652, in get
> >>     read_consistency_level or self.read_consistency_level)
> >>   File
> >> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py",
> >> line 553, in execute
> >>     conn = self.get()
> >>   File
> >> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py",
> >> line 536, in get
> >>     raise NoConnectionAvailable(message)
> >> NoConnectionAvailable: ConnectionPool limit of size 5 reached, unable
> >> to obtain connection after 30 seconds
> >>
> >>
> >> Please have a look at the script attached.. and let me know if I need
> >> to change something.. Please bear with me, if I do something terribly
> >> wrong..
> >>
> >> I am running the script on a 8 processor node.
> >>
> >> thanks
> >> pradeep
> >>
> >> On Thu, Jan 17, 2013 at 4:18 PM, Tyler Hobbs <ty...@datastax.com> wrote:
> >> > ConnectionPools and ColumnFamilies are thread-safe in pycassa, and it's
> >> > best
> >> > to share them across multiple threads.  Of course, when you do that,
> >> > make
> >> > sure to make the ConnectionPool large enough to support all of the
> >> > threads
> >> > making queries concurrently.  I'm also not sure if you're just omitting
> >> > this, but pycassa's ConnectionPool will only open connections to servers
> >> > you
> >> > explicitly include in server_list; there's no autodiscovery of other
> >> > nodes
> >> > going on.
> >> >
> >> > Depending on your network latency, you'll top out on python performance
> >> > with
> >> > a fairly low number of threads due to the GIL.  It's best to use
> >> > multiple
> >> > processes if you really want to benchmark something.
> >> >
> >> >
> >> > On Thu, Jan 17, 2013 at 6:05 PM, Pradeep Kumar Mantha
> >> > <pr...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> Thanks. I would like to benchmark cassandra with our application so
> >> >> that we understand the details of how the actual benchmarking is done.
> >> >> Not sure, how easy it would be to integrate YCSB with our application.
> >> >>
> >> >> So, i am trying different client interfaces to cassandra.
> >> >>
> >> >> I found
> >> >>
> >> >> for 12 Data Nodes Cassandra cluster and 1 Client Node which run 32
> >> >> threads ( each querying X number of queries ).
> >> >>
> >> >> cassandra-cli     took 133 seconds
> >> >> pycassa took 521 seconds.
> >> >>
> >> >> Here is the python pycassa code used to query and passed to each
> >> >> thread....
> >> >>
> >> >> def start_cassandra_client(Threadname):
> >> >>         pool = pycassa.ConnectionPool('Blast',
> >> >> server_list=['xxx.xx.xx.xx'])
> >> >>         cf = pycassa.ColumnFamily(pool, 'Blast_NR')
> >> >>         inp_file=open("pycassa_100%_query")
> >> >>         for key in inp_file:
> >> >>                 key=key.strip()
> >> >>                 cf.get(key)
> >> >>
> >> >> Does Java clients like Hector/Astynax help here.. I am more
> >> >> comfortable with Python than Java and our existing application is also
> >> >> in Python.
> >> >>
> >> >> thanks
> >> >> pradeep
> >> >>
> >> >>
> >> >> On Thu, Jan 17, 2013 at 2:08 PM, Edward Capriolo
> >> >> <ed...@gmail.com>
> >> >> wrote:
> >> >> > Wow you managed to do a load test through the cassandra-cli. There
> >> >> > should be
> >> >> > a merit badge for that.
> >> >> >
> >> >> > You should use the built in stress tool or YCSB.
> >> >> >
> >> >> > The CLI has to do much more string conversion then a normal client
> >> >> > would
> >> >> > and
> >> >> > it is not built for performance. You will definitely get better
> >> >> > numbers
> >> >> > through other means.
> >> >> >
> >> >> > On Thu, Jan 17, 2013 at 2:10 PM, Pradeep Kumar Mantha
> >> >> > <pr...@gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> I am trying to maximize execution of the number of read
> >> >> >> queries/second.
> >> >> >>
> >> >> >> Here is my cluster configuration.
> >> >> >>
> >> >> >> Replication - Default
> >> >> >> 12 Data Nodes.
> >> >> >> 16 Client Nodes - used for querying.
> >> >> >>
> >> >> >> Each client node executes 32 threads - each thread executes 76896
> >> >> >> read
> >> >> >> queries using  cassandra-cli tool.
> >> >> >>        i.e all the read queries are stored in a file and that file
> >> >> >> is
> >> >> >> given to cassandra-cli tool ( using -f option ) which is executed by
> >> >> >> a
> >> >> >> thread.
> >> >> >> so, total number of queries for 16 client Nodes is 16 * 32 * 76896.
> >> >> >>
> >> >> >> The read queries on each client node submitted at the same time. The
> >> >> >> time taken for 16 * 32 * 76896 read queries is nearly 742 seconds -
> >> >> >> which is nearly 53k transactions/second.
> >> >> >>
> >> >> >> I would like to know if there is any other way/tool through which I
> >> >> >> can improve the number of transactions/second.
> >> >> >> Is the performance affected by cassandra-cli tool?
> >> >> >>
> >> >> >> thanks
> >> >> >> pradeep
> >> >> >
> >> >> >
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Tyler Hobbs
> >> > DataStax
> >
> >
> >
> >
> > --
> > Tyler Hobbs
> > DataStax
> 
> 
> 
> -- 
> Tyler Hobbs
> DataStax


Re: Cassandra Performance Benchmarking.

Posted by Tyler Hobbs <ty...@datastax.com>.
The fact that it's still exactly 521 seconds is very suspicious.  I can't
debug your script over the mailing list, but do some sanity checks to make
sure there's not a bottleneck somewhere you don't expect.
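
As one possible sanity check, the read loop can be wrapped so that every call is timed on the client, which shows whether the time is spent inside the read calls or elsewhere in the script. This is a minimal sketch under stated assumptions: time_reads and summarize are hypothetical helper names, and read_fn stands in for whatever performs a single read (for example cf.get in the script discussed in this thread).

```python
from timeit import default_timer


def time_reads(read_fn, keys):
    """Call read_fn(key) for every key and record per-call latency in seconds."""
    latencies = []
    start = default_timer()
    for key in keys:
        t0 = default_timer()
        read_fn(key)
        latencies.append(default_timer() - t0)
    elapsed = default_timer() - start
    return latencies, elapsed


def summarize(latencies, elapsed):
    """Return request count, throughput, and mean/max latency in microseconds."""
    count = len(latencies)
    return {
        "requests": count,
        "requests_per_second": count / elapsed if elapsed > 0 else 0.0,
        "mean_latency_us": 1e6 * sum(latencies) / count if count else 0.0,
        "max_latency_us": 1e6 * max(latencies) if count else 0.0,
    }


if __name__ == "__main__":
    # Stand-in for a real read; swap in the actual client call to measure it.
    def fake_read(key):
        return key

    latencies, elapsed = time_reads(fake_read, ["k%d" % i for i in range(1000)])
    print(summarize(latencies, elapsed))
```

Running one copy of this per process and comparing the mean latency with the server-side histograms should show whether the client or the cluster is the limit.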


On Fri, Jan 18, 2013 at 12:44 PM, Pradeep Kumar Mantha <pr...@gmail.com> wrote:

> Hi,
>
> Thanks Tyler.
>
> Below is the *global* connection pool I am trying to use, where the
> server_list contains all the ips of 12 DataNodes I am using and
> pool_size is the number of threads  and I just set to timeout to 60 to
> avoid connection retry errors.
>
> pool = pycassa.ConnectionPool('Blast',
> server_list=server_list,pool_size=32,timeout=60)
>
>
> It seems the performance is still stuck at 521 seconds.. which is 177
> seconds for cassandra-cli.
>
> Am I still missing something?
>
> thanks
> Pradeep
>
>
>
> On Fri, Jan 18, 2013 at 7:12 AM, Tyler Hobbs <ty...@datastax.com> wrote:
> > You just need to increase the ConnectionPool size to handle the number of
> > threads you have using it concurrently.  Set the pool_size kwarg to at least
> > the number of threads you're using.
> >
> >
> > On Thu, Jan 17, 2013 at 6:46 PM, Pradeep Kumar Mantha <pr...@gmail.com>
> > wrote:
> >>
> >> Thanks Tyler.
> >>
> >> I just moved the pool and cf which store the connection pool and CF
> >> information to have global scope.
> >>
> >> Increased the server_list values from 1 to 4. ( i think i can increase
> >> them max to 12 since I have 12 data nodes )
> >>
> >> when I created 8 threads  using python threading package , I see the
> >> below error.
> >>
> >> Exception in thread Thread-3:
> >> Traceback (most recent call last):
> >>   File
> >> "/usr/common/usg/python/2.7.1-20110310/lib64/python2.7/threading.py",
> >> line 530, in __bootstrap_inner
> >>     self.run()
> >>   File "my_cc.py", line 20, in run
> >>     start_cassandra_client(self.name)
> >>   File "my_cc.py", line 33, in start_cassandra_client
> >>     cf.get(key)
> >>   File
> >> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/columnfamily.py",
> >> line 652, in get
> >>     read_consistency_level or self.read_consistency_level)
> >>   File
> >> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py",
> >> line 553, in execute
> >>     conn = self.get()
> >>   File
> >> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py",
> >> line 536, in get
> >>     raise NoConnectionAvailable(message)
> >> NoConnectionAvailable: ConnectionPool limit of size 5 reached, unable
> >> to obtain connection after 30 seconds
> >>
> >>
> >> Please have a look at the script attached.. and let me know if I need
> >> to change something.. Please bear with me, if I do something terribly
> >> wrong..
> >>
> >> I am running the script on a 8 processor node.
> >>
> >> thanks
> >> pradeep
> >>
> >> On Thu, Jan 17, 2013 at 4:18 PM, Tyler Hobbs <ty...@datastax.com> wrote:
> >> > ConnectionPools and ColumnFamilies are thread-safe in pycassa, and it's
> >> > best to share them across multiple threads.  Of course, when you do that,
> >> > make sure to make the ConnectionPool large enough to support all of the
> >> > threads making queries concurrently.  I'm also not sure if you're just
> >> > omitting this, but pycassa's ConnectionPool will only open connections to
> >> > servers you explicitly include in server_list; there's no autodiscovery of
> >> > other nodes going on.
> >> >
> >> > Depending on your network latency, you'll top out on python performance
> >> > with a fairly low number of threads due to the GIL.  It's best to use
> >> > multiple processes if you really want to benchmark something.
> >> >
> >> >
> >> > On Thu, Jan 17, 2013 at 6:05 PM, Pradeep Kumar Mantha
> >> > <pr...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> Thanks. I would like to benchmark cassandra with our application so
> >> >> that we understand the details of how the actual benchmarking is done.
> >> >> Not sure, how easy it would be to integrate YCSB with our application.
> >> >>
> >> >> So, i am trying different client interfaces to cassandra.
> >> >>
> >> >> I found
> >> >>
> >> >> for 12 Data Nodes Cassandra cluster and 1 Client Node which run 32
> >> >> threads ( each querying X number of queries ).
> >> >>
> >> >> cassandra-cli     took 133 seconds
> >> >> pycassa took 521 seconds.
> >> >>
> >> >> Here is the python pycassa code used to query and passed to each
> >> >> thread....
> >> >>
> >> >> def start_cassandra_client(Threadname):
> >> >>         pool = pycassa.ConnectionPool('Blast',
> >> >> server_list=['xxx.xx.xx.xx'])
> >> >>         cf = pycassa.ColumnFamily(pool, 'Blast_NR')
> >> >>         inp_file=open("pycassa_100%_query")
> >> >>         for key in inp_file:
> >> >>                 key=key.strip()
> >> >>                 cf.get(key)
> >> >>
> >> >> Does Java clients like Hector/Astynax help here.. I am more
> >> >> comfortable with Python than Java and our existing application is also
> >> >> in Python.
> >> >>
> >> >> thanks
> >> >> pradeep
> >> >>
> >> >>
> >> >> On Thu, Jan 17, 2013 at 2:08 PM, Edward Capriolo
> >> >> <ed...@gmail.com>
> >> >> wrote:
> >> >> > Wow you managed to do a load test through the cassandra-cli. There
> >> >> > should be
> >> >> > a merit badge for that.
> >> >> >
> >> >> > You should use the built in stress tool or YCSB.
> >> >> >
> >> >> > The CLI has to do much more string conversion then a normal client
> >> >> > would
> >> >> > and
> >> >> > it is not built for performance. You will definitely get better
> >> >> > numbers
> >> >> > through other means.
> >> >> >
> >> >> > On Thu, Jan 17, 2013 at 2:10 PM, Pradeep Kumar Mantha
> >> >> > <pr...@gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >> I am trying to maximize execution of the number of read
> >> >> >> queries/second.
> >> >> >>
> >> >> >> Here is my cluster configuration.
> >> >> >>
> >> >> >> Replication - Default
> >> >> >> 12 Data Nodes.
> >> >> >> 16 Client Nodes - used for querying.
> >> >> >>
> >> >> >> Each client node executes 32 threads - each thread executes 76896
> >> >> >> read queries using  cassandra-cli tool.
> >> >> >>        i.e all the read queries are stored in a file and that file
> >> >> >> is given to cassandra-cli tool ( using -f option ) which is executed by
> >> >> >> a thread.
> >> >> >> so, total number of queries for 16 client Nodes is 16 * 32 * 76896.
> >> >> >>
> >> >> >> The read queries on each client node submitted at the same time. The
> >> >> >> time taken for 16 * 32 * 76896 read queries is nearly 742 seconds -
> >> >> >> which is nearly 53k transactions/second.
> >> >> >>
> >> >> >> I would like to know if there is any other way/tool through which I
> >> >> >> can improve the number of transactions/second.
> >> >> >> Is the performance affected by cassandra-cli tool?
> >> >> >>
> >> >> >> thanks
> >> >> >> pradeep
> >> >> >
> >> >> >
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Tyler Hobbs
> >> > DataStax
> >
> >
> >
> >
> > --
> > Tyler Hobbs
> > DataStax
>



-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: Cassandra Performance Benchmarking.

Posted by Pradeep Kumar Mantha <pr...@gmail.com>.
Hi,

Thanks Tyler.

Below is the *global* connection pool I am trying to use, where
server_list contains the IPs of all 12 data nodes I am using and
pool_size is the number of threads. I set the timeout to 60 to
avoid connection retry errors.

pool = pycassa.ConnectionPool('Blast', server_list=server_list,
                              pool_size=32, timeout=60)


It seems the performance is still stuck at 521 seconds, compared with 177
seconds for cassandra-cli.

Am I still missing something?

thanks
Pradeep



On Fri, Jan 18, 2013 at 7:12 AM, Tyler Hobbs <ty...@datastax.com> wrote:
> You just need to increase the ConnectionPool size to handle the number of
> threads you have using it concurrently.  Set the pool_size kwarg to at least
> the number of threads you're using.

Re: Cassandra Performance Benchmarking.

Posted by Tyler Hobbs <ty...@datastax.com>.
You just need to increase the ConnectionPool size to handle the number of
threads you have using it concurrently.  Set the pool_size kwarg to at
least the number of threads you're using.
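
A minimal sketch of that setup, assuming pycassa is installed and reusing the keyspace 'Blast', the column family 'Blast_NR' and the key file name that appear elsewhere in this thread. The helper names (pool_settings, read_keys, run_threads, main) and the N_THREADS constant are made up for illustration, and the node address is a placeholder.

```python
import threading

N_THREADS = 32  # assumption: the thread count mentioned in this thread


def pool_settings(server_list, n_threads):
    """Keyword arguments for pycassa.ConnectionPool, sized for n_threads."""
    return {
        "server_list": list(server_list),
        # One connection per thread, so no thread has to wait on the pool.
        "pool_size": n_threads,
    }


def read_keys(cf, keys):
    # cf is the shared pycassa ColumnFamily; get() reads one row by key.
    for key in keys:
        cf.get(key)


def run_threads(cf, keys, n_threads):
    # Deal the keys out round-robin so each thread reads a distinct slice.
    threads = [
        threading.Thread(target=read_keys, args=(cf, keys[i::n_threads]))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def main():
    import pycassa  # third-party; imported here so the helpers load without it

    servers = ["xxx.xx.xx.xx"]  # placeholder: list every data node here
    pool = pycassa.ConnectionPool("Blast", **pool_settings(servers, N_THREADS))
    cf = pycassa.ColumnFamily(pool, "Blast_NR")
    with open("pycassa_100%_query") as f:
        keys = [line.strip() for line in f]
    run_threads(cf, keys, N_THREADS)
```

Calling main() runs the reads against the cluster. The pool and the ColumnFamily are created once and shared by every thread, rather than once per thread.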


On Thu, Jan 17, 2013 at 6:46 PM, Pradeep Kumar Mantha
<pr...@gmail.com> wrote:

> Thanks Tyler.
>
> I just moved the pool and cf which store the connection pool and CF
> information to have global scope.
>
> Increased the server_list values from 1 to 4. ( i think i can increase
> them max to 12 since I have 12 data nodes )
>
> when I created 8 threads  using python threading package , I see the
> below error.
>
> Exception in thread Thread-3:
> Traceback (most recent call last):
>   File
> "/usr/common/usg/python/2.7.1-20110310/lib64/python2.7/threading.py",
> line 530, in __bootstrap_inner
>     self.run()
>   File "my_cc.py", line 20, in run
>     start_cassandra_client(self.name)
>   File "my_cc.py", line 33, in start_cassandra_client
>     cf.get(key)
>   File
> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/columnfamily.py",
> line 652, in get
>     read_consistency_level or self.read_consistency_level)
>   File
> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py",
> line 553, in execute
>     conn = self.get()
>   File
> "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py",
> line 536, in get
>     raise NoConnectionAvailable(message)
> NoConnectionAvailable: ConnectionPool limit of size 5 reached, unable
> to obtain connection after 30 seconds
>
>
> Please have a look at the script attached.. and let me know if I need
> to change something.. Please bear with me, if I do something terribly
> wrong..
>
> I am running the script on a 8 processor node.
>
> thanks
> pradeep



-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: Cassandra Performance Benchmarking.

Posted by Pradeep Kumar Mantha <pr...@gmail.com>.
Thanks Tyler.

I just moved pool and cf, which hold the connection pool and column
family objects, to global scope.

I increased the number of entries in server_list from 1 to 4. (I think
I can raise it to a maximum of 12, since I have 12 data nodes.)

When I created 8 threads using the Python threading package, I got the
error below.

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/common/usg/python/2.7.1-20110310/lib64/python2.7/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "my_cc.py", line 20, in run
    start_cassandra_client(self.name)
  File "my_cc.py", line 33, in start_cassandra_client
    cf.get(key)
  File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/columnfamily.py", line 652, in get
    read_consistency_level or self.read_consistency_level)
  File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py", line 553, in execute
    conn = self.get()
  File "/global/homes/p/pmantha/mypython_repo/lib/python2.7/site-packages/pycassa/pool.py", line 536, in get
    raise NoConnectionAvailable(message)
NoConnectionAvailable: ConnectionPool limit of size 5 reached, unable to obtain connection after 30 seconds


Please have a look at the attached script and let me know if I need
to change something. Please bear with me if I am doing something
terribly wrong.

I am running the script on an 8-processor node.

thanks
pradeep

On Thu, Jan 17, 2013 at 4:18 PM, Tyler Hobbs <ty...@datastax.com> wrote:
> ConnectionPools and ColumnFamilies are thread-safe in pycassa, and it's best
> to share them across multiple threads.  Of course, when you do that, make
> sure to make the ConnectionPool large enough to support all of the threads
> making queries concurrently.  I'm also not sure if you're just omitting
> this, but pycassa's ConnectionPool will only open connections to servers you
> explicitly include in server_list; there's no autodiscovery of other nodes
> going on.
>
> Depending on your network latency, you'll top out on python performance with
> a fairly low number of threads due to the GIL.  It's best to use multiple
> processes if you really want to benchmark something.

Re: Cassandra Performance Benchmarking.

Posted by Tyler Hobbs <ty...@datastax.com>.
ConnectionPools and ColumnFamilies are thread-safe in pycassa, and it's
best to share them across multiple threads.  Of course, when you do that,
make sure to make the ConnectionPool large enough to support all of the
threads making queries concurrently.  I'm also not sure if you're just
omitting this, but pycassa's ConnectionPool will only open connections to
servers you explicitly include in server_list; there's no autodiscovery of
other nodes going on.

Depending on your network latency, you'll top out on python performance
with a fairly low number of threads due to the GIL.  It's best to use
multiple processes if you really want to benchmark something.
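
A rough sketch of the multi-process approach, using only the standard library. The fetch function here is a stand-in so the example runs without a cluster; in a real benchmark each worker process would create its own pycassa ConnectionPool and ColumnFamily and call cf.get(key) instead. The names split_keys, fetch, worker, run_benchmark and main are made up for illustration.

```python
import multiprocessing
import time


def split_keys(keys, n_workers):
    """Deal keys round-robin into n_workers lists of near-equal size."""
    return [keys[i::n_workers] for i in range(n_workers)]


def fetch(key):
    # Stand-in for the real read (for example cf.get(key) in pycassa).
    return len(key)


def worker(keys):
    # Runs in its own process, so it has its own interpreter and GIL.
    # A real worker would open its own connection pool here.
    done = 0
    for key in keys:
        fetch(key)
        done += 1
    return done


def run_benchmark(keys, n_workers):
    """Read all keys using n_workers processes; return (reads, seconds)."""
    chunks = split_keys(keys, n_workers)
    start = time.time()
    pool = multiprocessing.Pool(processes=n_workers)
    try:
        counts = pool.map(worker, chunks)
    finally:
        pool.close()
        pool.join()
    return sum(counts), time.time() - start


def main():
    keys = ["key%d" % i for i in range(1000)]
    total, elapsed = run_benchmark(keys, 4)
    print("%d reads in %.3f s" % (total, elapsed))
```

To run it as a script, call main() under the usual `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that spawn rather than fork worker processes.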


On Thu, Jan 17, 2013 at 6:05 PM, Pradeep Kumar Mantha
<pr...@gmail.com> wrote:

> Hi,
>
> Thanks. I would like to benchmark cassandra with our application so
> that we understand the details of how the actual benchmarking is done.
> Not sure, how easy it would be to integrate YCSB with our application.
>
> So, i am trying different client interfaces to cassandra.
>
> I found
>
> for 12 Data Nodes Cassandra cluster and 1 Client Node which run 32
> threads ( each querying X number of queries ).
>
> cassandra-cli     took 133 seconds
> pycassa took 521 seconds.
>
> Here is the python pycassa code used to query and passed to each thread....
>
> def start_cassandra_client(Threadname):
>         pool = pycassa.ConnectionPool('Blast',
> server_list=['xxx.xx.xx.xx'])
>         cf = pycassa.ColumnFamily(pool, 'Blast_NR')
>         inp_file=open("pycassa_100%_query")
>         for key in inp_file:
>                 key=key.strip()
>                 cf.get(key)
>
> Does Java clients like Hector/Astynax help here.. I am more
> comfortable with Python than Java and our existing application is also
> in Python.
>
> thanks
> pradeep



-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: Cassandra Performance Benchmarking.

Posted by Pradeep Kumar Mantha <pr...@gmail.com>.
Hi,

Thanks. I would like to benchmark Cassandra through our own application
so that we understand how the benchmarking is actually done. I am not
sure how easy it would be to integrate YCSB with our application.

So I am trying different client interfaces to Cassandra.

With a 12-data-node Cassandra cluster and 1 client node running 32
threads (each issuing X queries), I found:

cassandra-cli took 133 seconds
pycassa took 521 seconds

Here is the Python pycassa code that each thread runs:

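import pycassa

def start_cassandra_client(Threadname):
    # Each thread builds its own pool and column family handle.
    pool = pycassa.ConnectionPool('Blast', server_list=['xxx.xx.xx.xx'])
    cf = pycassa.ColumnFamily(pool, 'Blast_NR')
    # The query file holds one row key per line.
    with open("pycassa_100%_query") as inp_file:
        for key in inp_file:
            cf.get(key.strip())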

Would Java clients like Hector or Astyanax help here? I am more
comfortable with Python than Java, and our existing application is also
in Python.

thanks
pradeep


On Thu, Jan 17, 2013 at 2:08 PM, Edward Capriolo <ed...@gmail.com> wrote:
> Wow you managed to do a load test through the cassandra-cli. There should be
> a merit badge for that.
>
> You should use the built in stress tool or YCSB.
>
> The CLI has to do much more string conversion then a normal client would and
> it is not built for performance. You will definitely get better numbers
> through other means.

Re: Cassandra Performance Benchmarking.

Posted by Edward Capriolo <ed...@gmail.com>.
Wow, you managed to do a load test through the cassandra-cli. There should
be a merit badge for that.

You should use the built-in stress tool or YCSB.

The CLI has to do much more string conversion than a normal client would,
and it is not built for performance. You will definitely get better numbers
through other means.

On Thu, Jan 17, 2013 at 2:10 PM, Pradeep Kumar Mantha
<pr...@gmail.com> wrote:

> Hi,
>
> I am trying to maximize execution of the number of read queries/second.
>
> Here is my cluster configuration.
>
> Replication - Default
> 12 Data Nodes.
> 16 Client Nodes - used for querying.
>
> Each client node executes 32 threads - each thread executes 76896 read
> queries using  cassandra-cli tool.
>        i.e all the read queries are stored in a file and that file is
> given to cassandra-cli tool ( using -f option ) which is executed by a
> thread.
> so, total number of queries for 16 client Nodes is 16 * 32 * 76896.
>
> The read queries on each client node submitted at the same time. The
> time taken for 16 * 32 * 76896 read queries is nearly 742 seconds -
> which is nearly 53k transactions/second.
>
> I would like to know if there is any other way/tool through which I
> can improve the number of transactions/second.
> Is the performance affected by cassandra-cli tool?
>
> thanks
> pradeep
>
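
For reference, the throughput figure in the original post can be reproduced from the numbers given there (16 client nodes, 32 threads per node, 76896 reads per thread, 742 seconds). The constant and function names below are made up for illustration.

```python
CLIENT_NODES = 16
THREADS_PER_NODE = 32
QUERIES_PER_THREAD = 76896
ELAPSED_SECONDS = 742


def throughput(nodes, threads, queries, seconds):
    """Return (total reads, reads per second) for the whole client fleet."""
    total = nodes * threads * queries
    return total, total / float(seconds)


total, per_second = throughput(
    CLIENT_NODES, THREADS_PER_NODE, QUERIES_PER_THREAD, ELAPSED_SECONDS
)
print("%d reads in %d s = %.0f reads/s" % (total, ELAPSED_SECONDS, per_second))
```

That is 39,370,752 reads in total, or about 53,060 reads per second, which matches the "nearly 53k" quoted.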