Posted to user@cassandra.apache.org by ka...@comcast.net on 2013/02/13 14:49:54 UTC

Write performance expectations...

Hello, 
New member here, and I have (yet another) question on write performance. 

I'm using Apache Cassandra version 1.1, Python 2.7 and Pycassa 1.7. 

I have a cluster of 2 datacenters, each with 3 nodes, on AWS EC2 using EBS and the RandomPartitioner. I'm writing to a column family in a keyspace that's replicated to all nodes in both datacenters, with a consistency level of LOCAL_QUORUM. 

I'm seeing write performance of around 2500 rows per second. 

Is this in the ballpark for this kind of configuration? 

Thanks in advance. 

Ken.... 

Re: Write performance expectations...

Posted by Ken Adey <ka...@comcast.net>.
On a single-processor EC2 instance, however, multiprocessing would be
useless.

Ken....

On 2/13/2013 5:29 PM, Ben Bromhead wrote:
> If you are using CPython (most likely) remember to use the 
> multiprocessing interface rather than multithreading to avoid the 
> global interpreter lock.
>
> Cheers
>
> Ben

Re: Write performance expectations...

Posted by Ben Bromhead <be...@instaclustr.com>.
If you are using CPython (most likely) remember to use the multiprocessing
interface rather than multithreading to avoid the global interpreter lock.
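
A minimal sketch of that advice, assuming a POSIX client (it uses the fork start method) and an illustrative write_rows() worker standing in for the real pycassa insert loop. The rows are split into one contiguous slice per process, each process writes its slice, and the parent adds up the per-process counts. The function names are hypothetical; only ConnectionPool, ColumnFamily and insert(), mentioned in the comments, are pycassa's own names.

```python
import multiprocessing as mp

# Assumption: POSIX host. "fork" does not exist on Windows; Python 2.7 has
# no get_context() but already forks by default on Linux.
CTX = mp.get_context("fork") if hasattr(mp, "get_context") else mp


def split_rows(rows, n):
    """Split rows into n contiguous slices whose sizes differ by at most one."""
    size, extra = divmod(len(rows), n)
    slices, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        slices.append(rows[start:end])
        start = end
    return slices


def write_rows(rows, results):
    """Hypothetical worker standing in for a real insert loop.

    With pycassa, this is where each process would create its own
    ConnectionPool and ColumnFamily (after the fork, not inherited from
    the parent) and call cf.insert(key, columns) for every row.
    """
    written = 0
    for key, columns in rows:
        written += 1  # placeholder for the real insert
    results.put(written)


def parallel_insert(rows, workers):
    """Write rows from `workers` processes; return the number written."""
    results = CTX.Queue()
    procs = [CTX.Process(target=write_rows, args=(part, results))
             for part in split_rows(rows, workers)]
    for p in procs:
        p.start()
    # Drain the queue before join() so no child blocks on a full pipe.
    total = sum(results.get() for _ in procs)
    for p in procs:
        p.join()
    return total


if __name__ == "__main__":
    rows = [("key%d" % i, {"batchID": str(i)}) for i in range(10000)]
    print(parallel_insert(rows, mp.cpu_count()))
```

Sizing the pool with mp.cpu_count() means a single-processor client runs just one worker, in which case this buys nothing over a plain loop.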

Cheers

Ben

On Thu, Feb 14, 2013 at 4:35 AM, <ka...@comcast.net> wrote:

> I'm not using multi-threads/processes. I'll try multi-threading to see if
> I get a boost.
>
> Thanks.
>
> Ken....

Re: Write performance expectations...

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
An m1.small will probably be unable to maximize throughput on your m1.large
cluster.

"If you don't use EBS, how is data persistence then maintained in the event
that an instance goes down for whatever reason?"

You answered this yourself earlier in this thread: "I'm writing to a column
family in a keyspace that's replicated to all nodes in both datacenters".
So if one of your nodes goes down for any reason, you'll have to bootstrap a
new one to replace the dead node, and it will take its data from the
remaining replicas.

By using EBS, you're in the first anti-pattern listed here:
http://www.datastax.com/docs/1.1/cluster_architecture/anti_patterns

Alain



2013/2/14 <ka...@comcast.net>

> Alain,
>      I found out that the client node is an m1.small, and the cassandra
> nodes are m1.large.
>
> This is what is contained in each row: {dev1-dc1r-redir-0.unica.net/B9tk:
> {batchID: 2486272}}. Not a whole lot of data.
>
> If you don't use EBS, how is data persistence then maintained in the event
> that an instance goes down for whatever reason?
>
> Ken....

Re: Write performance expectations...

Posted by ka...@comcast.net.
Alain, 
I found out that the client node is an m1.small, and the Cassandra nodes are m1.large. 

This is what is contained in each row: {dev1-dc1r-redir-0.unica.net/B9tk: {batchID: 2486272}}. Not a whole lot of data. 

If you don't use EBS, how is data persistence then maintained in the event that an instance goes down for whatever reason? 

Ken.... 
----- Original Message -----
From: "Alain RODRIGUEZ" <ar...@gmail.com> 
To: user@cassandra.apache.org 
Sent: Thursday, February 14, 2013 8:34:06 AM 
Subject: Re: Write performance expectations... 


Hi Ken, 


You really should take a look at my first answer... and give us more information on the size of your inserts, the type of EC2 you are using at least. You should also consider using Instance store and not EBS. Well, look at all these things I already told you. 


Alain 

Re: Write performance expectations...

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi Ken,

You really should take a look at my first answer... and give us more
information: at least the size of your inserts and the EC2 instance type
you are using. You should also consider using instance store rather than
EBS. Well, look at all the things I already told you.

Alain


2013/2/14 Peter Lin <wo...@gmail.com>

> it could be the instances are IO limited.
>
> I've been running benchmarks with Cassandra 1.1.9 the last 2 weeks on
> a AMD FX 8 core with 32GB of ram.
>
> with 24 threads I get roughly 20K inserts per second. each insert is
> only about 100-150 bytes.

Re: Write performance expectations...

Posted by Peter Lin <wo...@gmail.com>.
It could be that the instances are I/O limited.

I've been running benchmarks with Cassandra 1.1.9 for the last 2 weeks on
an 8-core AMD FX with 32GB of RAM.

With 24 threads I get roughly 20K inserts per second. Each insert is
only about 100-150 bytes.

On Thu, Feb 14, 2013 at 8:07 AM,  <ka...@comcast.net> wrote:
> Using multithreading, inserting 2000 per thread, resulted in no throughput
> increase. Each thread is taking about 4 seconds per, indicating a bottleneck
> elsewhere.
>
> Ken....

Re: Write performance expectations...

Posted by ka...@comcast.net.
Using multithreading, with 2000 inserts per thread, resulted in no throughput increase. Each thread takes about 4 seconds for its inserts, indicating a bottleneck elsewhere. 
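
A sketch for checking whether extra threads raise throughput: measure the aggregate rate (total rows divided by the wall-clock time of the whole run) rather than per-thread times. measure() and fake_insert() are hypothetical helpers; fake_insert() only sleeps briefly in place of a real client call.

```python
import threading
import time


def measure(insert, rows, threads):
    """Insert rows from `threads` threads; return (rows_written, rows_per_sec).

    `insert` is any callable taking (key, columns). Rows are dealt out
    round-robin so every thread gets a near-equal share.
    """
    shares = [rows[i::threads] for i in range(threads)]
    counts = [0] * threads  # one slot per thread, so no lock is needed

    def worker(idx):
        for key, columns in shares[idx]:
            insert(key, columns)
            counts[idx] += 1

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    start = time.time()
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    elapsed = time.time() - start
    written = sum(counts)
    return written, (written / elapsed if elapsed > 0 else float("inf"))


def fake_insert(key, columns):
    # Hypothetical stand-in: sleeping releases the GIL, much like waiting on
    # a network reply does. CPU-bound client work would not overlap this way.
    time.sleep(0.001)


if __name__ == "__main__":
    rows = [("key%d" % i, {"batchID": str(i)}) for i in range(1000)]
    for n in (1, 2, 4, 8):
        written, rate = measure(fake_insert, rows, n)
        print("%d threads: %d rows, %.0f rows/s" % (n, written, rate))
```

A sleep-based stand-in overlaps perfectly across threads, so it only shows the mechanics; whether real inserts overlap depends on how much of each call is spent in Python code holding the GIL, which is the point of the multiprocessing suggestion in this thread.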




Ken.... 
----- Original Message -----
From: "Tyler Hobbs" <ty...@datastax.com> 
To: user@cassandra.apache.org 
Sent: Wednesday, February 13, 2013 11:06:30 AM 
Subject: Re: Write performance expectations... 


2500 inserts per second is about what a single python thread using pycassa can do against a local node. Are you using multiple threads for the inserts? Multiple processes? 

Re: Write performance expectations...

Posted by ka...@comcast.net.
I'm not using multi-threads/processes. I'll try multi-threading to see if I get a boost. 

Thanks. 


Ken.... 


----- Original Message -----
From: "Tyler Hobbs" <ty...@datastax.com> 
To: user@cassandra.apache.org 
Sent: Wednesday, February 13, 2013 11:06:30 AM 
Subject: Re: Write performance expectations... 


2500 inserts per second is about what a single python thread using pycassa can do against a local node. Are you using multiple threads for the inserts? Multiple processes? 

Re: Write performance expectations...

Posted by Tyler Hobbs <ty...@datastax.com>.
2500 inserts per second is about what a single Python thread using pycassa
can do against a local node. Are you using multiple threads for the
inserts? Multiple processes?
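
One way to read that figure: a thread that waits for each insert to finish before sending the next completes at most one insert per round trip, so throughput is roughly concurrency divided by per-insert latency (Little's law). A small sketch that turns the numbers quoted in this thread into implied average latencies, assuming every thread stays busy for the whole run:

```python
def implied_latency_ms(concurrency, ops_per_sec):
    """Little's law (L = lambda * W) solved for W, in milliseconds."""
    return 1000.0 * concurrency / ops_per_sec


def max_throughput(concurrency, latency_ms):
    """Upper bound on ops/s for `concurrency` synchronous callers."""
    return concurrency * 1000.0 / latency_ms


# One synchronous thread at ~2500 inserts/s (figure from this thread).
print(implied_latency_ms(1, 2500))    # -> 0.4
# 24 threads at roughly 20K inserts/s (benchmark reported in this thread).
print(implied_latency_ms(24, 20000))  # -> 1.2
# About 2000 inserts in 4 s per thread, i.e. 500/s, implies ~2 ms each.
print(max_throughput(1, 2.0))         # -> 500.0
```

These are averages under the stated assumption, not measurements; they do not say whether the time is spent in the client, on the network, or on the nodes.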


On Wed, Feb 13, 2013 at 8:21 AM, Alain RODRIGUEZ <ar...@gmail.com> wrote:

> Is there a particular reason for you to use EBS ? Instance Store
> are recommended because they improve performances by reducing the I/O
> throttling.
>
> An other thing you should be aware of is that replicating the data to all
> node reduce your performance, it is more or less like if you had only one
> node (at performance level I mean).
>
> Also, writing to different datacenters probably induce some network
> latency.
>
> You should give the EC2 instance type (m1.xlarge / m1.large / ...) if you
> want some feedback about the 2500 w/s, and also give the mean size of your
> rows.
>
> Alain


-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: Write performance expectations...

Posted by ka...@comcast.net.
> Is there a particular reason for you to use EBS ? Instance Store are recommended because they improve performances by reducing the I/O throttling. 


> You should give the EC2 instance type (m1.xlarge / m1.large / ...) if you want some feedback about the 2500 w/s, and also give the mean size of your rows. 

The cluster was set up before I came onto the project, so I'm trying to get answers to these questions. 


> An other thing you should be aware of is that replicating the data to all node reduce your performance, it is more or less like if you had only one node (at performance level I mean). 


> Also, writing to different datacenters probably induce some network latency. 

My understanding of how LOCAL_QUORUM works is that the insert request only waits for (in my case) 2 nodes in the local datacenter to report a successful write. 
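
That matches how Cassandra computes quorums: a quorum is floor(RF / 2) + 1 replicas, and LOCAL_QUORUM applies this to the replication factor of the coordinator's own datacenter only. A small sketch of the arithmetic, assuming the keyspace uses NetworkTopologyStrategy with 3 replicas in each datacenter (what "replicated to all nodes in both datacenters" implies with 3 nodes per datacenter); the datacenter names are hypothetical:

```python
def quorum(replicas):
    """Replicas that must acknowledge: floor(replicas / 2) + 1."""
    return replicas // 2 + 1


# Assumed replication settings: 3 replicas in each of two datacenters
# (hypothetical names), i.e. every node holds every row.
rf_per_dc = {"DC1": 3, "DC2": 3}

total_replicas = sum(rf_per_dc.values())
print(total_replicas)            # -> 6: every write is still sent to all 6 replicas
print(quorum(rf_per_dc["DC1"]))  # -> 2: acks LOCAL_QUORUM waits for in the local DC
print(quorum(total_replicas))    # -> 4: what plain QUORUM would wait for, across both DCs
```

The consistency level only changes how many acknowledgements the coordinator waits for; it does not reduce the number of replicas that perform the write.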


Ken.... 


----- Original Message -----
From: "Alain RODRIGUEZ" <ar...@gmail.com> 
To: user@cassandra.apache.org 
Sent: Wednesday, February 13, 2013 9:21:18 AM 
Subject: Re: Write performance expectations... 


Is there a particular reason for you to use EBS ? Instance Store are recommended because they improve performances by reducing the I/O throttling. 


An other thing you should be aware of is that replicating the data to all node reduce your performance, it is more or less like if you had only one node (at performance level I mean). 


Also, writing to different datacenters probably induce some network latency. 


You should give the EC2 instance type (m1.xlarge / m1.large / ...) if you want some feedback about the 2500 w/s, and also give the mean size of your rows. 


Alain 

Re: Write performance expectations...

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Is there a particular reason for you to use EBS? Instance store is
recommended because it improves performance by reducing I/O
throttling.

Another thing you should be aware of is that replicating the data to all
nodes reduces your performance: it is more or less as if you had only one
node (performance-wise, I mean).
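
To put numbers on this: every row is written to every one of its replicas, so each node's share of the write load is the client rate times the total number of replicas, divided by the number of nodes. When the replica count equals the node count, as in this cluster, each node must absorb every write, so the cluster as a whole accepts roughly what a single node can. A back-of-the-envelope sketch, assuming rows are spread evenly across nodes; writes_per_node() is a hypothetical helper:

```python
def writes_per_node(client_rows_per_sec, total_replicas, nodes):
    """Average replica writes each node absorbs, assuming an even spread.

    Every row is written to `total_replicas` nodes, so the cluster performs
    client_rows_per_sec * total_replicas writes, shared by `nodes` nodes.
    """
    return client_rows_per_sec * total_replicas / float(nodes)


# This thread's layout: 6 nodes, every row on all 6 (3 per datacenter).
print(writes_per_node(2500, 6, 6))   # -> 2500.0: each node sees every row
# Hypothetical: grow to 12 nodes while keeping 3 replicas per datacenter.
print(writes_per_node(2500, 6, 12))  # -> 1250.0
```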

Also, writing to different datacenters probably induces some network latency.

You should give the EC2 instance type (m1.xlarge / m1.large / ...) if you
want some feedback about the 2500 w/s, and also give the mean size of your
rows.

Alain


2013/2/13 <ka...@comcast.net>

> Hello,
>      New member here, and I have (yet another) question on write
> performance.
>
> I'm using Apache Cassandra version 1.1, Python 2.7 and Pycassa 1.7.
>
> I have a cluster of 2 datacenters, each with 3 nodes, on AWS EC2 using EBS
> and the RandomPartitioner. I'm writing to a column family in a keyspace
> that's replicated to all nodes in both datacenters, with a consistency
> level of LOCAL_QUORUM.
>
> I'm seeing write performance of around 2500 rows per second.
>
> Is this in the ballpark for this kind of configuration?
>
> Thanks in advance.
>
> Ken....
>
>