Posted to dev@cassandra.apache.org by Bingbing Liu <ru...@gmail.com> on 2010/03/12 06:36:10 UTC

wo did some test on cassandra ,but the result puzzled us

We ran some tests on Cassandra using the benchmark from Section 7 of the Bigtable paper, "Bigtable: A Distributed Storage System for Structured Data". The benchmark tasks are random write, random read, sequential write, and sequential read. The results puzzled us. We used a cluster of 5 nodes (each node has a 4-core CPU and 4 GB of memory). The test data is a table of 4,000,000 records, each of which is 1000 bytes. The test results are as follows:
Sequential write:  875124 ms
Sequential read:  1972588 ms
Random read:  43331738 ms
Random write:  20193484 ms
We wonder why sequential writes are so much faster than sequential reads, and why sequential writes are so much faster than random writes. We expected reads to be faster than writes, but the results are just the opposite. Would you please give us some explanation? Thanks a lot!

2010-03-12 



Bingbing Liu 
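
For a rough sense of scale, the totals reported above can be converted into per-operation figures. This is only a sketch, and it assumes that every run covered all 4,000,000 records, which the post does not state explicitly. The class and method names are made up for illustration.

```java
// Converts the reported totals into rough per-operation figures.
// Assumption (not stated in the post): every run covered all 4,000,000 records.
final class BenchmarkRates {
    static final long RECORDS = 4_000_000L;

    /** Operations per second for a run that took totalMillis in total. */
    static double opsPerSecond(long records, long totalMillis) {
        return records * 1000.0 / totalMillis;
    }

    /** Mean time per operation, in milliseconds. */
    static double meanMillisPerOp(long records, long totalMillis) {
        return (double) totalMillis / records;
    }

    public static void main(String[] args) {
        String[] names = {"Sequential write", "Sequential read", "Random read", "Random write"};
        long[] totals = {875_124L, 1_972_588L, 43_331_738L, 20_193_484L};
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-17s %10.1f ops/s %8.3f ms/op%n",
                    names[i],
                    opsPerSecond(RECORDS, totals[i]),
                    meanMillisPerOp(RECORDS, totals[i]));
        }
    }
}
```

Under that assumption, the figures work out to roughly 4,570 sequential writes per second against about 198 random writes per second, a gap of about 23 times, and about 92 random reads per second.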

Re: wo did some test on cassandra ,but the result puzzled us

Posted by Jonathan Ellis <jb...@gmail.com>.
why reads are slower than writes:
http://wiki.apache.org/cassandra/FAQ#reads_slower_writes

no idea on seq vs random.  i would not be surprised if there is a bug
in your test code.

On Fri, Mar 12, 2010 at 12:36 AM, Bingbing Liu <ru...@gmail.com> wrote:
> We did some test on on Cassandra, and the benchmark is from Section 7 of the BigTable paper “Bigtable: A Distributed Storage System for Structured Data”, the benchmark task includes: random write, random read, sequential write, and sequential read. The test results made us puzzled. We use a cluster of 5 nodes (each node has a 4 cores cpu , 4G memory).The data for test is a table with 4,000,000  records each of which is 1000 bytes. The test results are as follows:
> Sequential write:  875124 ms
> Sequential read:  1972588 ms
> Random read:  43331738 ms
> Random write:  20193484 ms
> We wondered why the speed of sequential write are so faster than the speed of sequential read, and why the speed of sequential write are so faster than that of random write? We think that the speed of read should be faster than that of data write, but the results are just the opposite, would you please give us some explanations, thanks a lot!
>
> 2010-03-12
>
>
>
> Bingbing Liu
>

Re: Re: wo did some test on cassandra ,but the result puzzled us

Posted by Jonathan Ellis <jb...@gmail.com>.
yes, this is a single-threaded benchmark so if getRandomRow is slow at
all, it is going to skew the hell out of your results :)

2010/3/12 Bingbing Liu <ru...@gmail.com>:
> the difference between se and random the test code is just how the key of each record is generated.
>
> the test code is :
>
>
> long totalSWriteTime = 0;
> for (int i = 0; i < totalRows; i++) {
> byte[] key = dg.getRandomRow();//when sequential write , we use i as the key
> byte[] data = dg.generateValue();
> long start = System.currentTimeMillis();
> client.insert("Keyspace1", new String(key), new ColumnPath(
> "Standard1", null, "data".getBytes("UTF-8")), data,timestamp, ConsistencyLevel.ONE);
> totalSWriteTime += (System.currentTimeMillis() - start);
>    if(i % 10000 == 0){
> System.out.println("Has write " + i);
>    }
> }
>
> is there something wrong?
> 2010-03-12
>
>
>
> Bingbing Liu
>
>
>
> From: Jonathan Ellis
> Sent: 2010-03-12  13:40:40
> To: cassandra-dev
> Cc:
> Subject: Re: wo did some test on cassandra ,but the result puzzled us
>
> why reads are slower than writes:
> http://wiki.apache.org/cassandra/FAQ#reads_slower_writes
> no idea on seq vs random.  i would not be surprised if there is a bug
> in your test code.
> On Fri, Mar 12, 2010 at 12:36 AM, Bingbing Liu <ru...@gmail.com> wrote:
>> We did some test on on Cassandra, and the benchmark is from Section 7 of the BigTable paper "Bigtable: A Distributed Storage System for Structured Data", the benchmark task includes: random write, random read, sequential write, and sequential read. The test results made us puzzled. We use a cluster of 5 nodes (each node has a 4 cores cpu , 4G memory).The data for test is a table with 4,000,000  records each of which is 1000 bytes. The test results are as follows:
>> Sequential write:  875124 ms
>> Sequential read:  1972588 ms
>> Random read:  43331738 ms
>> Random write:  20193484 ms
>> We wondered why the speed of sequential write are so faster than the speed of sequential read, and why the speed of sequential write are so faster than that of random write? We think that the speed of read should be faster than that of data write, but the results are just the opposite, would you please give us some explanations, thanks a lot!
>>
>> 2010-03-12
>>
>>
>>
>> Bingbing Liu
>>
>
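
One way to check whether key generation is skewing a single-threaded run is to time it separately from the call under test. The following is a minimal sketch, not the poster's harness: SplitTimer, keySource, and operation are hypothetical names, and the no-op lambda stands in for the real client call.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.function.IntFunction;

// Times key generation and the store operation separately, so that a slow
// key generator cannot be mistaken for a slow database call.
final class SplitTimer {
    long keyGenNanos;     // total time spent producing keys
    long operationNanos;  // total time spent in the operation under test

    void run(int totalRows, IntFunction<byte[]> keySource, Consumer<byte[]> operation) {
        for (int i = 0; i < totalRows; i++) {
            long t0 = System.nanoTime();
            byte[] key = keySource.apply(i);
            long t1 = System.nanoTime();
            operation.accept(key);
            long t2 = System.nanoTime();
            keyGenNanos += t1 - t0;
            operationNanos += t2 - t1;
        }
    }

    public static void main(String[] args) {
        SplitTimer timer = new SplitTimer();
        // Hypothetical stand-ins: a trivial key source and a no-op "insert".
        timer.run(100_000,
                i -> Integer.toString(i).getBytes(StandardCharsets.UTF_8),
                key -> { /* the real client call would go here */ });
        System.out.printf("key generation: %d ms, operation: %d ms%n",
                timer.keyGenNanos / 1_000_000, timer.operationNanos / 1_000_000);
    }
}
```

If the key-generation total is a noticeable fraction of the operation total, the generator is worth fixing before the sequential and random numbers are compared.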

Re: Re: wo did some test on cassandra ,but the result puzzled us

Posted by Bingbing Liu <ru...@gmail.com>.
The only difference between the sequential and the random test code is how the key of each record is generated.

The test code is:

 
long totalSWriteTime = 0;
for (int i = 0; i < totalRows; i++) {
    // For sequential writes, we use i as the key instead of a random row.
    byte[] key = dg.getRandomRow();
    byte[] data = dg.generateValue();
    // Time only the insert call itself.
    long start = System.currentTimeMillis();
    client.insert("Keyspace1", new String(key),
            new ColumnPath("Standard1", null, "data".getBytes("UTF-8")),
            data, timestamp, ConsistencyLevel.ONE);
    totalSWriteTime += (System.currentTimeMillis() - start);
    if (i % 10000 == 0) {
        System.out.println("Has written " + i);
    }
}

Is there something wrong?
2010-03-12 



Bingbing Liu 



From: Jonathan Ellis
Sent: 2010-03-12  13:40:40
To: cassandra-dev
Cc:
Subject: Re: wo did some test on cassandra ,but the result puzzled us
 
why reads are slower than writes:
http://wiki.apache.org/cassandra/FAQ#reads_slower_writes
no idea on seq vs random.  i would not be surprised if there is a bug
in your test code.
On Fri, Mar 12, 2010 at 12:36 AM, Bingbing Liu <ru...@gmail.com> wrote:
> We did some test on on Cassandra, and the benchmark is from Section 7 of the BigTable paper “Bigtable: A Distributed Storage System for Structured Data”, the benchmark task includes: random write, random read, sequential write, and sequential read. The test results made us puzzled. We use a cluster of 5 nodes (each node has a 4 cores cpu , 4G memory).The data for test is a table with 4,000,000  records each of which is 1000 bytes. The test results are as follows:
> Sequential write:  875124 ms
> Sequential read:  1972588 ms
> Random read:  43331738 ms
> Random write:  20193484 ms
> We wondered why the speed of sequential write are so faster than the speed of sequential read, and why the speed of sequential write are so faster than that of random write? We think that the speed of read should be faster than that of data write, but the results are just the opposite, would you please give us some explanations, thanks a lot!
>
> 2010-03-12
>
>
>
> Bingbing Liu
>
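
The message above says the sequential and random runs differ only in how the key is produced, but the thread never shows dg.getRandomRow(). The sketch below is therefore a guess at what such a pair of generators might look like; KeyGen, sequentialRow, and randomRow are hypothetical names, and the zero-padded decimal format is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Random;

// Hypothetical key generators; the thread does not show dg.getRandomRow().
final class KeyGen {
    private final int totalRows;
    private final Random random;

    KeyGen(int totalRows, long seed) {
        this.totalRows = totalRows;
        this.random = new Random(seed);
    }

    /** Sequential mode: the loop index itself, zero-padded to a fixed width. */
    byte[] sequentialRow(int i) {
        return format(i);
    }

    /** Random mode: a uniformly chosen row number in [0, totalRows). */
    byte[] randomRow() {
        return format(random.nextInt(totalRows));
    }

    private static byte[] format(int row) {
        return String.format(Locale.ROOT, "%010d", row).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        KeyGen gen = new KeyGen(4_000_000, 42L);
        System.out.println(new String(gen.sequentialRow(7), StandardCharsets.UTF_8));
        System.out.println(new String(gen.randomRow(), StandardCharsets.UTF_8));
    }
}
```

A fixed seed keeps the random run repeatable, and both generators here do about the same amount of work per key, so neither mode is penalized by key generation alone.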

Re: Cassandra Demo/Tutorial Applications

Posted by Krishna Sankar <ks...@gmail.com>.
Thanks Guys for the response.

Agreed, I won't be able to do all for my talk - in fact I might defer a lot
of hands-on Cassandra to Eric's PM session.

My question on multiple machines and EC2 was more for CASSANDRA-873, where we
want to have a set of good hands-on tutorials that, while much simpler than
actual production systems, still capture the essentials of a Cassandra
infrastructure. These could also be homework for the attendees.

Cheers
<k/>

On 3/12/10 Fri Mar 12, 10, "Jonathan Ellis" <jb...@gmail.com> wrote:

> Also http://aws.amazon.com/publicdatasets/.
> 
> On Fri, Mar 12, 2010 at 11:59 PM, Ian Holsman <ia...@holsman.net> wrote:
>> There are several large data sets on the net you could use to build. Demo
>> with.
>> Search logs, wikipedia, uk govt stuff
>> Dbpedia may be interesting as they have some of the stuff extracted out
>> 
>> 
>> ---
>> Sent from my phone
>> Ian Holsman - 703 879-3128
>> 
>> On 13/03/2010, at 4:46 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> 
>>> On Fri, Mar 12, 2010 at 1:55 PM, Krishna Sankar <ks...@gmail.com>
>>> wrote:
>>>> 
>>>> I was looking at this from CASSANDRA-873 as well as hands-on homework (!)
>>>> for my OSCON tutorial. Have couple of questions. Would appreciate
>>>> insights:
>>>> 
>>>> A)  Cassandra-873 suggests Luenandra as one demo application
>>>> B)  Are there other ideas that will bring out the various aspects of
>>>> Cassandra ?
>>> 
>>> multi-user blog (single-user is too easy :)
>>> - extra credit: with full-text search using lucandra
>>> 
>>> discussion forum
>>> - also w/ FTS
>>> 
>>>> C)  What would be the goal of demo apps ? Tutorial to help folks learn
>>>> the
>>>> ins and outs of Cassandra ? Show case capabilities ? I think
>>>> Cassandra-873
>>>> belongs to the latter; Twissandra most probably belongs to the former.
>>> 
>>> I think you nailed it.
>>> 
>>>> D)  Hadoop on Cassandra might be a good demo/tutorial
>>> 
>>> Sure, I'll buy that.
>>> 
>>> I can't think of any standalone projects for that, but "compute a
>>> twissandra tag cloud" would be pretty cool.  (Might need to write a
>>> twissandra bot to load stuff in to make an interesting cloud. :)
>>> 
>>>> E)  How would one structure the infrastructure for the demo/tutorials ?
>>>> What
>>>> assumptions can we make in creating them ? As AMIs to be run in EC2 ?
>>> 
>>> I'd probably go with "virtualbox images" as being simpler for people
>>> who don't have an AWS key already.  (VB can read vmware player images,
>>> i think.  But there is no free vmware for OS X, so you'd want to check
>>> that before going w/ vmware format.)
>>> 
>>> Or just have people d/l cassandra and a configuration xml.  Probably
>>> easier than teaching people to use virtualbox who haven't before.
>>> 
>>>> Also
>>>> to be run on 2-3 local machines for folks who can spare some ? Or as
>>>> multiple processes - all in one machine ?
>>> 
>>> You're not going to have time to teach cluster management.  Keep it to 1.
>> 



Re: Cassandra Demo/Tutorial Applications

Posted by Jonathan Ellis <jb...@gmail.com>.
Also http://aws.amazon.com/publicdatasets/.

On Fri, Mar 12, 2010 at 11:59 PM, Ian Holsman <ia...@holsman.net> wrote:
> There are several large data sets on the net you could use to build. Demo
> with.
> Search logs, wikipedia, uk govt stuff
> Dbpedia may be interesting as they have some of the stuff extracted out
>
>
> ---
> Sent from my phone
> Ian Holsman - 703 879-3128
>
> On 13/03/2010, at 4:46 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> On Fri, Mar 12, 2010 at 1:55 PM, Krishna Sankar <ks...@gmail.com>
>> wrote:
>>>
>>> I was looking at this from CASSANDRA-873 as well as hands-on homework (!)
>>> for my OSCON tutorial. Have couple of questions. Would appreciate
>>> insights:
>>>
>>> A)  Cassandra-873 suggests Luenandra as one demo application
>>> B)  Are there other ideas that will bring out the various aspects of
>>> Cassandra ?
>>
>> multi-user blog (single-user is too easy :)
>> - extra credit: with full-text search using lucandra
>>
>> discussion forum
>> - also w/ FTS
>>
>>> C)  What would be the goal of demo apps ? Tutorial to help folks learn
>>> the
>>> ins and outs of Cassandra ? Show case capabilities ? I think
>>> Cassandra-873
>>> belongs to the latter; Twissandra most probably belongs to the former.
>>
>> I think you nailed it.
>>
>>> D)  Hadoop on Cassandra might be a good demo/tutorial
>>
>> Sure, I'll buy that.
>>
>> I can't think of any standalone projects for that, but "compute a
>> twissandra tag cloud" would be pretty cool.  (Might need to write a
>> twissandra bot to load stuff in to make an interesting cloud. :)
>>
>>> E)  How would one structure the infrastructure for the demo/tutorials ?
>>> What
>>> assumptions can we make in creating them ? As AMIs to be run in EC2 ?
>>
>> I'd probably go with "virtualbox images" as being simpler for people
>> who don't have an AWS key already.  (VB can read vmware player images,
>> i think.  But there is no free vmware for OS X, so you'd want to check
>> that before going w/ vmware format.)
>>
>> Or just have people d/l cassandra and a configuration xml.  Probably
>> easier than teaching people to use virtualbox who haven't before.
>>
>>> Also
>>> to be run on 2-3 local machines for folks who can spare some ? Or as
>>> multiple processes - all in one machine ?
>>
>> You're not going to have time to teach cluster management.  Keep it to 1.
>

Re: Cassandra Demo/Tutorial Applications

Posted by Ian Holsman <ia...@holsman.net>.
There are several large data sets on the net you could use to build a
demo with.
Search logs, Wikipedia, UK government data.
DBpedia may be interesting, as it has some of that data already extracted.


---
Sent from my phone
Ian Holsman - 703 879-3128

On 13/03/2010, at 4:46 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> On Fri, Mar 12, 2010 at 1:55 PM, Krishna Sankar  
> <ks...@gmail.com> wrote:
>> I was looking at this from CASSANDRA-873 as well as hands-on  
>> homework (!)
>> for my OSCON tutorial. Have couple of questions. Would appreciate  
>> insights:
>>
>> A)  Cassandra-873 suggests Luenandra as one demo application
>> B)  Are there other ideas that will bring out the various aspects of
>> Cassandra ?
>
> multi-user blog (single-user is too easy :)
> - extra credit: with full-text search using lucandra
>
> discussion forum
> - also w/ FTS
>
>> C)  What would be the goal of demo apps ? Tutorial to help folks  
>> learn the
>> ins and outs of Cassandra ? Show case capabilities ? I think  
>> Cassandra-873
>> belongs to the latter; Twissandra most probably belongs to the  
>> former.
>
> I think you nailed it.
>
>> D)  Hadoop on Cassandra might be a good demo/tutorial
>
> Sure, I'll buy that.
>
> I can't think of any standalone projects for that, but "compute a
> twissandra tag cloud" would be pretty cool.  (Might need to write a
> twissandra bot to load stuff in to make an interesting cloud. :)
>
>> E)  How would one structure the infrastructure for the demo/ 
>> tutorials ? What
>> assumptions can we make in creating them ? As AMIs to be run in EC2 ?
>
> I'd probably go with "virtualbox images" as being simpler for people
> who don't have an AWS key already.  (VB can read vmware player images,
> i think.  But there is no free vmware for OS X, so you'd want to check
> that before going w/ vmware format.)
>
> Or just have people d/l cassandra and a configuration xml.  Probably
> easier than teaching people to use virtualbox who haven't before.
>
>> Also
>> to be run on 2-3 local machines for folks who can spare some ? Or as
>> multiple processes - all in one machine ?
>
> You're not going to have time to teach cluster management.  Keep it  
> to 1.

Re: Cassandra Demo/Tutorial Applications

Posted by Vick Khera <vi...@khera.org>.
On Sat, Mar 13, 2010 at 1:46 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> I'd probably go with "virtualbox images" as being simpler for people
> who don't have an AWS key already.  (VB can read vmware player images,
> i think.  But there is no free vmware for OS X, so you'd want to check
> that before going w/ vmware format.)

VirtualBox will read VMware "vmdk" disk image files.  It will neither
import nor use the whole VM.  You have to make a new VM in VirtualBox
and attach the vmdk disk image to it, and then it works just perfectly.
I'd say the VMware format is just fine.  Just be sure to delete any
"snapshots" in your VMware VM before distributing it.

Re: Cassandra Demo/Tutorial Applications

Posted by Ronald Bradford <ro...@gmail.com>.
I collated a list of public data last year; you can check it out at
http://ronaldbradford.com/blog/seeking-public-data-for-benchmarks-2009-08-28/

I use VirtualBox when on Mac. It's free and it's trivial to create your own
images.

On Sat, Mar 13, 2010 at 5:01 AM, Christopher Brind <
christopher.brind@googlemail.com> wrote:

>
>> > E)  How would one structure the infrastructure for the demo/tutorials ?
>> What
>> > assumptions can we make in creating them ? As AMIs to be run in EC2 ?
>>
>> I'd probably go with "virtualbox images" as being simpler for people
>> who don't have an AWS key already.  (VB can read vmware player images,
>> i think.  But there is no free vmware for OS X, so you'd want to check
>> that before going w/ vmware format.)
>>
>>
> VirtualBox runs on Mac just fine and from the user manual:
>
> VirtualBox also fully supports the popular and open VMDK container format
> that is used by many other virtualization products, in particular, by
> VMware.3
>
> ... so that should be OK for Mac.
>
> Cheers,
> Chris
>
>

Re: Cassandra Demo/Tutorial Applications

Posted by Christopher Brind <ch...@googlemail.com>.
>
>
> > E)  How would one structure the infrastructure for the demo/tutorials ?
> What
> > assumptions can we make in creating them ? As AMIs to be run in EC2 ?
>
> I'd probably go with "virtualbox images" as being simpler for people
> who don't have an AWS key already.  (VB can read vmware player images,
> i think.  But there is no free vmware for OS X, so you'd want to check
> that before going w/ vmware format.)
>
>
VirtualBox runs on Mac just fine, and from the user manual:

"VirtualBox also fully supports the popular and open VMDK container format
that is used by many other virtualization products, in particular, by
VMware."

... so that should be OK for Mac.

Cheers,
Chris

Re: Cassandra Demo/Tutorial Applications

Posted by Jonathan Ellis <jb...@gmail.com>.
On Fri, Mar 12, 2010 at 1:55 PM, Krishna Sankar <ks...@gmail.com> wrote:
> I was looking at this from CASSANDRA-873 as well as hands-on homework (!)
> for my OSCON tutorial. Have couple of questions. Would appreciate insights:
>
> A)  Cassandra-873 suggests Luenandra as one demo application
> B)  Are there other ideas that will bring out the various aspects of
> Cassandra ?

multi-user blog (single-user is too easy :)
 - extra credit: with full-text search using lucandra

discussion forum
 - also w/ FTS

> C)  What would be the goal of demo apps ? Tutorial to help folks learn the
> ins and outs of Cassandra ? Show case capabilities ? I think Cassandra-873
> belongs to the latter; Twissandra most probably belongs to the former.

I think you nailed it.

> D)  Hadoop on Cassandra might be a good demo/tutorial

Sure, I'll buy that.

I can't think of any standalone projects for that, but "compute a
twissandra tag cloud" would be pretty cool.  (Might need to write a
twissandra bot to load stuff in to make an interesting cloud. :)

> E)  How would one structure the infrastructure for the demo/tutorials ? What
> assumptions can we make in creating them ? As AMIs to be run in EC2 ?

I'd probably go with "virtualbox images" as being simpler for people
who don't have an AWS key already.  (VB can read vmware player images,
i think.  But there is no free vmware for OS X, so you'd want to check
that before going w/ vmware format.)

Or just have people d/l cassandra and a configuration xml.  Probably
easier than teaching people to use virtualbox who haven't before.

> Also
> to be run on 2-3 local machines for folks who can spare some ? Or as
> multiple processes - all in one machine ?

You're not going to have time to teach cluster management.  Keep it to 1.
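
The "compute a twissandra tag cloud" idea boils down to a counting job. As a rough illustration of just the counting step, here is an in-memory sketch. It does not use Hadoop or Cassandra, and it assumes that tags appear in tweet text as #hashtags, which the thread does not specify. TagCloud and count are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// In-memory stand-in for the counting step of a tag cloud job.
// Assumption: tags are written as #hashtags inside the tweet text.
final class TagCloud {
    private static final Pattern TAG = Pattern.compile("#(\\w+)");

    /** Counts how often each hashtag appears, ignoring case. */
    static Map<String, Integer> count(List<String> tweets) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String tweet : tweets) {
            Matcher m = TAG.matcher(tweet);
            while (m.find()) {
                counts.merge(m.group(1).toLowerCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "loading data into #cassandra",
                "#Cassandra and #hadoop together",
                "no tags here");
        System.out.println(count(tweets)); // prints {cassandra=2, hadoop=1}
    }
}
```

In a Hadoop version, the inner loop would roughly correspond to the map step and the merge to the reduce step, with the tweets read from Cassandra instead of a list.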

Cassandra Demo/Tutorial Applications

Posted by Krishna Sankar <ks...@gmail.com>.
I was looking at this from CASSANDRA-873 as well as hands-on homework (!)
for my OSCON tutorial. I have a couple of questions and would appreciate insights:

A)  CASSANDRA-873 suggests Lucandra as one demo application.
B)  Are there other ideas that will bring out the various aspects of
Cassandra?
C)  What would be the goal of the demo apps? A tutorial to help folks learn the
ins and outs of Cassandra? A showcase of capabilities? I think CASSANDRA-873
belongs to the latter; Twissandra most probably belongs to the former.
D)  Hadoop on Cassandra might be a good demo/tutorial.
E)  How would one structure the infrastructure for the demos/tutorials? What
assumptions can we make in creating them? As AMIs to be run in EC2? Also
to be run on 2-3 local machines for folks who can spare some? Or as
multiple processes, all on one machine? What is an optimum configuration
for learning and demos? We need to make it simple (to reflect the domain)
but not simpler.
F)  I am looking for ideas from developers and users, hence the
cross-posting. I hope the Apache mailer is smart enough to dedup; I will find out soon
...

Cheers
<k/>    



Re: Re: wo did some test on cassandra ,but the result puzzled us

Posted by Bingbing Liu <ru...@gmail.com>.
OK, thanks, I will do that.

So, that is to say, what I observed is natural?

2010-03-12 



Bingbing Liu 



From: Masood Mortazavi
Sent: 2010-03-12  15:57:49
To: cassandra-dev
Cc:
Subject: Re: wo did some test on cassandra ,but the result puzzled us
 
Bingbing Liu,
On Fri, Mar 12, 2010 at 1:36 PM, Bingbing Liu <ru...@gmail.com> wrote:
> We did some test on on Cassandra, and the benchmark is from Section 7 of
> the BigTable paper “Bigtable: A Distributed Storage System for Structured
> Data”, the benchmark task includes: random write, random read, sequential
> write, and sequential read. The test results made us puzzled. We use a
> cluster of 5 nodes (each node has a 4 cores cpu , 4G memory).The data for
> test is a table with 4,000,000  records each of which is 1000 bytes. The
> test results are as follows:
> Sequential write:  875124 ms
> Sequential read:  1972588 ms
> Random read:  43331738 ms
> Random write:  20193484 ms
> We wondered why the speed of sequential write are so faster than the speed
> of sequential read, and why the speed of sequential write are so faster than
> that of random write? We think that the speed of read should be faster than
> that of data write, but the results are just the opposite, would you please
> give us some explanations, thanks a lot!
>
Please read the BigTable paper, carefully, again.
They have similar characteristics and describe why this is the case. I think
you'll find that behavior you observed is quite consistent with the theory
of it all (and reading the text to which Jonathan has pointed you, will
essentially give you the same reasons).
It is part and parcel of the storage architecture of "BigTable" type
systems.
- m.

Re: wo did some test on cassandra ,but the result puzzled us

Posted by Masood Mortazavi <ma...@gmail.com>.
Bingbing Liu,

On Fri, Mar 12, 2010 at 1:36 PM, Bingbing Liu <ru...@gmail.com> wrote:

> We did some test on on Cassandra, and the benchmark is from Section 7 of
> the BigTable paper “Bigtable: A Distributed Storage System for Structured
> Data”, the benchmark task includes: random write, random read, sequential
> write, and sequential read. The test results made us puzzled. We use a
> cluster of 5 nodes (each node has a 4 cores cpu , 4G memory).The data for
> test is a table with 4,000,000  records each of which is 1000 bytes. The
> test results are as follows:
> Sequential write:  875124 ms
> Sequential read:  1972588 ms
> Random read:  43331738 ms
> Random write:  20193484 ms
> We wondered why the speed of sequential write are so faster than the speed
> of sequential read, and why the speed of sequential write are so faster than
> that of random write? We think that the speed of read should be faster than
> that of data write, but the results are just the opposite, would you please
> give us some explanations, thanks a lot!
>


Please read the Bigtable paper carefully again.

The results reported there have similar characteristics, and the paper
describes why this is the case. I think you'll find that the behavior you
observed is quite consistent with the theory of it all (and reading the text
to which Jonathan has pointed you will essentially give you the same reasons).

It is part and parcel of the storage architecture of "BigTable"-type
systems.

- m.
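
The storage architecture referred to above is log-structured: a write goes to an in-memory table (plus a sequential commit log), while a read may have to consult the in-memory table and several flushed tables. The toy model below only illustrates that asymmetry. It is not Cassandra's or Bigtable's implementation, and ToyLsmStore and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy log-structured store: NOT Cassandra's implementation, only its shape.
final class ToyLsmStore {
    private final int flushThreshold;
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final List<TreeMap<String, String>> flushed = new ArrayList<>(); // newest last
    int lastReadProbes; // how many structures the last get() had to consult

    ToyLsmStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** A write touches only the memtable (a real system also appends to a commit log). */
    void put(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= flushThreshold) {
            flushed.add(memtable);        // "flush": freeze the memtable as an immutable table
            memtable = new TreeMap<>();
        }
    }

    /** A read checks the memtable, then every flushed table from newest to oldest. */
    String get(String key) {
        lastReadProbes = 1;
        String value = memtable.get(key);
        for (int i = flushed.size() - 1; value == null && i >= 0; i--) {
            lastReadProbes++;
            value = flushed.get(i).get(key);
        }
        return value;
    }

    public static void main(String[] args) {
        ToyLsmStore store = new ToyLsmStore(2);
        for (int i = 0; i < 6; i++) {
            store.put("k" + i, "v" + i);
        }
        // k0 lives in the oldest flushed table, so the read probes 4 structures.
        System.out.println(store.get("k0") + " found after " + store.lastReadProbes + " probes");
    }
}
```

In this model every put does a constant amount of work, while a get for an old key walks through every flushed table. On real hardware each of those probes can mean a disk seek, which is the usual explanation for reads being slower than writes in this family of systems.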