
Posted to user@cassandra.apache.org by Cyril Auburtin <cy...@gmail.com> on 2012/09/17 19:33:06 UTC

Cassandra supercolumns with same name

First, sorry, but I'm using an old version (0.7.10).

Recently I came across this:

=> (super_column=MYMED_embrun.maire@gmail.com,
     (column=permission, value=1, timestamp=1347895421475))
=> (super_column=MYMED_embrun.maire@gmail.com,
     (column=email, value=embrun.maire@gmail.com, timestamp=1347894698217)
     (column=id, value=MYMED_embrun.maire@gmail.com, timestamp=1347894698217)
     (column=permission, value=0, timestamp=1347894698217)
     (column=profile, value=e24af776b4a025456bd50f55633b2419, timestamp=1347894698217))

as part of a super column family.

I thought super column names were meant to be unique?

Re: Cassandra supercolumns with same name

Posted by Cyril Auburtin <cy...@gmail.com>.

Yep, Tyler is right.
It seems I have trailing *\u0000* (NUL) characters; for example, one super column name is
MYMED_embrun.maire@gmail.com and the other is MYMED_embrun.maire@gmail.com\u0000\u0000.

I'm trying to find out at what point they are created...
Thx

2012/9/21 Tyler Hobbs <ty...@datastax.com>

> If you're seeing that in cassandra-cli, it's possible that there are some
> non-printable characters in the name that the cli doesn't display, like the
> NUL char (ascii 0).  I opened a ticket for that somewhere, but in the
> meantime, you may want to verify that they are identical with a real client.
>
>
> On Tue, Sep 18, 2012 at 4:03 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> They are. Can you provide some more information?
>>
>> What happens when you read the super column?
>>
>> Cheers
>>
>>   -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 18/09/2012, at 5:33 AM, Cyril Auburtin <cy...@gmail.com>
>> wrote:
>>
>> First, sorry, but I'm using an old version (0.7.10).
>>
>> Recently I came across this:
>>
>> => (super_column=MYMED_embrun.maire@gmail.com,
>>      (column=permission, value=1, timestamp=1347895421475))
>> => (super_column=MYMED_embrun.maire@gmail.com,
>>      (column=email, value=embrun.maire@gmail.com, timestamp=1347894698217)
>>      (column=id, value=MYMED_embrun.maire@gmail.com, timestamp=1347894698217)
>>      (column=permission, value=0, timestamp=1347894698217)
>>      (column=profile, value=e24af776b4a025456bd50f55633b2419, timestamp=1347894698217))
>>
>> as part of a super column family.
>>
>> I thought super column names were meant to be unique?
>>
>>
>>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
>
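
Tyler's suggestion to check the names with a real client can be sketched in Python. This is a hypothetical helper (`find_nul_padded_names` is not part of any Cassandra client); it assumes the super column names have already been read as raw `bytes` by whatever client you use, and it groups names that become identical once trailing NUL bytes are removed.

```python
from collections import defaultdict
from typing import Dict, List


def find_nul_padded_names(names: List[bytes]) -> Dict[bytes, List[bytes]]:
    """Group raw column names that differ only by trailing NUL bytes.

    Hypothetical helper: `names` is assumed to be a list of super column
    names already fetched from Cassandra by some client.
    Returns {name_without_trailing_nuls: [raw variants]} for groups
    that contain more than one variant.
    """
    groups: Dict[bytes, List[bytes]] = defaultdict(list)
    for raw in names:
        # rstrip(b"\x00") removes only trailing NULs; the rest of the name is untouched
        groups[raw.rstrip(b"\x00")].append(raw)
    return {key: variants for key, variants in groups.items() if len(variants) > 1}


names = [
    b"MYMED_embrun.maire@gmail.com",
    b"MYMED_embrun.maire@gmail.com\x00\x00",
]

for key, variants in find_nul_padded_names(names).items():
    # repr() shows the NUL bytes that cassandra-cli did not display
    print(key.decode("utf-8"), [repr(v) for v in variants])
```

Printing `repr()` of each variant makes the trailing `\x00` bytes visible. Stripping trailing NULs before writing would be one possible stopgap while the source of the padding is tracked down, but only if NUL can never legitimately end a name in your data.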

Re: Cassandra vs Couchbase benchmarks

Posted by aaron morton <aa...@thelastpickle.com>.

A few notes:

* +1 on the missing RF (replication factor) and CL (consistency level) for the Cassandra runs.
* Using striped EBS volumes on m1.xlarge is a bad choice unless they are using provisioned IOPS, which they do not say.
* The Cassandra JVM settings are *not* standard: they use a small new generation and a larger-than-default heap.
* "memtable size", which I assume means memtable_total_space_in_mb, defaults to 1/3 of the heap. They have doubled it.
* I would expect the above non-standard memory settings to result in increased GC activity and increased latency / reduced throughput.

* They presented the facts and said "you decide who is a winner" LOLS
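
To make the memory-settings point concrete, here is a rough Python sketch of the arithmetic. It assumes the automatic heap-sizing heuristic that, to my recollection, cassandra-env.sh used around the 1.1 release (max heap = max(min(RAM/2, 1 GB), min(RAM/4, 8 GB)); new generation = min(100 MB per core, heap/4)) together with the cassandra.yaml default of memtable_total_space_in_mb = 1/3 of the heap. Treat both formulas, the function names, and the 4-core node as assumptions to check against the files shipped with your version.

```python
def default_heap_mb(system_memory_mb: int) -> int:
    """Assumed cassandra-env.sh heuristic: max(min(RAM/2, 1024), min(RAM/4, 8192))."""
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    return max(half, quarter)


def default_new_gen_mb(heap_mb: int, cpu_cores: int) -> int:
    """Assumed heuristic for the young generation: min(100 MB per core, heap/4)."""
    return min(100 * cpu_cores, heap_mb // 4)


def default_memtable_space_mb(heap_mb: int) -> int:
    """memtable_total_space_in_mb falls back to 1/3 of the heap when omitted."""
    return heap_mb // 3


# Hypothetical node: 15 GB of RAM (the figure quoted in this thread), 4 cores (assumed).
ram_mb, cores = 15 * 1024, 4
heap = default_heap_mb(ram_mb)
memtable = default_memtable_space_mb(heap)

print("default heap (MB):", heap)
print("default new gen (MB):", default_new_gen_mb(heap, cores))
print("default memtable space (MB):", memtable)
print("doubled memtable space (MB):", 2 * memtable)
```

Doubling the memtable allowance leaves roughly two thirds of the heap for memtables, which is consistent with the expectation of more GC pressure; the sketch only shows the arithmetic, not a measurement.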

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/10/2012, at 4:48 AM, horschi <ho...@gmail.com> wrote:

> Hi Andy,
> 
> things I find odd:
> 
> - Replicacount=1 for mongo and couchdb. How is that a realistic benchmark? I always want at least 2 replicas for my data. Maybe that's just me.
> - On the Mongo Config slide they said they disabled journaling. Why would you disable all the safety mechanisms that you would want in a production environment? Maybe they should have added /dev/null to their benchmark ;-)
> - I don't see the replica count for Cassandra in the slides. Also, CL is not specified. IMHO the important stuff is missing from the Cassandra configuration.
> - In the goals section it said "more data than RAM". But they only have 12GB data per node, with 15GB of RAM per node!
> 
> I am very interested in a recent cassandra-benchmark, but I find this benchmark very disappointing.
> 
> cheers,
> Christian
> 
> 
> On Mon, Oct 1, 2012 at 5:05 PM, Andy Cobley <ac...@computing.dundee.ac.uk> wrote:
> There are some interesting results in the benchmarks below:
> 
> http://www.slideshare.net/renatko/couchbase-performance-benchmarking
> 
> Without starting a flame war, I'm interested in whether these results should
> be considered "Fair and Balanced" or whether the methodology is flawed in some
> way (for instance, is the use of Amazon EC2 sensible for a Cassandra
> deployment?).
> 
> Andy
> 
> 
> 
> The University of Dundee is a Scottish Registered Charity, No. SC015096.
> 
> 
>

Re: Cassandra vs Couchbase benchmarks

Posted by horschi <ho...@gmail.com>.

Hi Andy,

things I find odd:

- Replicacount=1 for mongo and couchdb. How is that a realistic benchmark?
I always want at least 2 replicas for my data. Maybe that's just me.
- On the Mongo Config slide they said they disabled journaling. Why would you
disable all the safety mechanisms that you would want in a production
environment? Maybe they should have added /dev/null to their benchmark ;-)
- I don't see the replica count for Cassandra in the slides. Also, CL is not
specified. IMHO the important stuff is missing from the Cassandra
configuration.
- In the goals section it said "more data than RAM". But they only have
12GB data per node, with 15GB of RAM per node!
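
The missing replica count and consistency level matter because together they decide how many replicas must acknowledge each request. As an illustration (not taken from the slides), here is the arithmetic for Cassandra's ONE, QUORUM and ALL levels, where QUORUM is a majority of replicas, floor(RF/2) + 1. The function name is hypothetical.

```python
def replicas_required(consistency_level: str, replication_factor: int) -> int:
    """Number of replica acknowledgements Cassandra waits for at a given CL."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        # a majority of the replicas
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


for rf in (1, 2, 3):
    print(rf, {cl: replicas_required(cl, rf) for cl in ("ONE", "QUORUM", "ALL")})
```

With RF=1 every level collapses to a single acknowledgement and there is no redundancy at all, so those numbers are not comparable with a replicated deployment.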

I am very interested in a recent cassandra-benchmark, but I find this
benchmark very disappointing.

cheers,
Christian


On Mon, Oct 1, 2012 at 5:05 PM, Andy Cobley
<ac...@computing.dundee.ac.uk> wrote:

> There are some interesting results in the benchmarks below:
>
> http://www.slideshare.net/renatko/couchbase-performance-benchmarking
>
> Without starting a flame war, I'm interested in whether these results should
> be considered "Fair and Balanced" or whether the methodology is flawed in some
> way (for instance, is the use of Amazon EC2 sensible for a Cassandra
> deployment?).
>
> Andy

Re: Cassandra vs Couchbase benchmarks

Posted by Peter Lin <wo...@gmail.com>.

Here is my own experience testing CouchDB versus Cassandra for an
internal application.

My test wasn't some dummy test case; it was a realistic workload of
95% writes and 5% reads. We insert data in batches to maximize
throughput. The critical question for my use case was "when
does the server crash, and what happens?"

Each row was approximately 4-8 KB. I tested with a variety of batch
sizes and found that CouchDB choked at around 20K rows of data. When I
looked at the CouchDB logs, there was no indication of why it crashed or
what caused it. I tested Cassandra up to 50K rows per batch and didn't
have time to reach the failure point.

The troubling issue for me is the lack of information on the cause of
the crash. All servers crash at some point. To me, having details on why
and how it crashed is critical to debugging and improving
performance. Having zero indication suggests to me major
architectural flaws and issues in the design and implementation. I
haven't had time to investigate the real cause, since the logs didn't
give any hint. If the logs had details, I would at least have taken a
look at the source files and tried to debug it.

On Mon, Oct 1, 2012 at 11:05 AM, Andy Cobley
<ac...@computing.dundee.ac.uk> wrote:
> There are some interesting results in the benchmarks below:
>
> http://www.slideshare.net/renatko/couchbase-performance-benchmarking
>
> Without starting a flame war, I'm interested in whether these results should
> be considered "Fair and Balanced" or whether the methodology is flawed in some
> way (for instance, is the use of Amazon EC2 sensible for a Cassandra
> deployment?).
>
> Andy

Re: Cassandra vs Couchbase benchmarks

Posted by Michael Kjellman <mk...@barracuda.com>.

From their wiki: "The replication is an incremental one way process
involving two databases (a source and a destination).
The aim of the replication is that at the end of the process, all active
documents on the source database are also in the destination database and
all documents that were deleted in the source databases are also deleted
(if exists) on the destination database."
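
The one-way, incremental process described in that wiki quote can be illustrated with a toy Python model. It is a deliberate simplification and an assumption, not CouchDB's actual replication protocol: each hypothetical `ToyDatabase` keeps its live documents plus an append-only change log, and "incremental" is modeled as replaying only the changes after a checkpoint.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Toy model (an assumption, not CouchDB's real protocol): live documents plus an
# append-only change log of (sequence, doc_id, body); body None marks a deletion.


@dataclass
class ToyDatabase:
    docs: Dict[str, dict] = field(default_factory=dict)
    changes: List[Tuple[int, str, Optional[dict]]] = field(default_factory=list)

    def put(self, doc_id: str, body: dict) -> None:
        self.docs[doc_id] = body
        self.changes.append((len(self.changes) + 1, doc_id, body))

    def delete(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)
        self.changes.append((len(self.changes) + 1, doc_id, None))


def replicate(source: ToyDatabase, destination: ToyDatabase, since: int = 0) -> int:
    """One-way and incremental: apply source changes after `since` to destination.

    Returns the new checkpoint so the next run only replays newer changes.
    Nothing ever flows from destination back to source.
    """
    checkpoint = since
    for seq, doc_id, body in source.changes:
        if seq <= since:
            continue
        if body is None:
            destination.docs.pop(doc_id, None)  # deleted on source: delete if it exists
        else:
            destination.docs[doc_id] = body     # active on source: present on destination
        checkpoint = seq
    return checkpoint
```

In Cassandra, by contrast, the coordinator sends each write to all RF replicas of the row, and there is no designated source and destination database.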

CouchDB != Cassandra when it comes to replication. From everything I've seen, it behaves
much more like MySQL replication than anything else.

If anything, they tested replication. Also, judging from those slides, they barely tuned
Cassandra, so I wonder whether compactions etc. bit them.

Finally, there are some very high-profile people using Cassandra on Amazon
EC2. My understanding is that disk I/O is the biggest limitation there.

My 2 cents.

Best,
michael

On 10/1/12 8:05 AM, "Andy Cobley" <ac...@computing.dundee.ac.uk> wrote:

>There are some interesting results in the benchmarks below:
>
>http://www.slideshare.net/renatko/couchbase-performance-benchmarking
>
>Without starting a flame war, I'm interested in whether these results should
>be considered "Fair and Balanced" or whether the methodology is flawed in some
>way (for instance, is the use of Amazon EC2 sensible for a Cassandra
>deployment?).
>
>Andy


Cassandra vs Couchbase benchmarks

Posted by Andy Cobley <ac...@computing.dundee.ac.uk>.

There are some interesting results in the benchmarks below:

http://www.slideshare.net/renatko/couchbase-performance-benchmarking

Without starting a flame war, I'm interested in whether these results should
be considered "Fair and Balanced" or whether the methodology is flawed in some
way (for instance, is the use of Amazon EC2 sensible for a Cassandra
deployment?).

Andy



The University of Dundee is a Scottish Registered Charity, No. SC015096.

Re: Cassandra supercolumns with same name

Posted by Tyler Hobbs <ty...@datastax.com>.

If you're seeing that in cassandra-cli, it's possible that there are some
non-printable characters in the name that the cli doesn't display, like the
NUL char (ascii 0).  I opened a ticket for that somewhere, but in the
meantime, you may want to verify that they are identical with a real client.

On Tue, Sep 18, 2012 at 4:03 AM, aaron morton <aa...@thelastpickle.com> wrote:

> They are. Can you provide some more information?
>
> What happens when you read the super column?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/09/2012, at 5:33 AM, Cyril Auburtin <cy...@gmail.com>
> wrote:
>
> First, sorry, but I'm using an old version (0.7.10).
>
> Recently I came across this:
>
> => (super_column=MYMED_embrun.maire@gmail.com,
>      (column=permission, value=1, timestamp=1347895421475))
> => (super_column=MYMED_embrun.maire@gmail.com,
>      (column=email, value=embrun.maire@gmail.com, timestamp=1347894698217)
>      (column=id, value=MYMED_embrun.maire@gmail.com, timestamp=1347894698217)
>      (column=permission, value=0, timestamp=1347894698217)
>      (column=profile, value=e24af776b4a025456bd50f55633b2419, timestamp=1347894698217))
>
> as part of a super column family.
>
> I thought super column names were meant to be unique?
>
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: Cassandra supercolumns with same name

Posted by aaron morton <aa...@thelastpickle.com>.

They are. Can you provide some more information?

What happens when you read the super column?

Cheers
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/09/2012, at 5:33 AM, Cyril Auburtin <cy...@gmail.com> wrote:

> First, sorry, but I'm using an old version (0.7.10).
>
> Recently I came across this:
>
> => (super_column=MYMED_embrun.maire@gmail.com,
>      (column=permission, value=1, timestamp=1347895421475))
> => (super_column=MYMED_embrun.maire@gmail.com,
>      (column=email, value=embrun.maire@gmail.com, timestamp=1347894698217)
>      (column=id, value=MYMED_embrun.maire@gmail.com, timestamp=1347894698217)
>      (column=permission, value=0, timestamp=1347894698217)
>      (column=profile, value=e24af776b4a025456bd50f55633b2419, timestamp=1347894698217))
>
> as part of a super column family.
>
> I thought super column names were meant to be unique?