Posted to user@cassandra.apache.org by Alexander Shutyaev <sh...@gmail.com> on 2013/08/30 09:50:40 UTC

mysterious 'column1' in cql describe

Hi all!

We have encountered the following problem. We create our column families
via Hector like this:

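import java.util.HashMap;
import java.util.Map;

import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.ColumnType;
import me.prettyprint.hector.api.ddl.ComparatorType;
import me.prettyprint.hector.api.factory.HFactory;

ColumnFamilyDefinition cfdef = HFactory.createColumnFamilyDefinition("mykeyspace", "mycf");
cfdef.setColumnType(ColumnType.STANDARD);
cfdef.setComparatorType(ComparatorType.UTF8TYPE);
cfdef.setDefaultValidationClass("BytesType");
cfdef.setKeyValidationClass("UTF8Type");
cfdef.setReadRepairChance(0.1);
cfdef.setGcGraceSeconds(864000);
cfdef.setMinCompactionThreshold(4);
cfdef.setMaxCompactionThreshold(32);
cfdef.setReplicateOnWrite(true);
cfdef.setCompactionStrategy("SizeTieredCompactionStrategy");
// An empty sstable_compression value disables compression.
Map<String, String> compressionOptions = new HashMap<String, String>();
compressionOptions.put("sstable_compression", "");
cfdef.setCompressionOptions(compressionOptions);
// cluster is an existing Hector Cluster; true = block until the schema change completes.
cluster.addColumnFamily(cfdef, true);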
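To picture what this definition describes at the storage level, here is a minimal sketch in plain Java (not Hector). It assumes a column family with a UTF8Type comparator and no column metadata, and models each row as a sorted map from column name to value. Seen through CQL3, every internal cell (row key, column name, value) becomes one row (key, column1, value), where column1 is a clustering column holding the thrift column name. The class and method names are hypothetical; timestamps, TTLs and the partitioner's row ordering are ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical model of a thrift column family with a UTF8Type comparator and
 * no column metadata, and of how CQL3 presents it: every internal cell
 * (row key, column name, value) becomes one CQL3 row (key, column1, value).
 */
public class WideRowView {

    /** One CQL3 row as shown for the COMPACT STORAGE table. */
    public static final class CqlRow {
        final String key;
        final String column1;
        final byte[] value;

        CqlRow(String key, String column1, byte[] value) {
            this.key = key;
            this.column1 = column1;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(" + key + ", " + column1 + ", " + new String(value, StandardCharsets.UTF_8) + ")";
        }
    }

    // Row key -> (column name -> value). Rows keep insertion order here only for
    // determinism; a real partitioner orders rows by token. Columns are sorted,
    // approximating UTF8Type byte order with String order (identical for ASCII names).
    private final Map<String, SortedMap<String, byte[]>> rows = new LinkedHashMap<>();

    public void insert(String rowKey, String columnName, byte[] value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
    }

    /** Flattens the storage view into the rows cqlsh would list for this table. */
    public List<CqlRow> asCql3Rows() {
        List<CqlRow> out = new ArrayList<>();
        for (Map.Entry<String, SortedMap<String, byte[]>> row : rows.entrySet()) {
            for (Map.Entry<String, byte[]> cell : row.getValue().entrySet()) {
                out.add(new CqlRow(row.getKey(), cell.getKey(), cell.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        WideRowView cf = new WideRowView();
        cf.insert("k1", "b", "v2".getBytes(StandardCharsets.UTF_8));
        cf.insert("k1", "a", "v1".getBytes(StandardCharsets.UTF_8));
        cf.insert("k2", "x", "v3".getBytes(StandardCharsets.UTF_8));
        for (CqlRow r : cf.asCql3Rows()) {
            System.out.println(r); // (k1, a, v1), then (k1, b, v2), then (k2, x, v3)
        }
    }
}
```

One storage row with two columns thus shows up as two CQL3 rows sharing the same key, which is why column1 appears in the primary key.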

When we *describe* this column family via *cqlsh*, we get this:

CREATE TABLE "mycf" (
  key text,
  column1 text,
  value blob,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={};

As you can see, there is a mysterious *column1*, and moreover it is part of
the primary key. We thought this was wrong, so we tried to get rid of it.
We managed to do so by adding explicit column definitions like this:

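import me.prettyprint.cassandra.model.BasicColumnDefinition;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.ddl.ColumnIndexType;
import me.prettyprint.hector.api.ddl.ComparatorType;

BasicColumnDefinition cdef = new BasicColumnDefinition();
cdef.setName(StringSerializer.get().toByteBuffer("mycolumn"));
cdef.setValidationClass(ComparatorType.BYTESTYPE.getTypeName());
cdef.setIndexType(ColumnIndexType.CUSTOM);
cfdef.addColumnDefinition(cdef);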

After this, the primary key looked like this:

PRIMARY KEY (key)

The effect of this was *overwhelming*: we got a tremendous performance
improvement, and according to the stats the key cache began working, whereas
previously its hit ratio was close to zero.
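A hit ratio computed from cumulative counters since node startup can hide a recent change, so before/after figures are only comparable if they cover similar windows. A minimal sketch, assuming you can read cumulative key cache hit and request counts at two points in time (the class and field names are hypothetical, not a Cassandra or JMX API), computes the ratio over just that interval:

```java
/**
 * Hypothetical helper: hit ratio over an interval, computed from two snapshots
 * of cumulative (hits, requests) counters rather than from lifetime totals.
 */
public class HitRatio {

    /** Cumulative counters read at one point in time (assumed, not a real API). */
    public static final class Snapshot {
        final long hits;
        final long requests;

        public Snapshot(long hits, long requests) {
            this.hits = hits;
            this.requests = requests;
        }
    }

    /** hits / requests between the two snapshots, or NaN if no requests happened. */
    public static double overInterval(Snapshot start, Snapshot end) {
        long hits = end.hits - start.hits;
        long requests = end.requests - start.requests;
        return requests > 0 ? (double) hits / requests : Double.NaN;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not measurements from this thread.
        Snapshot before = new Snapshot(100, 1000);
        Snapshot after = new Snapshot(400, 1500);
        System.out.println(overInterval(before, after)); // prints 0.6
    }
}
```

With these illustrative numbers the lifetime ratio at the second snapshot would be 400/1500, about 0.27, while the ratio over the interval is 300/500 = 0.6.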

My questions are:

1) What is this all about? Is what we did right?
2) In this project we can provide explicit column definitions. But in
another project we have some column families where this is not possible,
because the column names are dynamic (based on timestamps). If what we did is
right, how can we adapt this solution to the dynamic column name case?
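Question 2 concerns column families whose column names are timestamps. As a minimal sketch, assuming long timestamps (i.e. a LongType comparator, which the post does not specify) and string values, one such row can be modelled as a sorted map. CQL3 would expose the timestamp as the clustering column column1, so a slice over a range of column names is simply a sorted sub-range. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical model of one wide row whose column names are timestamps
 * (assumed to be longs, i.e. a LongType comparator). In CQL3 terms the
 * column name is the clustering column column1.
 */
public class TimestampWideRow {

    // Column name (timestamp) -> value, kept sorted like the comparator would.
    private final NavigableMap<Long, String> columns = new TreeMap<>();

    public void put(long timestamp, String value) {
        columns.put(timestamp, value);
    }

    /**
     * Column names in [fromInclusive, toExclusive). Roughly corresponds to
     * SELECT column1, value FROM cf WHERE key = ? AND column1 >= ? AND column1 < ?
     */
    public List<Long> sliceNames(long fromInclusive, long toExclusive) {
        return new ArrayList<>(columns.subMap(fromInclusive, true, toExclusive, false).keySet());
    }

    public static void main(String[] args) {
        TimestampWideRow row = new TimestampWideRow();
        row.put(1000L, "a");
        row.put(2000L, "b");
        row.put(3000L, "c");
        System.out.println(row.sliceNames(1500L, 3000L)); // prints [2000]
    }
}
```

No per-timestamp column definitions are needed in this model: new timestamps are just new entries in the sorted map, that is, new CQL3 rows under the same key.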

Re: mysterious 'column1' in cql describe

Posted by Sylvain Lebresne <sy...@datastax.com>.

> Why does the explicit definition of columns in a column family
> significantly improve performance and the key cache hit ratio (the latter
> being almost zero when there are no explicit column definitions)?
>

It doesn't, at least not in itself. So either something else has changed, or
something is wrong in your before/after comparison. But it's hard to say more
without at least some minimal information on how you actually observed such a
"significant performance improvement" (which queries, for instance).

As for the key cache hit rate, adding a column definition certainly has no
effect on it in itself. But defining a new secondary index might, and the code
you've provided to add the column does call setIndexType. Again, it's hard to
be definitive on that, because the code you've shown sets a CUSTOM index type
without providing any index options, which is *invalid* (and rejected as such
by Cassandra). So either the code above is not complete, or it's not the code
you actually used, or Hector is doing some weird stuff behind your back. In
any case, if an index was created, then *that* could easily explain a
before/after performance difference.
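The point that a CUSTOM index type without index options is rejected can be expressed as a small check. A minimal sketch with hypothetical types; the `class_name` option key is an assumption about what Cassandra expects for custom indexes, not something stated in this thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical check for the rule described above: a column definition with a
 * CUSTOM index type is only valid if index options name the implementation.
 * The option key "class_name" is an assumption.
 */
public class ColumnDefCheck {

    /** Mirrors thrift's KEYS / COMPOSITES / CUSTOM; NONE means "no index" (hypothetical). */
    enum IndexType { NONE, KEYS, COMPOSITES, CUSTOM }

    static final String CLASS_NAME_OPTION = "class_name"; // assumed option key

    /** Returns a problem description, or empty if the definition passes this check. */
    static Optional<String> validate(IndexType type, Map<String, String> indexOptions) {
        if (type == IndexType.CUSTOM
                && (indexOptions == null || !indexOptions.containsKey(CLASS_NAME_OPTION))) {
            return Optional.of("CUSTOM index type requires the '" + CLASS_NAME_OPTION + "' index option");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The definition from the original post: CUSTOM index type, no index options.
        System.out.println(validate(IndexType.CUSTOM, Collections.<String, String>emptyMap()));

        Map<String, String> options = new HashMap<>();
        options.put(CLASS_NAME_OPTION, "com.example.MyIndex"); // hypothetical class
        System.out.println(validate(IndexType.CUSTOM, options).isPresent()); // false

        // A plain column definition without any index passes.
        System.out.println(validate(IndexType.NONE, null).isPresent()); // false
    }
}
```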

--
Sylvain

Re: mysterious 'column1' in cql describe

Posted by Alexander Shutyaev <sh...@gmail.com>.

Thanks, Sylvain! I'll read them thoroughly, but after a quick glance I'd
like to repeat another (implied) question of mine, which I believe these
articles will not answer.

Why does the explicit definition of columns in a column family
significantly improve performance and the key cache hit ratio (the latter
being almost zero when there are no explicit column definitions)?


Re: mysterious 'column1' in cql describe

Posted by Sylvain Lebresne <sy...@datastax.com>.

The short story is that you're probably not up to date on how CQL and
thrift table definitions relate to one another, and it may not be exactly
how you think. If you haven't done so already, I'd suggest reading
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows
(which should answer your "what about dynamic column names" case) and
http://www.datastax.com/dev/blog/thrift-to-cql3 (which should help explain
how CQL3 interprets thrift tables, and why you saw what you saw).
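The behaviour reported in the original post (column1 disappearing from the primary key once a column definition was added) can be roughly sketched as the following rule. This is a simplified approximation, not Cassandra's code: it assumes a non-composite comparator, takes CQL type names (text, blob) directly instead of mapping validator classes, and ignores aliases and table options. The class and method names are hypothetical; the main method reproduces the two primary key shapes from the original post.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified, hypothetical approximation of how CQL3 presents a thrift column
 * family with a non-composite comparator:
 *  - no column metadata  -> dynamic table: key, column1, value; PRIMARY KEY (key, column1)
 *  - column metadata set -> static table: key plus one column per definition; PRIMARY KEY (key)
 */
public class Cql3View {

    static String describe(String table, String keyType, String comparatorType,
                           String defaultValidatorType, Map<String, String> columnMetadata) {
        List<String> cols = new ArrayList<>();
        cols.add("key " + keyType);
        if (columnMetadata.isEmpty()) {
            // Each thrift column becomes a CQL3 row: its name in column1, its value in value.
            cols.add("column1 " + comparatorType);
            cols.add("value " + defaultValidatorType);
            cols.add("PRIMARY KEY (key, column1)");
        } else {
            // Declared columns become regular CQL3 columns of a single row per key;
            // undeclared thrift columns are not visible through this view.
            for (Map.Entry<String, String> e : columnMetadata.entrySet()) {
                cols.add(e.getKey() + " " + e.getValue());
            }
            cols.add("PRIMARY KEY (key)");
        }
        return "CREATE TABLE \"" + table + "\" (" + String.join(", ", cols) + ") WITH COMPACT STORAGE";
    }

    public static void main(String[] args) {
        // Definition without column metadata, as in the first describe output.
        System.out.println(describe("mycf", "text", "text", "blob", new LinkedHashMap<String, String>()));

        // Same definition plus one declared column, as after adding "mycolumn".
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("mycolumn", "blob");
        System.out.println(describe("mycf", "text", "text", "blob", meta));
    }
}
```

Under this rule, declaring columns switches the table to the static interpretation; it does not change the stored data, which is consistent with the later reply that column definitions alone should not affect performance.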

--
Sylvain