Posted to user@cassandra.apache.org by osishkin osishkin <os...@gmail.com> on 2011/09/06 13:22:50 UTC

CQL and schema-less column family

Sorry for the newbie question but I failed to find a clear answer.
Can CQL be used to query a schema-less column family? Can they be indexed?
That is, query for column names that do not necessarily exist in all
rows, and were not defined in advance when the column family was
created.

Thank you

Re: CQL and schema-less column family

Posted by osishkin osishkin <os...@gmail.com>.
Hi, sorry for re-posting, but it would be very helpful to get some
input on my previous post, so I'd know which direction to take.
So if any of the more experienced users here can help, it would be
greatly appreciated.

Thank you
Osi

---------- Forwarded message ----------
From: osishkin osishkin <os...@gmail.com>
Date: Wed, Sep 7, 2011 at 2:02 PM
Subject: Re: CQL and schema-less column family
To: user@cassandra.apache.org, eevans@acunu.com


Thank you very much Eric for your response.

Some follow-up questions come to mind:
1. What will be the performance hit for querying a column name not
predefined in a schema? If it's not indexed, then I guess Cassandra
will have to iterate over all rows, which will impose a huge overhead.

2. Assuming my guess from the previous question is correct, then in
order to get decent performance I need to index a column.
Can you tell me if indexing a column name (not predefined in a schema)
has any performance impact?
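
The trade-off behind questions 1 and 2 can be pictured with a toy model.
This is plain Python over in-memory dicts, not Cassandra code, and the
names rows, scan, build_index and lookup are made up for illustration.
It only shows why an unindexed lookup has to visit every row while an
indexed one does not; it says nothing about Cassandra's real cost model.

```python
# Toy model (an assumption, not Cassandra): each row maps column name -> value,
# and rows do not all carry the same columns.
rows = {
    "row1": {"name": "alice", "city": "paris"},
    "row2": {"name": "bob"},
    "row3": {"city": "paris", "age": "30"},
}


def scan(rows, column, value):
    """Unindexed lookup: inspects every row, so work grows with the row count."""
    return sorted(key for key, cols in rows.items() if cols.get(column) == value)


def build_index(rows, column):
    """Index for one column: value -> set of row keys.

    Rows that lack the column are simply absent from the index.
    """
    index = {}
    for key, cols in rows.items():
        if column in cols:
            index.setdefault(cols[column], set()).add(key)
    return index


def lookup(index, value):
    """Indexed lookup: reads only the entry for the requested value."""
    return sorted(index.get(value, ()))


city_index = build_index(rows, "city")
print(scan(rows, "city", "paris"))
print(lookup(city_index, "paris"))
```

Both calls return the same row keys; the difference is that scan reads
all three rows while lookup reads a single index entry.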

I'm not yet sure whether CQL/secondary indexes are the right direction
for me, as opposed to manually-maintained indexes.
My application also requires range predicates on columns with
potentially high numbers of unique values. From what I gather, both
(range predicates, high cardinality values) are very inefficient in
CQL/secondary indexes.
But I'd like to get the whole picture before deciding.

In my system each row may contain a lot of columns, each shared by only
some of the rows. If I understand correctly from the documentation,
every index is actually implemented as a new "hidden" column family.
This means that in my case if I use a secondary index for every column
name, I can quickly get a LOT of column families just to hold the
secondary indexes for all my rows.

My intuition says updating dozens of column families on every insert
would probably be very bad performance-wise, in comparison with
manually updating a single "global" column family index of my own
(with multiple inserts).
Is this true?
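
A minimal sketch of the manually maintained "global" index described
above, assuming one sorted structure holds an entry per (column name,
value, row key). It is plain Python, not a Cassandra schema; GlobalIndex
and insert_row are hypothetical names, and the sketch does not remove
stale entries when a value is overwritten, which a real manual index
would have to handle.

```python
import bisect


class GlobalIndex:
    """Hypothetical single index covering every column name."""

    def __init__(self):
        # Sorted list of (column_name, value, row_key) entries.
        self._entries = []

    def add(self, column, value, row_key):
        """Insert one entry, keeping the list sorted and free of duplicates."""
        entry = (column, value, row_key)
        pos = bisect.bisect_left(self._entries, entry)
        if pos == len(self._entries) or self._entries[pos] != entry:
            self._entries.insert(pos, entry)

    def range(self, column, low, high):
        """Row keys whose `column` value lies in [low, high], in value order."""
        # (column, low) sorts before every (column, low, row_key) entry.
        pos = bisect.bisect_left(self._entries, (column, low))
        result = []
        while pos < len(self._entries):
            col, value, row_key = self._entries[pos]
            if col != column or value > high:
                break
            result.append(row_key)
            pos += 1
        return result


def insert_row(store, index, row_key, columns):
    """One data write, then one entry per column into the same index."""
    store[row_key] = dict(columns)
    for column, value in columns.items():
        index.add(column, value, row_key)


store, index = {}, GlobalIndex()
insert_row(store, index, "r1", {"age": 30, "score": 7})
insert_row(store, index, "r2", {"age": 41})
insert_row(store, index, "r3", {"age": 25, "score": 9})
print(index.range("age", 26, 45))
```

Because entries are ordered by column name and then by value, a range
predicate becomes a contiguous slice of one structure, and a column
that was never seen before needs no separate index to be created.
Values within a single column are assumed to be mutually comparable.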

Thank you

p.s.
Since I don't know whether a secondary index for a column already
exists, this means I have to check if such an index already exists
every time, and create it if not. Things seem to get even worse from
my point of view...:)

On Wed, Sep 7, 2011 at 12:34 PM, Eric Evans <ee...@acunu.com> wrote:
> On Tue, Sep 6, 2011 at 12:22 PM, osishkin osishkin <os...@gmail.com> wrote:
>> Sorry for the newbie question but I failed to find a clear answer.
>> Can CQL be used to query a schema-less column family? can they be indexed?
>> That is, query for column names that do not necessarily exist in all
>> rows, and were not defined in advance when the column family was
>> created.
>
> Absolutely, yes.
>
> If you don't create schema for columns, then their type will simply be
> the default for that column family.
>
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu
>

Re: CQL and schema-less column family

Posted by Eric Evans <ee...@acunu.com>.
On Tue, Sep 6, 2011 at 12:22 PM, osishkin osishkin <os...@gmail.com> wrote:
> Sorry for the newbie question but I failed to find a clear answer.
> Can CQL be used to query a schema-less column family? can they be indexed?
> That is, query for column names that do not necessarily exist in all
> rows, and were not defined in advance when the column family was
> created.

Absolutely, yes.

If you don't create schema for columns, then their type will simply be
the default for that column family.

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu
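
The fallback described in the reply above, where a column without its
own schema entry takes the column family's default type, can be
pictured roughly as follows. This is an illustrative Python model only:
ColumnFamilySchema, type_of and the type labels "bytes" and "int" are
assumptions made for the sketch, not Cassandra identifiers.

```python
class ColumnFamilySchema:
    """Hypothetical model: optional per-column types plus a family-wide default."""

    def __init__(self, default_type, declared=None):
        self.default_type = default_type
        # Only columns that were explicitly declared appear here.
        self.declared = dict(declared or {})

    def type_of(self, column):
        """Declared type if one exists, otherwise the column family default."""
        return self.declared.get(column, self.default_type)


schema = ColumnFamilySchema(default_type="bytes", declared={"age": "int"})
print(schema.type_of("age"))       # declared column
print(schema.type_of("nickname"))  # undeclared column falls back to the default
```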