Posted to user@cassandra.apache.org by Tony Anecito <ad...@yahoo.com> on 2013/06/26 18:39:27 UTC

Creating an "Index" column...

Hi All,

I have a column family with multiple columns, and when I use a WHERE clause on a column that is not the "key" column, Cassandra returns an error saying it is not an indexed column.
Where can I find an example of creating a column family with one or more indexed columns? I googled last night and could not find one quickly.

Also, I am assuming an indexed column in Cassandra does not have to be unique?


Thanks,
-Tony

Re: Creating an "Index" column...

Posted by Tony Anecito <ad...@yahoo.com>.
Thanks for the info. There are other reasons, and the size you mention is small compared to other data I have worked with. Speed, data size, and license cost all have to be taken into consideration, and I am looking at those. Dynamic columns are also of interest to me.

I am just starting to really understand it, and I agree with your comments: it just depends on your requirements.
 
Regards,
-Tony

From: Arthur Zubarev <ar...@aol.com>
To: Tony Anecito <ad...@yahoo.com> 
Cc: Robert Coli <rc...@eventbrite.com>; Users-Cassandra <us...@cassandra.apache.org> 
Sent: Wednesday, June 26, 2013 9:40 PM
Subject: Re: Creating an "Index" column...



Appreciate your thoughts Tony,

In our DW there are composite keys, say 500K of them per customer. To produce a report, the client program needs to page through the entire set, collecting data as it goes, probably into yet another desktop db.

At that point the purpose of having a NoSQL store has been defeated.


-- 

Regards,

Arthur

Re: Creating an "Index" column...

Posted by Arthur Zubarev <ar...@aol.com>.
Appreciate your thoughts Tony,

In our DW there are composite keys, say 500K of them per customer. To
produce a report, the client program needs to page through the entire
set, collecting data as it goes, probably into yet another desktop db.

At that point the purpose of having a NoSQL store has been defeated.

On 06/26/2013 05:21 PM, Tony Anecito wrote:
> Thanks Arthur.
>
> Interesting that you think NoSQL does not fit large volumes of data;
> that is what it is touted to do.
> I have heard PKs are needed, but I thought that is what the "key"
> column is for, and composite key support is there also.
>
> The only issue I see is all that duplicate data and the need to keep
> it in sync. For example, if the movie title "Superman" changed to
> "Superman the Man of Steel", you have to go change all those duplicate
> values. An easy problem to solve, but the data modeler has to get past
> that. lol
>
> ACID transactions are the other issue, but I think the supplier of
> the info has to think about that one.
>
> I have response times in my RDBMS of several hundred microseconds, and
> keeping that the same or better is the really important requirement
> for me.
>
> Just some thoughts on the matter.
> -Tony


-- 

Regards,

Arthur


Re: Creating an "Index" column...

Posted by Tony Anecito <ad...@yahoo.com>.
Thanks Arthur.

Interesting that you think NoSQL does not fit large volumes of data; that is what it is touted to do.
I have heard PKs are needed, but I thought that is what the "key" column is for, and composite key support is there also.

The only issue I see is all that duplicate data and the need to keep it in sync. For example, if the movie title "Superman" changed to "Superman the Man of Steel", you have to go change all those duplicate values. An easy problem to solve, but the data modeler has to get past that. lol

ACID transactions are the other issue, but I think the supplier of the info has to think about that one.

I have response times in my RDBMS of several hundred microseconds, and keeping that the same or better is the really important requirement for me.

Just some thoughts on the matter.
-Tony
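
A minimal sketch of the "go change all those duplicate values" step, written in Python against plain in-memory dicts rather than a real Cassandra client. The table names, row keys, and the rename_title helper are hypothetical, chosen only to illustrate the point; in a real cluster each rewrite would be a separate write and the set of writes would not be atomic.

```python
def rename_title(tables, old_title, new_title):
    """Rewrite every denormalized copy of a title; return how many rows changed."""
    changed = 0
    for rows in tables.values():
        for row in rows.values():
            if row.get("title") == old_title:
                row["title"] = new_title
                changed += 1
    return changed


# Hypothetical denormalized layout: the same title is stored in two "column families".
tables = {
    "movies_by_id": {
        "m1": {"title": "Superman"},
        "m2": {"title": "Other Movie"},
    },
    "ratings_by_user": {
        ("u1", "m1"): {"title": "Superman", "stars": 5},
        ("u2", "m1"): {"title": "Superman", "stars": 4},
        ("u1", "m2"): {"title": "Other Movie", "stars": 3},
    },
}

changed = rename_title(tables, "Superman", "Superman the Man of Steel")
```

The full scan here is only for brevity; a real model would normally keep track of where the copies live so the update touches just those rows.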



________________________________
 From: Arthur Zubarev <Ar...@Aol.com>
To: Tony Anecito <ad...@yahoo.com>; Robert Coli <rc...@eventbrite.com>; Users-Cassandra <us...@cassandra.apache.org> 
Sent: Wednesday, June 26, 2013 3:08 PM
Subject: Re: Creating an "Index" column...
 


Tony hi,
 
Yes. In some scenarios (e.g. a DW) without proper PKs or indexes (which are
hard to envision, since you need to think of future queries first), getting
through large volumes of data makes NoSQL, IMHO, hard to fit in.

But you have other choices:

1) pagination or
2) slice queries.

Both are covered here:
 
http://pkghosh.wordpress.com/2012/03/04/cassandra-range-query-made-simple/
 
Hope that helps.
 
/Arthur 

Re: Creating an "Index" column...

Posted by Arthur Zubarev <Ar...@Aol.com>.
Tony hi,

Yes. In some scenarios (e.g. a DW) without proper PKs or indexes (which are hard to envision, since you need to think of future queries first), getting through large volumes of data makes NoSQL, IMHO, hard to fit in.

But you have other choices:

1) pagination or
2) slice queries.

Both are covered here:

http://pkghosh.wordpress.com/2012/03/04/cassandra-range-query-made-simple/

Hope that helps.

/Arthur
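
A rough sketch of the two choices above, assuming the column names (or keys) are available in sorted order. This is an in-memory Python model only; slice_columns and page_through are hypothetical helper names, not Cassandra API calls, and a real cluster would do the slicing on the server.

```python
from bisect import bisect_left, bisect_right


def slice_columns(sorted_names, start, finish, count):
    """Return up to `count` names in the inclusive range [start, finish]."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_right(sorted_names, finish)
    return sorted_names[lo:hi][:count]


def page_through(sorted_names, page_size):
    """Yield successive pages, resuming each one just after the last name seen."""
    last = None
    while True:
        lo = 0 if last is None else bisect_right(sorted_names, last)
        page = sorted_names[lo:lo + page_size]
        if not page:
            return
        yield page
        last = page[-1]


names = ["a", "b", "c", "d", "e"]
one_slice = slice_columns(names, "b", "d", 2)
pages = list(page_through(names, 2))
```

The pagination loop is just a sequence of slices, each one starting after the last name returned by the previous page, so the client never has to hold the whole set at once.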

From: Tony Anecito 
Sent: Wednesday, June 26, 2013 1:55 PM
To: Robert Coli ; Users-Cassandra 
Subject: Re: Creating an "Index" column...

Hi Robert,

Actually, that is what I did; I did that in my RDBMS data model. In Cassandra, or NoSQL in general, without joins or nested selects I have to do two queries. Also, batching is not supported on the server side, which makes the performance worse.

I just started learning Cassandra but I am learning fast and there are some challenges when moving to a new data model driven by these factors.

Regards,
-Tony








Re: Creating an "Index" column...

Posted by Tony Anecito <ad...@yahoo.com>.
Hi Robert,

Actually, that is what I did; I did that in my RDBMS data model. In Cassandra, or NoSQL in general, without joins or nested selects I have to do two queries. Also, batching is not supported on the server side, which makes the performance worse.

I just started learning Cassandra but I am learning fast and there are some challenges when moving to a new data model driven by these factors.

Regards,
-Tony




________________________________
 From: Robert Coli <rc...@eventbrite.com>
To: user@cassandra.apache.org; Tony Anecito <ad...@yahoo.com> 
Sent: Wednesday, June 26, 2013 11:32 AM
Subject: Re: Creating an "Index" column...
 

On Wed, Jun 26, 2013 at 10:20 AM, Tony Anecito <ad...@yahoo.com> wrote:
> Never mind I figured it out. I found it via a search for Secondary indexes.

In general unless you actually need atomic update of the row and its
secondary index, you are probably better off creating your own pseudo
secondary index column family.

=Rob

Re: Creating an "Index" column...

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Jun 26, 2013 at 10:20 AM, Tony Anecito <ad...@yahoo.com> wrote:
> Never mind I figured it out. I found it via a search for Secondary indexes.

In general unless you actually need atomic update of the row and its
secondary index, you are probably better off creating your own pseudo
secondary index column family.

=Rob
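
A minimal sketch of a hand-maintained "pseudo secondary index column family", modeled in Python with in-memory dicts instead of a real cluster. The MovieStore class, the genre column, and the row keys are all hypothetical. The built-in alternative is a CQL CREATE INDEX statement on the column; this sketch only shows the do-it-yourself shape, where every write updates both the data row and the index row.

```python
from collections import defaultdict


class MovieStore:
    """In-memory stand-in for a data column family plus a hand-built index."""

    def __init__(self):
        self.movies = {}                  # row key -> {column name: value}
        self.by_genre = defaultdict(set)  # indexed value -> set of row keys

    def put(self, movie_id, title, genre):
        old = self.movies.get(movie_id)
        if old is not None and old["genre"] != genre:
            # Drop the stale index entry before writing the new one.
            self.by_genre[old["genre"]].discard(movie_id)
        self.movies[movie_id] = {"title": title, "genre": genre}
        self.by_genre[genre].add(movie_id)

    def find_by_genre(self, genre):
        # Two lookups: first the index row, then each data row it points to.
        keys = sorted(self.by_genre.get(genre, ()))
        return [self.movies[k] for k in keys]


store = MovieStore()
store.put("m1", "Superman", "action")
store.put("m2", "Second Movie", "action")
store.put("m3", "Third Movie", "romance")
action_titles = [m["title"] for m in store.find_by_genre("action")]
```

Note that the indexed value is not unique: two rows share the "action" entry. In a real deployment the data write and the index write are separate operations, so they are not atomic with each other, which is the trade-off mentioned above.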

Re: Creating an "Index" column...

Posted by Tony Anecito <ad...@yahoo.com>.
Never mind I figured it out. I found it via a search for Secondary indexes.

Regards,
-Tony




________________________________
 From: Tony Anecito <ad...@yahoo.com>
To: Users-Cassandra <us...@cassandra.apache.org> 
Sent: Wednesday, June 26, 2013 10:39 AM
Subject: Creating an "Index" column...
 


Hi All,

I have a column family with multiple columns, and when I use a WHERE clause on a column that is not the "key" column, Cassandra returns an error saying it is not an indexed column.
Where can I find an example of creating a column family with one or more indexed columns? I googled last night and could not find one quickly.

Also, I am assuming an indexed column in Cassandra does not have to be unique?


Thanks,
-Tony