Posted to user@cassandra.apache.org by Chamila Wijayarathna <cd...@gmail.com> on 2014/11/12 17:59:48 UTC

Cassandra sort using updatable query

Hello all,

I have a data set with the attributes content and year. I want to put it
into a CF 'words' with the attributes ('content','year','frequency'). The CF
should support the following operations.

   - The frequency attribute can be updated, e.g. with a query like
   "UPDATE words SET frequency = 2 WHERE content='abc' AND year=1990;",
   where the WHERE clause contains content and year.
   - It should support a select query like "SELECT content FROM words WHERE
   year = 2010 ORDER BY frequency DESC LIMIT 10;", where the WHERE clause
   contains only year and the results are ordered by frequency.

Can this kind of requirement be fulfilled using Cassandra? What CF structure
and indexing do I need here? What queries should I use to create the CF and
its indexes?


Thank You!



-- 
*Chamila Dilshan Wijayarathna,*
SMIEEE, SMIESL,
Undergraduate,
Department of Computer Science and Engineering,
University of Moratuwa.

Re: Cassandra sort using updatable query

Posted by Chamila Wijayarathna <cd...@gmail.com>.
Hi Jonathan,

Thank you very much, it worked this way.

On Thu, Nov 13, 2014 at 12:07 AM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> [...]



Re: Cassandra sort using updatable query

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
With Cassandra you're going to want to model tables to meet the
requirements of your queries, unlike a relational database, where you
build tables in 3NF and then optimize afterwards.

For your optimized select query, your table (with caveat, see below) could
start out as:

create table words (
  year int,
  frequency int,
  content text,
  primary key (year, frequency, content) );
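
As a rough sketch of how this table would serve the top-10 query from the
original post: year is the partition key and frequency is the first
clustering column, so rows within one year are stored sorted by frequency
and ORDER BY on it is allowed. The year 2010 and the LIMIT are taken from
the question; nothing else is assumed.

```cql
-- Top 10 most frequent words for a single year.
-- Reads one partition (year = 2010) in reverse clustering order.
SELECT content FROM words
 WHERE year = 2010
 ORDER BY frequency DESC
 LIMIT 10;
```

As an optional variation that is not part of the reply above, declaring the
table WITH CLUSTERING ORDER BY (frequency DESC, content ASC) would store the
rows in descending frequency order already, so the ORDER BY clause could be
dropped.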

You may want to maintain other tables as well for different types of select
statements.

Your UPDATE statement above won't work; you'll have to DELETE and INSERT,
since you can't change the value of a clustering column.  If you don't know
the old frequency ahead of time (to do the delete), you'll need to
keep another table mapping (content, year) -> frequency.
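
A minimal sketch of that delete-and-insert flow might look like the
following. The table name word_frequency and the old frequency value 1 are
assumptions made for illustration; the 'abc' / 1990 / 2 values come from the
UPDATE in the original post.

```cql
-- Hypothetical lookup table: current frequency for each (content, year).
CREATE TABLE word_frequency (
  content text,
  year int,
  frequency int,
  PRIMARY KEY ((content, year))
);

-- Step 1: look up the old frequency (assume it returns 1 here).
SELECT frequency FROM word_frequency
 WHERE content = 'abc' AND year = 1990;

-- Step 2: move the row in words and record the new value.
BEGIN BATCH
  DELETE FROM words
   WHERE year = 1990 AND frequency = 1 AND content = 'abc';
  INSERT INTO words (year, frequency, content)
  VALUES (1990, 2, 'abc');
  UPDATE word_frequency SET frequency = 2
   WHERE content = 'abc' AND year = 1990;
APPLY BATCH;
```

Note that the read in step 1 and the batch in step 2 are separate
operations, so two clients updating the same word at the same time could
race; this sketch does not address that.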

Now, the tricky part here is that the above model limits the total
number of partitions to the number of years you're working with, so it
will not scale as the cluster grows.  Ideally you could
bucket frequencies.  If that feels like too much work (it's starting to for
me), this may be better suited to something like Solr, Elasticsearch, or
DSE (Cassandra + Solr).
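
One possible reading of "bucket frequencies" is to add a bucket to the
partition key so that each year is spread over several partitions. The
sketch below is only that reading, not a tested design: the table name
words_bucketed, the column freq_bucket, the bucket width of 1000, and the
bucket number 5 are all assumptions.

```cql
-- Hypothetical bucketed variant. The application computes
-- freq_bucket = frequency / 1000 (integer division) before writing.
CREATE TABLE words_bucketed (
  year int,
  freq_bucket int,
  frequency int,
  content text,
  PRIMARY KEY ((year, freq_bucket), frequency, content)
) WITH CLUSTERING ORDER BY (frequency DESC, content ASC);

-- Query the highest bucket first; fall back to the next lower bucket
-- only if fewer than 10 rows come back.
SELECT content, frequency FROM words_bucketed
 WHERE year = 2010 AND freq_bucket = 5
 LIMIT 10;
```

The trade-off is that the application has to know (or track) the highest
non-empty bucket per year, and a frequency change that crosses a bucket
boundary moves the row to a different partition.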

Does that help?

Jon





