You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Behroz Sikander <be...@gmail.com> on 2018/03/01 12:36:28 UTC

Cassandra filter with ordering query modeling

Hi,own vote
favorite
<https://stackoverflow.com/questions/49049760/cassandra-filter-with-ordering-query-modeling#>

I am new to Cassandra and I am trying to model a table in Cassandra. My
queries look like the following

Query #1: select * from TableA where Id = "123"Query #2: select * from
TableA where name="test" orderby startTime DESCQuery #3: select * from
TableA where state="running" orderby startTime DESC

I have been able to build the table for Query #1 which looks like

    val tableAStatement = SchemaBuilder.createTable("tableA").ifNotExists.
    addPartitionKey(Id, DataType.uuid).
    addColumn(Name, DataType.text).
    addColumn(StartTime, DataType.timestamp).
    addColumn(EndTime, DataType.timestamp).
    addColumn(State, DataType.text)

    session.execute(tableAStatement)

but for Query#2 and 3, I have tried many different things but failed.
Everytime, I get stuck in a different error from cassandra.

Considering the above queries, what would be the right table model? What is
the right way to model such queries.
Regards,
Behroz

Re: Cassandra filter with ordering query modeling

Posted by DuyHai Doan <do...@gmail.com>.

https://www.slideshare.net/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics

On Thu, Mar 1, 2018 at 3:48 PM, Valentina Crisan <valentina.crisan@gmail.com
> wrote:

> 1) I created another table for Query#2/3. The partition Key was StartTime
> and clustering key was name. When I execute my queries, I get an exception
> saying that I need to ALLOW FILTERING.
>
> *Primary key(startTime,name) - the only queries that can be answered by
> this model are: where startTime = , where startTime IN (value1, value2),
> where startTime = and name = . Clustering keys support =,<,<=,>,>= while
> partition key supports = and IN operators. *
> *Your query was with name first and then startTime so in this case
> Cassandra is telling you that cannot answer this unless you use Allow
> Filtering at the end of the query = which basically is a disaster for
> performance since will bring all data in the coordinator and perform local
> filtering of the data. So, the model is not good for this query. *
>
> 2) I created a table with Name as partitioning key and startTime as
> clustering key. This way I was able to order the data in descending order
> based on startTime. But the problem was that if a row with same "name" was
> inserted, it was overriding the previously inserted row.
>
> *In Cassandra the primary key has 2 main purposes: to answer the queries
> and to provide uniqueness for the entries. This means that every variation
> of ( name, startTime) should be unique otherwise Cassandra will overwrite
> existing values ( actually C* doesn't read before write by default) and
> write the new values. In your case name in combination with different
> starttimes should provide unicity to the entries. If it's likely to have 2
> entries for 1 name and 1 startTime then you need to insert in the primary
> key another column that will provide the uniqueness. This column will be
> last clustering key and you will not need to involve it in queries - the
> role will be only for uniqueness. *
>
>
>  Valentina
>
>
> On Thu, Mar 1, 2018 at 3:26 PM, Behroz Sikander <be...@gmail.com>
> wrote:
>
>> Thank you for your response.
>>
>> I have been through the document and I have tried these techniques but I
>> failed to model my queries correctly.
>>
>> Forexample, I have already tried the following:
>> 1) I created another table for Query#2/3. The partition Key was StartTime
>> and clustering key was name. When I execute my queries, I get an exception
>> saying that I need to ALLOW FILTERING.
>> 2) I created a table with Name as partitioning key and startTime as
>> clustering key. This way I was able to order the data in descending order
>> based on startTime. But the problem was that if a row with same "name" was
>> inserted, it was overriding the previously inserted row.
>>
>> I am not sure how to model such queries.
>>
>>
>> On Thu, Mar 1, 2018 at 2:02 PM, Kyrylo Lebediev <Kyrylo_Lebediev@epam.com
>> > wrote:
>>
>>> Hi!
>>>
>>>
>>> Partition key (Id in your case) must be in WHERE cause if not using
>>> indexes (but indexes should be used carefully, not like in case of
>>> relational DB's). Also, only columns which belong to primary key ( =
>>> partition key + clustering key) can be used in WHERE in such cases. That's
>>> why 2nd and 3rd are failing.
>>> You might find this useful: http://cassandra.apache.org/do
>>> c/latest/cql/dml.html#the-where-clause
>>>
>>> There are several Cassandra handbooks available on Amazon, maybe it
>>> would be helpful for you to use some of them as starting point to
>>> understand aspects of Cassandra data[query] modeling.
>>>
>>>
>>> Regards,
>>>
>>> Kyrill
>>> ------------------------------
>>> *From:* Behroz Sikander <be...@gmail.com>
>>> *Sent:* Thursday, March 1, 2018 2:36:28 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Cassandra filter with ordering query modeling
>>>
>>> Hi,own vote
>>> favorite
>>> <https://stackoverflow.com/questions/49049760/cassandra-filter-with-ordering-query-modeling#>
>>>
>>> I am new to Cassandra and I am trying to model a table in Cassandra. My
>>> queries look like the following
>>>
>>> Query #1: select * from TableA where Id = "123"Query #2: select * from TableA where name="test" orderby startTime DESCQuery #3: select * from TableA where state="running" orderby startTime DESC
>>>
>>> I have been able to build the table for Query #1 which looks like
>>>
>>>     val tableAStatement = SchemaBuilder.createTable("tableA").ifNotExists.
>>>     addPartitionKey(Id, DataType.uuid).
>>>     addColumn(Name, DataType.text).
>>>     addColumn(StartTime, DataType.timestamp).
>>>     addColumn(EndTime, DataType.timestamp).
>>>     addColumn(State, DataType.text)
>>>
>>>     session.execute(tableAStatement)
>>>
>>> but for Query#2 and 3, I have tried many different things but failed.
>>> Everytime, I get stuck in a different error from cassandra.
>>>
>>> Considering the above queries, what would be the right table model? What
>>> is the right way to model such queries.
>>> Regards,
>>> Behroz
>>>
>>
>>
>

Re: Cassandra filter with ordering query modeling

Posted by Valentina Crisan <va...@gmail.com>.

1) I created another table for Query#2/3. The partition Key was StartTime
and clustering key was name. When I execute my queries, I get an exception
saying that I need to ALLOW FILTERING.

*Primary key(startTime,name) - the only queries that can be answered by
this model are: where startTime = , where startTime IN (value1, value2),
where startTime = and name = . Clustering keys support =,<,<=,>,>= while
partition key supports = and IN operators. *
*Your query was with name first and then startTime so in this case
Cassandra is telling you that cannot answer this unless you use Allow
Filtering at the end of the query = which basically is a disaster for
performance since will bring all data in the coordinator and perform local
filtering of the data. So, the model is not good for this query. *

2) I created a table with Name as partitioning key and startTime as
clustering key. This way I was able to order the data in descending order
based on startTime. But the problem was that if a row with same "name" was
inserted, it was overriding the previously inserted row.

*In Cassandra the primary key has 2 main purposes: to answer the queries
and to provide uniqueness for the entries. This means that every variation
of ( name, startTime) should be unique otherwise Cassandra will overwrite
existing values ( actually C* doesn't read before write by default) and
write the new values. In your case name in combination with different
starttimes should provide unicity to the entries. If it's likely to have 2
entries for 1 name and 1 startTime then you need to insert in the primary
key another column that will provide the uniqueness. This column will be
last clustering key and you will not need to involve it in queries - the
role will be only for uniqueness. *

 Valentina

On Thu, Mar 1, 2018 at 3:26 PM, Behroz Sikander <be...@gmail.com> wrote:

> Thank you for your response.
>
> I have been through the document and I have tried these techniques but I
> failed to model my queries correctly.
>
> Forexample, I have already tried the following:
> 1) I created another table for Query#2/3. The partition Key was StartTime
> and clustering key was name. When I execute my queries, I get an exception
> saying that I need to ALLOW FILTERING.
> 2) I created a table with Name as partitioning key and startTime as
> clustering key. This way I was able to order the data in descending order
> based on startTime. But the problem was that if a row with same "name" was
> inserted, it was overriding the previously inserted row.
>
> I am not sure how to model such queries.
>
>
> On Thu, Mar 1, 2018 at 2:02 PM, Kyrylo Lebediev <Ky...@epam.com>
> wrote:
>
>> Hi!
>>
>>
>> Partition key (Id in your case) must be in WHERE cause if not using
>> indexes (but indexes should be used carefully, not like in case of
>> relational DB's). Also, only columns which belong to primary key ( =
>> partition key + clustering key) can be used in WHERE in such cases. That's
>> why 2nd and 3rd are failing.
>> You might find this useful: http://cassandra.apache.org/do
>> c/latest/cql/dml.html#the-where-clause
>>
>> There are several Cassandra handbooks available on Amazon, maybe it would
>> be helpful for you to use some of them as starting point to understand
>> aspects of Cassandra data[query] modeling.
>>
>>
>> Regards,
>>
>> Kyrill
>> ------------------------------
>> *From:* Behroz Sikander <be...@gmail.com>
>> *Sent:* Thursday, March 1, 2018 2:36:28 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Cassandra filter with ordering query modeling
>>
>> Hi,own vote
>> favorite
>> <https://stackoverflow.com/questions/49049760/cassandra-filter-with-ordering-query-modeling#>
>>
>> I am new to Cassandra and I am trying to model a table in Cassandra. My
>> queries look like the following
>>
>> Query #1: select * from TableA where Id = "123"Query #2: select * from TableA where name="test" orderby startTime DESCQuery #3: select * from TableA where state="running" orderby startTime DESC
>>
>> I have been able to build the table for Query #1 which looks like
>>
>>     val tableAStatement = SchemaBuilder.createTable("tableA").ifNotExists.
>>     addPartitionKey(Id, DataType.uuid).
>>     addColumn(Name, DataType.text).
>>     addColumn(StartTime, DataType.timestamp).
>>     addColumn(EndTime, DataType.timestamp).
>>     addColumn(State, DataType.text)
>>
>>     session.execute(tableAStatement)
>>
>> but for Query#2 and 3, I have tried many different things but failed.
>> Everytime, I get stuck in a different error from cassandra.
>>
>> Considering the above queries, what would be the right table model? What
>> is the right way to model such queries.
>> Regards,
>> Behroz
>>
>
>

Re: Cassandra filter with ordering query modeling

Posted by Behroz Sikander <be...@gmail.com>.

Thank you for your response.

I have been through the document and I have tried these techniques but I
failed to model my queries correctly.

Forexample, I have already tried the following:
1) I created another table for Query#2/3. The partition Key was StartTime
and clustering key was name. When I execute my queries, I get an exception
saying that I need to ALLOW FILTERING.
2) I created a table with Name as partitioning key and startTime as
clustering key. This way I was able to order the data in descending order
based on startTime. But the problem was that if a row with same "name" was
inserted, it was overriding the previously inserted row.

I am not sure how to model such queries.


On Thu, Mar 1, 2018 at 2:02 PM, Kyrylo Lebediev <Ky...@epam.com>
wrote:

> Hi!
>
>
> Partition key (Id in your case) must be in WHERE cause if not using
> indexes (but indexes should be used carefully, not like in case of
> relational DB's). Also, only columns which belong to primary key ( =
> partition key + clustering key) can be used in WHERE in such cases. That's
> why 2nd and 3rd are failing.
> You might find this useful: http://cassandra.apache.org/
> doc/latest/cql/dml.html#the-where-clause
>
> There are several Cassandra handbooks available on Amazon, maybe it would
> be helpful for you to use some of them as starting point to understand
> aspects of Cassandra data[query] modeling.
>
>
> Regards,
>
> Kyrill
> ------------------------------
> *From:* Behroz Sikander <be...@gmail.com>
> *Sent:* Thursday, March 1, 2018 2:36:28 PM
> *To:* user@cassandra.apache.org
> *Subject:* Cassandra filter with ordering query modeling
>
> Hi,own vote
> favorite
> <https://stackoverflow.com/questions/49049760/cassandra-filter-with-ordering-query-modeling#>
>
> I am new to Cassandra and I am trying to model a table in Cassandra. My
> queries look like the following
>
> Query #1: select * from TableA where Id = "123"Query #2: select * from TableA where name="test" orderby startTime DESCQuery #3: select * from TableA where state="running" orderby startTime DESC
>
> I have been able to build the table for Query #1 which looks like
>
>     val tableAStatement = SchemaBuilder.createTable("tableA").ifNotExists.
>     addPartitionKey(Id, DataType.uuid).
>     addColumn(Name, DataType.text).
>     addColumn(StartTime, DataType.timestamp).
>     addColumn(EndTime, DataType.timestamp).
>     addColumn(State, DataType.text)
>
>     session.execute(tableAStatement)
>
> but for Query#2 and 3, I have tried many different things but failed.
> Everytime, I get stuck in a different error from cassandra.
>
> Considering the above queries, what would be the right table model? What
> is the right way to model such queries.
> Regards,
> Behroz
>

Re: Cassandra filter with ordering query modeling

Posted by Kyrylo Lebediev <Ky...@epam.com>.

Hi!

Partition key (Id in your case) must be in WHERE cause if not using indexes (but indexes should be used carefully, not like in case of relational DB's). Also, only columns which belong to primary key ( = partition key + clustering key) can be used in WHERE in such cases. That's why 2nd and 3rd are failing.
You might find this useful: http://cassandra.apache.org/doc/latest/cql/dml.html#the-where-clause

There are several Cassandra handbooks available on Amazon, maybe it would be helpful for you to use some of them as starting point to understand aspects of Cassandra data[query] modeling.


Regards,

Kyrill

________________________________
From: Behroz Sikander <be...@gmail.com>
Sent: Thursday, March 1, 2018 2:36:28 PM
To: user@cassandra.apache.org
Subject: Cassandra filter with ordering query modeling

Hi,own vote
favorite<https://stackoverflow.com/questions/49049760/cassandra-filter-with-ordering-query-modeling#>

I am new to Cassandra and I am trying to model a table in Cassandra. My queries look like the following

Query #1: select * from TableA where Id = "123"
Query #2: select * from TableA where name="test" orderby startTime DESC
Query #3: select * from TableA where state="running" orderby startTime DESC

I have been able to build the table for Query #1 which looks like

    val tableAStatement = SchemaBuilder.createTable("tableA").ifNotExists.
    addPartitionKey(Id, DataType.uuid).
    addColumn(Name, DataType.text).
    addColumn(StartTime, DataType.timestamp).
    addColumn(EndTime, DataType.timestamp).
    addColumn(State, DataType.text)

    session.execute(tableAStatement)

but for Query#2 and 3, I have tried many different things but failed. Everytime, I get stuck in a different error from cassandra.

Considering the above queries, what would be the right table model? What is the right way to model such queries.

Regards,
Behroz