You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Joseph Wonesh <jo...@sticknfind.com> on 2019/02/12 16:45:06 UTC

Group By Does Not Follow Clustering Order

Hello,

I have a materialized view defined by the following:

CREATE MATERIALIZED VIEW m_ps_project_policy_device0 AS
       SELECT policy_id, device_id, project_id, namespace,
metric_type, blufi_id, beacon_id, event_uuid, state, date_created,
policy_name, beacon_name, blufi_name, value, duration FROM
policy_state0
       WHERE policy_id IS NOT NULL AND device_id IS NOT NULL AND
project_id IS NOT NULL AND namespace IS NOT NULL AND metric_type IS
NOT NULL AND blufi_id IS NOT NULL AND beacon_id IS NOT NULL AND
event_uuid IS NOT NULL AND state IS NOT NULL AND date_created IS NOT
NULL
       PRIMARY KEY ((project_id), policy_id, device_id, date_created,
blufi_id, beacon_id, state, namespace, metric_type, event_uuid)
       WITH CLUSTERING ORDER BY (date_created DESC);


This view works fine if i run a query like the following:

SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
policy_id=7331 and device_id='1234567890' limit 1;

The result of this query gives me the most recent due to the date_created
desc clustering order.

However, this query does not behave as expected:

SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
policy_id=7331 and device_id='1234567890' group by policy_id, device_id
limit 1;

The result of this query gives me the FIRST record from the partition,
which is the OLDEST record due to the clustering order desc.

Is this a natural result due to my ordering? Would I need to use a view
that has order by ASC to achieve what I want to do using the built-in group
by aggregations? I am hoping there is a way to achieve what I want to do
(getting the most recent record for each of the <project_id, policy_id,
device_id> tuples using the built-in aggregation functions.

Thanks,
Joseph Wonesh

-- 
This message is private and confidential. If you have received message in 
error, please notify us and remove from your system. 

Re: Group By Does Not Follow Clustering Order

Posted by Joseph Wonesh <jo...@sticknfind.com>.
Kenneth,

Thank you for the reply, and sorry for my late response. I did not notice
this had a reply until just now!

To answer your questions:

In this statement from your email:
>
> SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
> policy_id=7331 and device_id='1234567890' group by policy_id, device_id
> limit 1;
>
>
>
> …why would you have the “group by policy_id, device_id” section at all
> when you are already doing “policy_id=7331 and device_id=’1234567890’” and
> returning one (limit 1)?
>

The clustering key for the materialized view includes far more than just
policy_id and device_id. Using group by allows me to collapse the
extraneous keys into just the keys I am interested in (policy_id and
device_id). The limit 1 is extraneous from my query generator and can be
removed.

Also, in your materialized view I think the “clustering order by” should
> have started with policy_id, so it would have been:   …WITH CLUSTERING
> ORDER BY(policy_id, device_id, date_created DESC);
>
> however that doesn’t seem very performant.
>

I ordered the clustering keys specifically to have date_created as the
first key so that I can (attempt to) do what was described in my question:
take the first result from a query which specifies the exact partition keys
and get the most recent record back. The issue now is that I seem to be
getting the oldest data first when using group by; this is what I would
like to solve -- hopefully without having to create yet another
materialized view with a different ordering.

On Thu, Feb 21, 2019 at 7:18 PM Kenneth Brotman
<ke...@yahoo.com.invalid> wrote:

> Hey Joseph,
>
>
>
> Also, in your materialized view I think the “clustering order by” should
> have started with policy_id, so it would have been:   …WITH CLUSTERING
> ORDER BY(policy_id, device_id, date_created DESC);
>
> however that doesn’t seem very performant.
>
>
>
> Maybe in the materialized view the primary key should have a compound
> partition key of (project_id, policy_id, device_id).
>
>
>
>
>
> *From:* Kenneth Brotman [mailto:kenbrotman@yahoo.com.INVALID]
> *Sent:* Thursday, February 21, 2019 3:38 PM
> *To:* user@cassandra.apache.org
> *Subject:* RE: Group By Does Not Follow Clustering Order
>
>
>
> Joseph,
>
>
>
> In this statement from your email:
>
> SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
> policy_id=7331 and device_id='1234567890' group by policy_id, device_id
> limit 1;
>
>
>
> …why would you have the “group by policy_id, device_id” section at all
> when you are already doing “policy_id=7331 and device_id=’1234567890’” and
> returning one (limit 1)?
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Joseph Wonesh [mailto:joseph.wonesh@sticknfind.com]
> *Sent:* Thursday, February 21, 2019 10:39 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Group By Does Not Follow Clustering Order
>
>
>
> Hi all,
>
>
>
> I am bumping this email hoping that it can reach a larger audience.
>
>
>
> Thanks,
>
> Joseph
>
>
>
> On Tue, Feb 12, 2019 at 11:45 AM Joseph Wonesh <
> joseph.wonesh@sticknfind.com> wrote:
>
> Hello,
>
>
>
> I have a materialized view defined by the following:
>
>
>
> *CREATE *MATERIALIZED VIEW m_ps_project_policy_device0
> *AS       *SELECT policy_id, device_id, project_id, namespace, metric_type, blufi_id, beacon_id, event_uuid, state, date_created, policy_name, beacon_name, blufi_name, value, duration *FROM *policy_state0
>        *WHERE *policy_id *IS NOT NULL AND *device_id *IS NOT NULL AND *project_id *IS NOT NULL AND *namespace *IS NOT NULL AND *metric_type *IS NOT NULL AND *blufi_id *IS NOT NULL AND *beacon_id *IS NOT NULL AND *event_uuid *IS NOT NULL AND *state *IS NOT NULL AND *date_created
> *IS NOT NULL       *PRIMARY KEY ((project_id), policy_id, device_id, date_created, blufi_id, beacon_id, state, namespace, metric_type, event_uuid)
>        *WITH *CLUSTERING *ORDER BY *(date_created *DESC*);
>
>
>
> This view works fine if i run a query like the following:
>
>
>
> SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
> policy_id=7331 and device_id='1234567890' limit 1;
>
>
>
> The result of this query gives me the most recent due to the date_created
> desc clustering order.
>
>
>
> However, this query does not behave as expected:
>
>
>
> SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
> policy_id=7331 and device_id='1234567890' group by policy_id, device_id
> limit 1;
>
>
>
> The result of this query gives me the FIRST record from the partition,
> which is the OLDEST record due to the clustering order desc.
>
>
>
> Is this a natural result due to my ordering? Would I need to use a view
> that has order by ASC to achieve what I want to do using the built-in group
> by aggregations? I am hoping there is a way to achieve what I want to do
> (getting the most recent record for each of the <project_id, policy_id,
> device_id> tuples using the built-in aggregation functions.
>
>
>
> Thanks,
>
> Joseph Wonesh
>
>
> This message is private and confidential. If you have received message in
> error, please notify us and remove from your system.
>

-- 
This message is private and confidential. If you have received message in 
error, please notify us and remove from your system. 

RE: Group By Does Not Follow Clustering Order

Posted by Kenneth Brotman <ke...@yahoo.com.INVALID>.
Hey Joseph,

 

Also, in your materialized view I think the “clustering order by” should have started with policy_id, so it would have been:   …WITH CLUSTERING ORDER BY(policy_id, device_id, date_created DESC);

however that doesn’t seem very performant. 

 

Maybe in the materialized view the primary key should have a compound partition key of (project_id, policy_id, device_id).

 

 

From: Kenneth Brotman [mailto:kenbrotman@yahoo.com.INVALID] 
Sent: Thursday, February 21, 2019 3:38 PM
To: user@cassandra.apache.org
Subject: RE: Group By Does Not Follow Clustering Order

 

Joseph,

 

In this statement from your email:

SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and policy_id=7331 and device_id='1234567890' group by policy_id, device_id limit 1;

 

…why would you have the “group by policy_id, device_id” section at all when you are already doing “policy_id=7331 and device_id=’1234567890’” and returning one (limit 1)?

 

Kenneth Brotman

 

From: Joseph Wonesh [mailto:joseph.wonesh@sticknfind.com] 
Sent: Thursday, February 21, 2019 10:39 AM
To: user@cassandra.apache.org
Subject: Re: Group By Does Not Follow Clustering Order

 

Hi all,

 

I am bumping this email hoping that it can reach a larger audience.

 

Thanks,

Joseph

 

On Tue, Feb 12, 2019 at 11:45 AM Joseph Wonesh <jo...@sticknfind.com> wrote:

Hello,

 

I have a materialized view defined by the following:

 

CREATE MATERIALIZED VIEW m_ps_project_policy_device0 AS
       SELECT policy_id, device_id, project_id, namespace, metric_type, blufi_id, beacon_id, event_uuid, state, date_created, policy_name, beacon_name, blufi_name, value, duration FROM policy_state0
       WHERE policy_id IS NOT NULL AND device_id IS NOT NULL AND project_id IS NOT NULL AND namespace IS NOT NULL AND metric_type IS NOT NULL AND blufi_id IS NOT NULL AND beacon_id IS NOT NULL AND event_uuid IS NOT NULL AND state IS NOT NULL AND date_created IS NOT NULL
       PRIMARY KEY ((project_id), policy_id, device_id, date_created, blufi_id, beacon_id, state, namespace, metric_type, event_uuid)
       WITH CLUSTERING ORDER BY (date_created DESC);

 

This view works fine if i run a query like the following:

 

SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and policy_id=7331 and device_id='1234567890' limit 1;

 

The result of this query gives me the most recent due to the date_created desc clustering order.

 

However, this query does not behave as expected:

 

SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and policy_id=7331 and device_id='1234567890' group by policy_id, device_id limit 1;

 

The result of this query gives me the FIRST record from the partition, which is the OLDEST record due to the clustering order desc.

 

Is this a natural result due to my ordering? Would I need to use a view that has order by ASC to achieve what I want to do using the built-in group by aggregations? I am hoping there is a way to achieve what I want to do (getting the most recent record for each of the <project_id, policy_id, device_id> tuples using the built-in aggregation functions.

 

Thanks,

Joseph Wonesh


This message is private and confidential. If you have received message in error, please notify us and remove from your system. 


RE: Group By Does Not Follow Clustering Order

Posted by Kenneth Brotman <ke...@yahoo.com.INVALID>.
Joseph,

 

In this statement from your email:

SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and policy_id=7331 and device_id='1234567890' group by policy_id, device_id limit 1;

 

…why would you have the “group by policy_id, device_id” section at all when you are already doing “policy_id=7331 and device_id=’1234567890’” and returning one (limit 1)?

 

Kenneth Brotman

 

From: Joseph Wonesh [mailto:joseph.wonesh@sticknfind.com] 
Sent: Thursday, February 21, 2019 10:39 AM
To: user@cassandra.apache.org
Subject: Re: Group By Does Not Follow Clustering Order

 

Hi all,

 

I am bumping this email hoping that it can reach a larger audience.

 

Thanks,

Joseph

 

On Tue, Feb 12, 2019 at 11:45 AM Joseph Wonesh <jo...@sticknfind.com> wrote:

Hello,

 

I have a materialized view defined by the following:

 

CREATE MATERIALIZED VIEW m_ps_project_policy_device0 AS
       SELECT policy_id, device_id, project_id, namespace, metric_type, blufi_id, beacon_id, event_uuid, state, date_created, policy_name, beacon_name, blufi_name, value, duration FROM policy_state0
       WHERE policy_id IS NOT NULL AND device_id IS NOT NULL AND project_id IS NOT NULL AND namespace IS NOT NULL AND metric_type IS NOT NULL AND blufi_id IS NOT NULL AND beacon_id IS NOT NULL AND event_uuid IS NOT NULL AND state IS NOT NULL AND date_created IS NOT NULL
       PRIMARY KEY ((project_id), policy_id, device_id, date_created, blufi_id, beacon_id, state, namespace, metric_type, event_uuid)
       WITH CLUSTERING ORDER BY (date_created DESC);

 

This view works fine if i run a query like the following:

 

SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and policy_id=7331 and device_id='1234567890' limit 1;

 

The result of this query gives me the most recent due to the date_created desc clustering order.

 

However, this query does not behave as expected:

 

SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and policy_id=7331 and device_id='1234567890' group by policy_id, device_id limit 1;

 

The result of this query gives me the FIRST record from the partition, which is the OLDEST record due to the clustering order desc.

 

Is this a natural result due to my ordering? Would I need to use a view that has order by ASC to achieve what I want to do using the built-in group by aggregations? I am hoping there is a way to achieve what I want to do (getting the most recent record for each of the <project_id, policy_id, device_id> tuples using the built-in aggregation functions.

 

Thanks,

Joseph Wonesh


This message is private and confidential. If you have received message in error, please notify us and remove from your system. 


Re: Group By Does Not Follow Clustering Order

Posted by Joseph Wonesh <jo...@sticknfind.com>.
Hi all,

I am bumping this email hoping that it can reach a larger audience.

Thanks,
Joseph

On Tue, Feb 12, 2019 at 11:45 AM Joseph Wonesh <jo...@sticknfind.com>
wrote:

> Hello,
>
> I have a materialized view defined by the following:
>
> CREATE MATERIALIZED VIEW m_ps_project_policy_device0 AS
>        SELECT policy_id, device_id, project_id, namespace, metric_type, blufi_id, beacon_id, event_uuid, state, date_created, policy_name, beacon_name, blufi_name, value, duration FROM policy_state0
>        WHERE policy_id IS NOT NULL AND device_id IS NOT NULL AND project_id IS NOT NULL AND namespace IS NOT NULL AND metric_type IS NOT NULL AND blufi_id IS NOT NULL AND beacon_id IS NOT NULL AND event_uuid IS NOT NULL AND state IS NOT NULL AND date_created IS NOT NULL
>        PRIMARY KEY ((project_id), policy_id, device_id, date_created, blufi_id, beacon_id, state, namespace, metric_type, event_uuid)
>        WITH CLUSTERING ORDER BY (date_created DESC);
>
>
> This view works fine if i run a query like the following:
>
> SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
> policy_id=7331 and device_id='1234567890' limit 1;
>
> The result of this query gives me the most recent due to the date_created
> desc clustering order.
>
> However, this query does not behave as expected:
>
> SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
> policy_id=7331 and device_id='1234567890' group by policy_id, device_id
> limit 1;
>
> The result of this query gives me the FIRST record from the partition,
> which is the OLDEST record due to the clustering order desc.
>
> Is this a natural result due to my ordering? Would I need to use a view
> that has order by ASC to achieve what I want to do using the built-in group
> by aggregations? I am hoping there is a way to achieve what I want to do
> (getting the most recent record for each of the <project_id, policy_id,
> device_id> tuples using the built-in aggregation functions.
>
> Thanks,
> Joseph Wonesh
>

-- 
This message is private and confidential. If you have received message in 
error, please notify us and remove from your system.