You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by Joseph Wonesh <jo...@sticknfind.com> on 2019/02/12 16:45:06 UTC
Group By Does Not Follow Clustering Order
Hello,
I have a materialized view defined by the following:
CREATE MATERIALIZED VIEW m_ps_project_policy_device0 AS
SELECT policy_id, device_id, project_id, namespace,
metric_type, blufi_id, beacon_id, event_uuid, state, date_created,
policy_name, beacon_name, blufi_name, value, duration FROM
policy_state0
WHERE policy_id IS NOT NULL AND device_id IS NOT NULL AND
project_id IS NOT NULL AND namespace IS NOT NULL AND metric_type IS
NOT NULL AND blufi_id IS NOT NULL AND beacon_id IS NOT NULL AND
event_uuid IS NOT NULL AND state IS NOT NULL AND date_created IS NOT
NULL
PRIMARY KEY ((project_id), policy_id, device_id, date_created,
blufi_id, beacon_id, state, namespace, metric_type, event_uuid)
WITH CLUSTERING ORDER BY (date_created DESC);
This view works fine if i run a query like the following:
SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
policy_id=7331 and device_id='1234567890' limit 1;
The result of this query gives me the most recent due to the date_created
desc clustering order.
However, this query does not behave as expected:
SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
policy_id=7331 and device_id='1234567890' group by policy_id, device_id
limit 1;
The result of this query gives me the FIRST record from the partition,
which is the OLDEST record due to the clustering order desc.
Is this a natural result due to my ordering? Would I need to use a view
that has order by ASC to achieve what I want to do using the built-in group
by aggregations? I am hoping there is a way to achieve what I want to do
(getting the most recent record for each of the <project_id, policy_id,
device_id> tuples using the built-in aggregation functions.
Thanks,
Joseph Wonesh
--
This message is private and confidential. If you have received message in
error, please notify us and remove from your system.
Re: Group By Does Not Follow Clustering Order
Posted by Joseph Wonesh <jo...@sticknfind.com>.
Kenneth,
Thank you for the reply, and sorry for my late response. I did not notice
this had a reply until just now!
To answer your questions:
In this statement from your email:
>
> SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
> policy_id=7331 and device_id='1234567890' group by policy_id, device_id
> limit 1;
>
>
>
> …why would you have the “group by policy_id, device_id” section at all
> when you are already doing “policy_id=7331 and device_id=’1234567890’” and
> returning one (limit 1)?
>
The clustering key for the materialized view includes far more than just
policy_id and device_id. Using group by allows me to collapse the
extraneous keys into just the keys I am interested in (policy_id and
device_id). The limit 1 is extraneous from my query generator and can be
removed.
Also, in your materialized view I think the “clustering order by” should
> have started with policy_id, so it would have been: …WITH CLUSTERING
> ORDER BY(policy_id, device_id, date_created DESC);
>
> however that doesn’t seem very performant.
>
I ordered the clustering keys specifically to have date_created as the
first key so that I can (attempt to) do what was described in my question:
take the first result from a query which specifies the exact partition keys
and get the most recent record back. The issue now is that I seem to be
getting the oldest data first when using group by; this is what I would
like to solve -- hopefully without having to create yet another
materialized view with a different ordering.
On Thu, Feb 21, 2019 at 7:18 PM Kenneth Brotman
<ke...@yahoo.com.invalid> wrote:
> Hey Joseph,
>
>
>
> Also, in your materialized view I think the “clustering order by” should
> have started with policy_id, so it would have been: …WITH CLUSTERING
> ORDER BY(policy_id, device_id, date_created DESC);
>
> however that doesn’t seem very performant.
>
>
>
> Maybe in the materialized view the primary key should have a compound
> partition key of (project_id, policy_id, device_id).
>
>
>
>
>
> *From:* Kenneth Brotman [mailto:kenbrotman@yahoo.com.INVALID]
> *Sent:* Thursday, February 21, 2019 3:38 PM
> *To:* user@cassandra.apache.org
> *Subject:* RE: Group By Does Not Follow Clustering Order
>
>
>
> Joseph,
>
>
>
> In this statement from your email:
>
> SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
> policy_id=7331 and device_id='1234567890' group by policy_id, device_id
> limit 1;
>
>
>
> …why would you have the “group by policy_id, device_id” section at all
> when you are already doing “policy_id=7331 and device_id=’1234567890’” and
> returning one (limit 1)?
>
>
>
> Kenneth Brotman
>
>
>
> *From:* Joseph Wonesh [mailto:joseph.wonesh@sticknfind.com]
> *Sent:* Thursday, February 21, 2019 10:39 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Group By Does Not Follow Clustering Order
>
>
>
> Hi all,
>
>
>
> I am bumping this email hoping that it can reach a larger audience.
>
>
>
> Thanks,
>
> Joseph
>
>
>
> On Tue, Feb 12, 2019 at 11:45 AM Joseph Wonesh <
> joseph.wonesh@sticknfind.com> wrote:
>
> Hello,
>
>
>
> I have a materialized view defined by the following:
>
>
>
> *CREATE *MATERIALIZED VIEW m_ps_project_policy_device0
> *AS *SELECT policy_id, device_id, project_id, namespace, metric_type, blufi_id, beacon_id, event_uuid, state, date_created, policy_name, beacon_name, blufi_name, value, duration *FROM *policy_state0
> *WHERE *policy_id *IS NOT NULL AND *device_id *IS NOT NULL AND *project_id *IS NOT NULL AND *namespace *IS NOT NULL AND *metric_type *IS NOT NULL AND *blufi_id *IS NOT NULL AND *beacon_id *IS NOT NULL AND *event_uuid *IS NOT NULL AND *state *IS NOT NULL AND *date_created
> *IS NOT NULL *PRIMARY KEY ((project_id), policy_id, device_id, date_created, blufi_id, beacon_id, state, namespace, metric_type, event_uuid)
> *WITH *CLUSTERING *ORDER BY *(date_created *DESC*);
>
>
>
> This view works fine if i run a query like the following:
>
>
>
> SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
> policy_id=7331 and device_id='1234567890' limit 1;
>
>
>
> The result of this query gives me the most recent due to the date_created
> desc clustering order.
>
>
>
> However, this query does not behave as expected:
>
>
>
> SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
> policy_id=7331 and device_id='1234567890' group by policy_id, device_id
> limit 1;
>
>
>
> The result of this query gives me the FIRST record from the partition,
> which is the OLDEST record due to the clustering order desc.
>
>
>
> Is this a natural result due to my ordering? Would I need to use a view
> that has order by ASC to achieve what I want to do using the built-in group
> by aggregations? I am hoping there is a way to achieve what I want to do
> (getting the most recent record for each of the <project_id, policy_id,
> device_id> tuples using the built-in aggregation functions.
>
>
>
> Thanks,
>
> Joseph Wonesh
>
>
> This message is private and confidential. If you have received message in
> error, please notify us and remove from your system.
>
--
This message is private and confidential. If you have received message in
error, please notify us and remove from your system.
RE: Group By Does Not Follow Clustering Order
Posted by Kenneth Brotman <ke...@yahoo.com.INVALID>.
Hey Joseph,
Also, in your materialized view I think the “clustering order by” should have started with policy_id, so it would have been: …WITH CLUSTERING ORDER BY(policy_id, device_id, date_created DESC);
however that doesn’t seem very performant.
Maybe in the materialized view the primary key should have a compound partition key of (project_id, policy_id, device_id).
From: Kenneth Brotman [mailto:kenbrotman@yahoo.com.INVALID]
Sent: Thursday, February 21, 2019 3:38 PM
To: user@cassandra.apache.org
Subject: RE: Group By Does Not Follow Clustering Order
Joseph,
In this statement from your email:
SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and policy_id=7331 and device_id='1234567890' group by policy_id, device_id limit 1;
…why would you have the “group by policy_id, device_id” section at all when you are already doing “policy_id=7331 and device_id=’1234567890’” and returning one (limit 1)?
Kenneth Brotman
From: Joseph Wonesh [mailto:joseph.wonesh@sticknfind.com]
Sent: Thursday, February 21, 2019 10:39 AM
To: user@cassandra.apache.org
Subject: Re: Group By Does Not Follow Clustering Order
Hi all,
I am bumping this email hoping that it can reach a larger audience.
Thanks,
Joseph
On Tue, Feb 12, 2019 at 11:45 AM Joseph Wonesh <jo...@sticknfind.com> wrote:
Hello,
I have a materialized view defined by the following:
CREATE MATERIALIZED VIEW m_ps_project_policy_device0 AS
SELECT policy_id, device_id, project_id, namespace, metric_type, blufi_id, beacon_id, event_uuid, state, date_created, policy_name, beacon_name, blufi_name, value, duration FROM policy_state0
WHERE policy_id IS NOT NULL AND device_id IS NOT NULL AND project_id IS NOT NULL AND namespace IS NOT NULL AND metric_type IS NOT NULL AND blufi_id IS NOT NULL AND beacon_id IS NOT NULL AND event_uuid IS NOT NULL AND state IS NOT NULL AND date_created IS NOT NULL
PRIMARY KEY ((project_id), policy_id, device_id, date_created, blufi_id, beacon_id, state, namespace, metric_type, event_uuid)
WITH CLUSTERING ORDER BY (date_created DESC);
This view works fine if i run a query like the following:
SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and policy_id=7331 and device_id='1234567890' limit 1;
The result of this query gives me the most recent due to the date_created desc clustering order.
However, this query does not behave as expected:
SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and policy_id=7331 and device_id='1234567890' group by policy_id, device_id limit 1;
The result of this query gives me the FIRST record from the partition, which is the OLDEST record due to the clustering order desc.
Is this a natural result due to my ordering? Would I need to use a view that has order by ASC to achieve what I want to do using the built-in group by aggregations? I am hoping there is a way to achieve what I want to do (getting the most recent record for each of the <project_id, policy_id, device_id> tuples using the built-in aggregation functions.
Thanks,
Joseph Wonesh
This message is private and confidential. If you have received message in error, please notify us and remove from your system.
RE: Group By Does Not Follow Clustering Order
Posted by Kenneth Brotman <ke...@yahoo.com.INVALID>.
Joseph,
In this statement from your email:
SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and policy_id=7331 and device_id='1234567890' group by policy_id, device_id limit 1;
…why would you have the “group by policy_id, device_id” section at all when you are already doing “policy_id=7331 and device_id=’1234567890’” and returning one (limit 1)?
Kenneth Brotman
From: Joseph Wonesh [mailto:joseph.wonesh@sticknfind.com]
Sent: Thursday, February 21, 2019 10:39 AM
To: user@cassandra.apache.org
Subject: Re: Group By Does Not Follow Clustering Order
Hi all,
I am bumping this email hoping that it can reach a larger audience.
Thanks,
Joseph
On Tue, Feb 12, 2019 at 11:45 AM Joseph Wonesh <jo...@sticknfind.com> wrote:
Hello,
I have a materialized view defined by the following:
CREATE MATERIALIZED VIEW m_ps_project_policy_device0 AS
SELECT policy_id, device_id, project_id, namespace, metric_type, blufi_id, beacon_id, event_uuid, state, date_created, policy_name, beacon_name, blufi_name, value, duration FROM policy_state0
WHERE policy_id IS NOT NULL AND device_id IS NOT NULL AND project_id IS NOT NULL AND namespace IS NOT NULL AND metric_type IS NOT NULL AND blufi_id IS NOT NULL AND beacon_id IS NOT NULL AND event_uuid IS NOT NULL AND state IS NOT NULL AND date_created IS NOT NULL
PRIMARY KEY ((project_id), policy_id, device_id, date_created, blufi_id, beacon_id, state, namespace, metric_type, event_uuid)
WITH CLUSTERING ORDER BY (date_created DESC);
This view works fine if i run a query like the following:
SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and policy_id=7331 and device_id='1234567890' limit 1;
The result of this query gives me the most recent due to the date_created desc clustering order.
However, this query does not behave as expected:
SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and policy_id=7331 and device_id='1234567890' group by policy_id, device_id limit 1;
The result of this query gives me the FIRST record from the partition, which is the OLDEST record due to the clustering order desc.
Is this a natural result due to my ordering? Would I need to use a view that has order by ASC to achieve what I want to do using the built-in group by aggregations? I am hoping there is a way to achieve what I want to do (getting the most recent record for each of the <project_id, policy_id, device_id> tuples using the built-in aggregation functions.
Thanks,
Joseph Wonesh
This message is private and confidential. If you have received message in error, please notify us and remove from your system.
Re: Group By Does Not Follow Clustering Order
Posted by Joseph Wonesh <jo...@sticknfind.com>.
Hi all,
I am bumping this email hoping that it can reach a larger audience.
Thanks,
Joseph
On Tue, Feb 12, 2019 at 11:45 AM Joseph Wonesh <jo...@sticknfind.com>
wrote:
> Hello,
>
> I have a materialized view defined by the following:
>
> CREATE MATERIALIZED VIEW m_ps_project_policy_device0 AS
> SELECT policy_id, device_id, project_id, namespace, metric_type, blufi_id, beacon_id, event_uuid, state, date_created, policy_name, beacon_name, blufi_name, value, duration FROM policy_state0
> WHERE policy_id IS NOT NULL AND device_id IS NOT NULL AND project_id IS NOT NULL AND namespace IS NOT NULL AND metric_type IS NOT NULL AND blufi_id IS NOT NULL AND beacon_id IS NOT NULL AND event_uuid IS NOT NULL AND state IS NOT NULL AND date_created IS NOT NULL
> PRIMARY KEY ((project_id), policy_id, device_id, date_created, blufi_id, beacon_id, state, namespace, metric_type, event_uuid)
> WITH CLUSTERING ORDER BY (date_created DESC);
>
>
> This view works fine if i run a query like the following:
>
> SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
> policy_id=7331 and device_id='1234567890' limit 1;
>
> The result of this query gives me the most recent due to the date_created
> desc clustering order.
>
> However, this query does not behave as expected:
>
> SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and
> policy_id=7331 and device_id='1234567890' group by policy_id, device_id
> limit 1;
>
> The result of this query gives me the FIRST record from the partition,
> which is the OLDEST record due to the clustering order desc.
>
> Is this a natural result due to my ordering? Would I need to use a view
> that has order by ASC to achieve what I want to do using the built-in group
> by aggregations? I am hoping there is a way to achieve what I want to do
> (getting the most recent record for each of the <project_id, policy_id,
> device_id> tuples using the built-in aggregation functions.
>
> Thanks,
> Joseph Wonesh
>
--
This message is private and confidential. If you have received message in
error, please notify us and remove from your system.