Posted to commits@cassandra.apache.org by "Jason Kania (JIRA)" <ji...@apache.org> on 2016/03/09 02:30:41 UTC

[jira] [Created] (CASSANDRA-11319) SELECT DISTINCT Should allow filtering by where clause to support time series

Jason Kania created CASSANDRA-11319:
---------------------------------------

             Summary: SELECT DISTINCT Should allow filtering by where clause to support time series
                 Key: CASSANDRA-11319
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11319
             Project: Cassandra
          Issue Type: Improvement
          Components: CQL
         Environment: Cassandra 3.0.3
            Reporter: Jason Kania


Due to the lack of built-in sharding, we have been trying to split very wide rows. However, when trying to find all of the sharding column values after the fact, we have not been able to find a manageable solution. If a table is defined as follows:

CREATE TABLE IF NOT EXISTS "sensorReadings"
(
	"measurementList" blob,
	"sensorId" int,
	"sensorUnitId" int,
	"shardId" int,
	"time" timestamp,
	PRIMARY KEY ( ("sensorUnitId", "sensorId", "shardId"), "time" )
);

then

select DISTINCT "sensorUnitId","sensorId","shardId" from "sensorReadings";

will return all of the unique partition keys, but this can still be a very large set. It should therefore be possible to narrow the result with a WHERE clause that references only partition key columns, even when it does not restrict all of them, e.g.:

select DISTINCT "sensorUnitId","sensorId","shardId" from "sensorReadings" WHERE "sensorUnitId"=17 AND "sensorId"=8;

Without this ability, we are forced to maintain a separate table of the available shardIds and update it on every write just so that we can query the original table. While the shardId can be determined automatically in several scenarios, attempts to iterate over all of the shards are seriously hampered.
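A rough sketch of that workaround in CQL might look like the following. The table name "sensorShards" and all literal values are hypothetical. The side table is keyed by the leading partition columns ("sensorUnitId", "sensorId"), so all shardIds for one sensor live in a single partition and can be read back with an ordinary, fully restricted query.

```
-- Hypothetical side table: one row per shard that exists for a sensor.
CREATE TABLE IF NOT EXISTS "sensorShards"
(
	"sensorUnitId" int,
	"sensorId" int,
	"shardId" int,
	PRIMARY KEY ( ("sensorUnitId", "sensorId"), "shardId" )
);

-- Every write to "sensorReadings" also records its shard. INSERT is an
-- upsert, so re-recording an already known shard is harmless.
BEGIN BATCH
	INSERT INTO "sensorReadings" ("sensorUnitId", "sensorId", "shardId", "time", "measurementList")
		VALUES (17, 8, 3, '2016-03-08 12:00:00+0000', 0xcafe);
	INSERT INTO "sensorShards" ("sensorUnitId", "sensorId", "shardId")
		VALUES (17, 8, 3);
APPLY BATCH;

-- Listing the shards for one sensor restricts the full partition key of the side table.
SELECT "shardId" FROM "sensorShards" WHERE "sensorUnitId"=17 AND "sensorId"=8;
```

The logged batch keeps the two writes together, at the cost of batch-log overhead on every write. That overhead, plus the extra table, is the maintenance burden this improvement would remove.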



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)