Posted to user@cassandra.apache.org by Jason Kania <ja...@ymail.com> on 2016/03/10 14:54:27 UTC

Strategy for dividing wide rows beyond just adding to the partition key

Hi,
We have sensor input that creates very wide rows and operations on these rows have started to time out regularly. We have been trying to find a solution to dividing wide rows but keep hitting limitations that move the problem around instead of solving it.
We have a partition key consisting of a sensorUnitId and a sensorId and use a time field to access each column in the row. We tried adding a time based entry, timeShardId, to the partition key that consists of the year and week of year during which the reading was taken. This works for a number of queries but for scanning all the readings against a particular sensorUnitId and sensorId combination, we seem to be stuck.
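
For illustration, one way such a timeShardId could be derived from a reading's timestamp, assuming an ISO year-and-week encoding such as 201610 (the exact encoding isn't pinned down here, so treat this as a sketch):

from datetime import datetime, timezone

def time_shard(ts_millis):
    # Combine ISO year and ISO week into a single int, e.g. 2016 week 10 -> 201610.
    dt = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
    iso_year, iso_week, _ = dt.isocalendar()
    return iso_year * 100 + iso_week

# time_shard(1457616867000) -> 201610 (a reading taken on 2016-03-10)
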
We won't know the range of valid values of the timeShardId for a given sensorUnitId and sensorId combination, so we would have to write to an additional table to track the valid timeShardId values. We suspect this would create tombstone accumulation problems, given the number of updates required to the same row, so we haven't tried this option.

Alternatively, we hit a different bottleneck in the form of SELECT DISTINCT in trying to directly access the partition keys. Since SELECT DISTINCT does not allow for a where clause to filter on the partition key values, we have to filter several hundred thousand partition keys just to find those related to the relevant sensorUnitId and sensorId. This problem will only grow worse for us.

Are there any other approaches that can be suggested? We have been looking around, but haven't found any references beyond the initial suggestion to add some sort of shard id to the partition key to handle wide rows.
Thanks,
Jason

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Carlos Alonso <in...@mrcalonso.com>.
Hi.

+1 to Jack's sentence: 'Generally, Cassandra is ideal for only two use
cases (access patterns really): 1) retrieval by a specific key, and 2)
retrieval of a relatively narrow slice of contiguous data, beginning with a
specific key.'

So I think you're modelling it properly (keeping rows fairly narrow). You
could then store in another table the initial bucket for each sensor and
either skip storing the end bucket (taking advantage of the fact that
Cassandra is very quick at finding empty partitions) and query forward
until today, or, given that your bucketing is per week, only update the
'last partition' entry for a sensor when you're really one week past the
latest saved value. That will generate a single tombstone per sensor, which
doesn't sound scary to me.
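
A minimal sketch of that bucket-tracking table in CQL (table and column
names are illustrative, not something already defined in this thread):

CREATE TABLE "sensorBuckets" (
  "sensorUnitId" int,
  "sensorId" int,
  "firstTimeShard" int,
  "lastTimeShard" int,
  PRIMARY KEY (("sensorUnitId", "sensorId"))
);

-- Written once when the first reading for a sensor arrives. The reader then
-- walks forward week by week from "firstTimeShard" until today, or until
-- "lastTimeShard" if you choose to maintain it (updated at most once a week).
INSERT INTO "sensorBuckets"
  ("sensorUnitId", "sensorId", "firstTimeShard", "lastTimeShard")
VALUES (5123, 17, 201610, 201610);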

On the other hand, did you consider offloading the historical data to a
system better suited to act as a data warehouse?

Regards

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 12 March 2016 at 16:59, Jack Krupansky <ja...@gmail.com> wrote:

> Generally, secondary indexes are not recommended in Cassandra. Query
> tables and/or materialized views are the recommended alternative. But it
> all depends on the specific nature of the queries and the cardinality of
> the data.
>
> Generally, Cassandra is ideal for only two use cases (access patterns
> really): 1) retrieval by a specific key, and 2) retrieval of a relatively
> narrow slice of contiguous data, beginning with a specific key.
>
> Bulk retrieval is not a great access pattern for Cassandra. The emphasis
> is on being a database (that's why CQL is so similar to SQL) rather than a
> raw data store.
>
> Sure, technically you can do bulk retrieval, but essentially that requires
> modeling and accessing using relatively narrow slices.
>
> Closing the circle, Cassandra is always enhancing its capabilities and
> there is indeed that effort underway to support wider rows, but the
> emphasis of modeling still needs to be centered on point queries and narrow
> contiguous slices.
>
> Even with Spark and analytics that may indeed need to do a full scan of a
> large amount of data, the model needs to be that the big scan is done in
> small chunks.
>
>
> -- Jack Krupansky
>
> On Sat, Mar 12, 2016 at 10:23 AM, Jason Kania <ja...@ymail.com>
> wrote:
>
>> Our analytics currently pulls in all the data for a single sensor reading
>> as we use it in its entirety during signal processing. We may add secondary
>> indices to the table in the future to pull in broadly classified data, but
>> right now, our only goal is this bulk retrieval.
>>
>> ------------------------------
>> *From:* Jack Krupansky <ja...@gmail.com>
>> *To:* user@cassandra.apache.org
>> *Sent:* Friday, March 11, 2016 7:25 PM
>>
>> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
>> partition key
>>
>> Thanks, that level of query detail gives us a better picture to focus on.
>> I'll think through this some more over the weekend.
>>
>> Also, these queries focus on raw, bulk retrieval of sensor data readings,
>> but do you have reading-based queries, such as range of an actual sensor
>> reading?
>>
>> -- Jack Krupansky
>>
>> On Fri, Mar 11, 2016 at 7:08 PM, Jason Kania <ja...@ymail.com>
>> wrote:
>>
>> The 5000 readings mentioned would be against a single sensor on a single
>> sensor unit.
>>
>> The scope of the queries on this table is intended to be fairly simple.
>> Here are some example queries, without 'sharding', that we would perform on
>> this table:
>>
>> SELECT "time","readings" FROM "sensorReadings"
>> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=?
>> ORDER BY time DESC LIMIT 5000
>>
>> SELECT "time","readings" FROM "sensorReadings"
>> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=?
>> ORDER BY time LIMIT 5000
>>
>> SELECT "time","readings" FROM "sensorReadings"
>> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=? AND
>> classification=?
>> ORDER BY time DESC LIMIT 5000
>>
>> where 'classification' is a secondary index that we expect to add.
>>
>> In some cases, we have to revisit all values too so a complete table scan
>> is needed:
>>
>> SELECT "time","readings" FROM "sensorReadings"
>>
>> Getting the "next" and "previous" 5000 readings is also something we do,
>> but is manageable from our standpoint as we can look at the range-end
>> timestamps that are returned and use those in the subsequent queries.
>>
>> SELECT "time","readings" FROM "sensorReadings"
>> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=? AND time<=?
>> ORDER BY time LIMIT 5000
>>
>> Splitting the bulk content out of the main table is something we
>> considered too but we didn't find any detail on whether that would solve
>> our timeout problem. If there is a reference for using this approach, it
>> would be of interest to us to avoid any assumptions on how we would
>> approach it.
>>
>> A question: Is the probability of a timeout directly linked to a longer
>> seek time in reading through a partition's contents? If that is the case,
>> splitting the partition keys into a separate table would be straightforward.
>>
>> Regards,
>>
>> Jason
>>
>> ------------------------------
>> *From:* Jack Krupansky <ja...@gmail.com>
>> *To:* user@cassandra.apache.org; Jason Kania <ja...@ymail.com>
>> *Sent:* Friday, March 11, 2016 6:22 PM
>>
>> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
>> partition key
>>
>> Thanks for the additional information, but there is still not enough
>> color on the queries and too much focus on a premature data model.
>>
>> Is this 5000 readings for a single sensor of a single sensor unit, or for
>> all sensors of a specified unit, or... both?
>>
>> I presume you want "next" and "previous" 5000 readings as well as first
>> and last, but... you will have to confirm that.
>>
>> One technique is to store the bulk of your raw sensor data in a separate
>> table and then simply store the PK of that data in your time series. That
>> way you can have a much wider row of time series (number of rows) without
>> hitting a bulk size issue for the partition. But... I don't want to jump to
>> solutions until we have a firmer handle on the query side of the fence.
>>
>> -- Jack Krupansky
>>
>> On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <ja...@ymail.com>
>> wrote:
>>
>> Jack,
>>
>> Thanks for the response.
>>
>> We are targeting our database design to 10000 sensor units and each
>> sensor unit has 32 sensors. We are seeing about 700 events per day per
>> sensor, each providing about 2K of data. Based on keeping each partition to
>> about 10 Mb (based on readings we saw on performance), we chose to break
>> our partitions on a weekly basis. This is possibly finer than we need as we
>> were seeing timeouts only once a single partition was about 150 MB in size.
>>
>> When pulling in data, we will typically need to pull 1 to 4 months of
>> data for our analysis and will use only the sensorUnitId and sensorId to
>> uniquely identify the data source with the timeShard value used to break up
>> our partitions. We have handling to sequentially scan based on our
>> "timeShard" value, but don't have a good handle on the determination of the
>> "timeShard" portion of the partition key at read time. The data starts
>> coming in when a subscriber starts using our system and finishes when they
>> discontinue service or put the service on hold temporarily.
>>
>> When I talk about hotspots, it isn't the time series data that is the
>> concern, it is with respect to storing the maximum and minimum timeShard
>> values in another table for subsequent lookup or the cost of running the
>> current implementation of SELECT DISTINCT. We need to run queries such as
>> getting the first or last 5000 sensor readings when we don't know the time
>> frame at which they occurred so cannot directly supply the timeShard
>> portion of our partition key.
>>
>> I appreciate your input,
>>
>> Thanks,
>>
>> Jason
>>
>> ------------------------------
>> *From:* Jack Krupansky <ja...@gmail.com>
>> *To:* "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> *Sent:* Friday, March 11, 2016 4:45 PM
>>
>> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
>> partition key
>>
>> I'll stay away from advising on a specific schema per se, but I'll stick
>> to the advice that you need to make sure that your queries are depending
>> solely on the columns of the primary key or relatively short slices/scans,
>> rather than run the risk of very long scans or having to process multiple
>> partitions for a single query. That's canned to some extent, but still
>> essential.
>>
>> Of course we generally wish to avoid hotspots, but with time series they
>> are unavoidable. I mean, sure you could place successive events at separate
>> partitions, but then you can't do any kind of scanning/slicing.
>>
>> But, events for separate sensors are not true hotspots in the traditional
>> sense - unless you have only a single sensor/unit.
>>
>> After considering your queries, the next step is to consider the
>> cardinality of your data - how many sensors, how many units, rate of
>> events, etc. That will feed back into queries as well, such as how big a
>> slice or scan might be, as well as sizing of partitions.
>>
>> So, how many sensor units do you expect, how many sensors per unit, and
>> expected rate of events per sensor?
>>
>> Try not to jump too quickly to specific solutions - there really is a
>> method to understanding all of this other stuff upfront.
>>
>> -- Jack Krupansky
>>
>> On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <ja...@ymail.com>
>> wrote:
>>
>> Jack,
>>
>> Thanks for the response. I don't think I provided enough information and
>> used the wrong terminology as your response is more the canned advice is
>> response to Cassandra antipatterns.
>>
>> To make this clearer, this is what we are doing:
>>
>> create table sensorReadings (
>> sensorUnitId int,
>> sensorId int,
>> time timestamp,
>> timeShard int,
>> readings blob,
>> primary key((sensorUnitId, sensorId, timeShard), time));
>>
>> where timeShard is a combination of year and week of year
>>
>> For known time range based queries, this works great. However, the
>> specific problem is in knowing the maximum and minimum timeShard values
>> when we want to select the entire range of data. Our understanding is that
>> if we update another related table with the maximum and minimum timeShard
>> value for a given sensorUnitId and sensorId combination, we will create a
>> hotspot and lots of tombstones. If we SELECT DISTINCT, we get a huge list
>> of partition keys for the table because we cannot reduce the scope with a
>> where clause.
>>
>> If there is a recommended pattern that solves this, we haven't come
>> across it.
>>
>> I hope this makes the problem clearer.
>>
>> Thanks,
>>
>> Jason
>>
>> ------------------------------
>> *From:* Jack Krupansky <ja...@gmail.com>
>> *To:* user@cassandra.apache.org; Jason Kania <ja...@ymail.com>
>> *Sent:* Thursday, March 10, 2016 10:42 AM
>> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
>> partition key
>>
>> There is an effort underway to support wider rows:
>> https://issues.apache.org/jira/browse/CASSANDRA-9754
>>
>> This won't help you now though. Even with that improvement you still may
>> need a more optimal data model since large-scale scanning/filtering is
>> always a very bad idea with Cassandra.
>>
>> The data modeling methodology for Cassandra dictates that queries drive
>> the data model and that each form of query requires a separate table
>> ("query table.") Materialized views can automate that process for a lot of
>> cases, but in any case it does sound as if some of your queries do require
>> additional tables.
>>
>> As a general proposition, Cassandra should not be used for heavy
>> filtering - query tables with the filtering criteria baked into the PK is
>> the way to go.
>>
>>
>> -- Jack Krupansky
>>
>> On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <ja...@ymail.com>
>> wrote:
>>
>> Hi,
>>
>> We have sensor input that creates very wide rows and operations on these
>> rows have started to time out regularly. We have been trying to find a
>> solution to dividing wide rows but keep hitting limitations that move the
>> problem around instead of solving it.
>>
>> We have a partition key consisting of a sensorUnitId and a sensorId and
>> use a time field to access each column in the row. We tried adding a time
>> based entry, timeShardId, to the partition key that consists of the year
>> and week of year during which the reading was taken. This works for a
>> number of queries but for scanning all the readings against a particular
>> sensorUnitId and sensorId combination, we seem to be stuck.
>>
>> We won't know the range of valid values of the timeShardId for a given
>> sensorUnitId and sensorId combination so would have to write to an
>> additional table to track the valid timeShardId. We suspect this would
>> create tombstone accumulation problems given the number of updates required
>> to the same row so haven't tried this option.
>>
>> Alternatively, we hit a different bottleneck in the form of SELECT
>> DISTINCT in trying to directly access the partition keys. Since SELECT
>> DISTINCT does not allow for a where clause to filter on the partition key
>> values, we have to filter several hundred thousand partition keys just to
>> find those related to the relevant sensorUnitId and sensorId. This problem
>> will only grow worse for us.
>>
>> Are there any other approaches that can be suggested? We have been
>> looking around, but haven't found any references beyond the initial
>> suggestion to add some sort of shard id to the partition key to handle wide
>> rows.
>>
>> Thanks,
>>
>> Jason
>>

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Jack Krupansky <ja...@gmail.com>.
Generally, secondary indexes are not recommended in Cassandra. Query tables
and/or materialized views are the recommended alternative. But it all
depends on the specific nature of the queries and the cardinality of the
data.
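
As a sketch of what a query table / materialized view could look like for
the 'classification' lookups mentioned elsewhere in this thread (assuming
the sensorReadings schema quoted further down plus a classification column,
and Cassandra 3.0+ materialized views; the view name is illustrative):

CREATE MATERIALIZED VIEW "sensorReadingsByClassification" AS
  SELECT "sensorUnitId", "sensorId", "timeShard", classification, time, readings
  FROM "sensorReadings"
  WHERE "sensorUnitId" IS NOT NULL AND "sensorId" IS NOT NULL
    AND "timeShard" IS NOT NULL AND classification IS NOT NULL
    AND time IS NOT NULL
  PRIMARY KEY (("sensorUnitId", "sensorId", "timeShard"), classification, time);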

Generally, Cassandra is ideal for only two use cases (access patterns
really): 1) retrieval by a specific key, and 2) retrieval of a relatively
narrow slice of contiguous data, beginning with a specific key.

Bulk retrieval is not a great access pattern for Cassandra. The emphasis is
on being a database (that's why CQL is so similar to SQL) rather than a raw
data store.

Sure, technically you can do bulk retrieval, but essentially that requires
modeling and accessing using relatively narrow slices.

Closing the circle, Cassandra is always enhancing its capabilities and
there is indeed that effort underway to support wider rows, but the
emphasis of modeling still needs to be centered on point queries and narrow
contiguous slices.

Even with Spark and analytics that may indeed need to do a full scan of a
large amount of data, the model needs to be that the big scan is done in
small chunks.
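
A rough sketch of scanning in small chunks from a client, assuming the
Python driver, the weekly-sharded sensorReadings schema discussed in this
thread, and an illustrative keyspace name:

from cassandra.cluster import Cluster

# Walk one sensor's history one weekly shard at a time instead of issuing a
# single huge query; shards that hold no data simply return no rows.
session = Cluster(['127.0.0.1']).connect('sensors')  # keyspace name assumed
select_shard = session.prepare(
    'SELECT time, readings FROM "sensorReadings" '
    'WHERE "sensorUnitId" = ? AND "sensorId" = ? AND "timeShard" = ?')

def scan_sensor(sensor_unit_id, sensor_id, shards):
    for shard in shards:
        for row in session.execute(select_shard, (sensor_unit_id, sensor_id, shard)):
            yield row.time, row.readings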


-- Jack Krupansky

On Sat, Mar 12, 2016 at 10:23 AM, Jason Kania <ja...@ymail.com> wrote:

> Our analytics currently pulls in all the data for a single sensor reading
> as we use it in its entirety during signal processing. We may add secondary
> indices to the table in the future to pull in broadly classified data, but
> right now, our only goal is this bulk retrieval.
>
> ------------------------------
> *From:* Jack Krupansky <ja...@gmail.com>
> *To:* user@cassandra.apache.org
> *Sent:* Friday, March 11, 2016 7:25 PM
>
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> Thanks, that level of query detail gives us a better picture to focus on.
> I'll think through this some more over the weekend.
>
> Also, these queries focus on raw, bulk retrieval of sensor data readings,
> but do you have reading-based queries, such as range of an actual sensor
> reading?
>
> -- Jack Krupansky
>
> On Fri, Mar 11, 2016 at 7:08 PM, Jason Kania <ja...@ymail.com>
> wrote:
>
> The 5000 readings mentioned would be against a single sensor on a single
> sensor unit.
>
> The scope of the queries on this table is intended to be fairly simple.
> Here are some example queries, without 'sharding', that we would perform on
> this table:
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=?
> ORDER BY time DESC LIMIT 5000
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=?
> ORDER BY time LIMIT 5000
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=? AND
> classification=?
> ORDER BY time DESC LIMIT 5000
>
> where 'classification' is a secondary index that we expect to add.
>
> In some cases, we have to revisit all values too so a complete table scan
> is needed:
>
> SELECT "time","readings" FROM "sensorReadings"
>
> Getting the "next" and "previous" 5000 readings is also something we do,
> but is manageable from our standpoint as we can look at the range-end
> timestamps that are returned and use those in the subsequent queries.
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=? AND time<=?
> ORDER BY time LIMIT 5000
>
> Splitting the bulk content out of the main table is something we
> considered too but we didn't find any detail on whether that would solve
> our timeout problem. If there is a reference for using this approach, it
> would be of interest to us to avoid any assumptions on how we would
> approach it.
>
> A question: Is the probability of a timeout directly linked to a longer
> seek time in reading through a partition's contents? If that is the case,
> splitting the partition keys into a separate table would be straightforward.
>
> Regards,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <ja...@gmail.com>
> *To:* user@cassandra.apache.org; Jason Kania <ja...@ymail.com>
> *Sent:* Friday, March 11, 2016 6:22 PM
>
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> Thanks for the additional information, but there is still not enough color
> on the queries and too much focus on a premature data model.
>
> Is this 5000 readings for a single sensor of a single sensor unit, or for
> all sensors of a specified unit, or... both?
>
> I presume you want "next" and "previous" 5000 readings as well as first
> and last, but... you will have to confirm that.
>
> One technique is to store the bulk of your raw sensor data in a separate
> table and then simply store the PK of that data in your time series. That
> way you can have a much wider row of time series (number of rows) without
> hitting a bulk size issue for the partition. But... I don't want to jump to
> solutions until we have a firmer handle on the query side of the fence.
>
> -- Jack Krupansky
>
> On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <ja...@ymail.com>
> wrote:
>
> Jack,
>
> Thanks for the response.
>
> We are targeting our database design to 10000 sensor units and each sensor
> unit has 32 sensors. We are seeing about 700 events per day per sensor,
> each providing about 2K of data. Based on keeping each partition to about
> 10 Mb (based on readings we saw on performance), we chose to break our
> partitions on a weekly basis. This is possibly finer than we need as we
> were seeing timeouts only once a single partition was about 150 MB in size.
>
> When pulling in data, we will typically need to pull 1 to 4 months of data
> for our analysis and will use only the sensorUnitId and sensorId to
> uniquely identify the data source with the timeShard value used to break up
> our partitions. We have handling to sequentially scan based on our
> "timeShard" value, but don't have a good handle on the determination of the
> "timeShard" portion of the partition key at read time. The data starts
> coming in when a subscriber starts using our system and finishes when they
> discontinue service or put the service on hold temporarily.
>
> When I talk about hotspots, it isn't the time series data that is the
> concern, it is with respect to storing the maximum and minimum timeShard
> values in another table for subsequent lookup or the cost of running the
> current implementation of SELECT DISTINCT. We need to run queries such as
> getting the first or last 5000 sensor readings when we don't know the time
> frame at which they occurred so cannot directly supply the timeShard
> portion of our partition key.
>
> I appreciate your input,
>
> Thanks,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <ja...@gmail.com>
> *To:* "user@cassandra.apache.org" <us...@cassandra.apache.org>
> *Sent:* Friday, March 11, 2016 4:45 PM
>
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> I'll stay away from advising on a specific schema per se, but I'll stick
> to the advice that you need to make sure that your queries are depending
> solely on the columns of the primary key or relatively short slices/scans,
> rather than run the risk of very long scans or having to process multiple
> partitions for a single query. That's canned to some extent, but still
> essential.
>
> Of course we generally wish to avoid hotspots, but with time series they
> are unavoidable. I mean, sure you could place successive events at separate
> partitions, but then you can't do any kind of scanning/slicing.
>
> But, events for separate sensors are not true hotspots in the traditional
> sense - unless you have only a single sensor/unit.
>
> After considering your queries, the next step is to consider the
> cardinality of your data - how many sensors, how many units, rate of
> events, etc. That will feed back into queries as well, such as how big a
> slice or scan might be, as well as sizing of partitions.
>
> So, how many sensor units do you expect, how many sensors per unit, and
> expected rate of events per sensor?
>
> Try not to jump too quickly to specific solutions - there really is a
> method to understanding all of this other stuff upfront.
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <ja...@ymail.com>
> wrote:
>
> Jack,
>
> Thanks for the response. I don't think I provided enough information and
> used the wrong terminology, as your response is more the canned advice in
> response to Cassandra antipatterns.
>
> To make this clearer, this is what we are doing:
>
> create table sensorReadings (
> sensorUnitId int,
> sensorId int,
> time timestamp,
> timeShard int,
> readings blob,
> primary key((sensorUnitId, sensorId, timeShard), time));
>
> where timeShard is a combination of year and week of year
>
> For known time range based queries, this works great. However, the
> specific problem is in knowing the maximum and minimum timeShard values
> when we want to select the entire range of data. Our understanding is that
> if we update another related table with the maximum and minimum timeShard
> value for a given sensorUnitId and sensorId combination, we will create a
> hotspot and lots of tombstones. If we SELECT DISTINCT, we get a huge list
> of partition keys for the table because we cannot reduce the scope with a
> where clause.
>
> If there is a recommended pattern that solves this, we haven't come across
> it.
>
> I hope this makes the problem clearer.
>
> Thanks,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <ja...@gmail.com>
> *To:* user@cassandra.apache.org; Jason Kania <ja...@ymail.com>
> *Sent:* Thursday, March 10, 2016 10:42 AM
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> There is an effort underway to support wider rows:
> https://issues.apache.org/jira/browse/CASSANDRA-9754
>
> This won't help you now though. Even with that improvement you still may
> need a more optimal data model since large-scale scanning/filtering is
> always a very bad idea with Cassandra.
>
> The data modeling methodology for Cassandra dictates that queries drive
> the data model and that each form of query requires a separate table
> ("query table.") Materialized views can automate that process for a lot of
> cases, but in any case it does sound as if some of your queries do require
> additional tables.
>
> As a general proposition, Cassandra should not be used for heavy filtering
> - query tables with the filtering criteria baked into the PK is the way to
> go.
>
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <ja...@ymail.com>
> wrote:
>
> Hi,
>
> We have sensor input that creates very wide rows and operations on these
> rows have started to time out regularly. We have been trying to find a
> solution to dividing wide rows but keep hitting limitations that move the
> problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId and
> use a time field to access each column in the row. We tried adding a time
> based entry, timeShardId, to the partition key that consists of the year
> and week of year during which the reading was taken. This works for a
> number of queries but for scanning all the readings against a particular
> sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given
> sensorUnitId and sensorId combination so would have to write to an
> additional table to track the valid timeShardId. We suspect this would
> create tombstone accumulation problems given the number of updates required
> to the same row so haven't tried this option.
>
> Alternatively, we hit a different bottleneck in the form of SELECT
> DISTINCT in trying to directly access the partition keys. Since SELECT
> DISTINCT does not allow for a where clause to filter on the partition key
> values, we have to filter several hundred thousand partition keys just to
> find those related to the relevant sensorUnitId and sensorId. This problem
> will only grow worse for us.
>
> Are there any other approaches that can be suggested? We have been looking
> around, but haven't found any references beyond the initial suggestion to
> add some sort of shard id to the partition key to handle wide rows.
>
> Thanks,
>
> Jason
>

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Jason Kania <ja...@ymail.com>.
Our analytics currently pulls in all the data for a single sensor reading as we use it in its entirety during signal processing. We may add secondary indices to the table in the future to pull in broadly classified data, but right now, our only goal is this bulk retrieval.
      From: Jack Krupansky <ja...@gmail.com>
 To: user@cassandra.apache.org 
 Sent: Friday, March 11, 2016 7:25 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
  
Thanks, that level of query detail gives us a better picture to focus on. I'll think through this some more over the weekend.
Also, these queries focus on raw, bulk retrieval of sensor data readings, but do you have reading-based queries, such as range of an actual sensor reading?
-- Jack Krupansky
On Fri, Mar 11, 2016 at 7:08 PM, Jason Kania <ja...@ymail.com> wrote:

The 5000 readings mentioned would be against a single sensor on a single sensor unit.

The scope of the queries on this table is intended to be fairly simple. Here are some example queries, without 'sharding', that we would perform on this table:

SELECT "time","readings" FROM "sensorReadings"
WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=?
ORDER BY time DESC LIMIT 5000

SELECT "time","readings" FROM "sensorReadings"
WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=?
ORDER BY time LIMIT 5000

SELECT "time","readings" FROM "sensorReadings"
WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=? AND classification=?
ORDER BY time DESC LIMIT 5000
where 'classification' is a secondary index that we expect to add.

In some cases, we have to revisit all values too so a complete table scan is needed:
SELECT "time","readings" FROM "sensorReadings"
Getting the "next" and "previous" 5000 readings is also something we do, but is manageable from our standpoint as we can look at the range-end timestamps that are returned and use those in the subsequent queries.

SELECT "time","readings" FROM "sensorReadings"
WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=? AND time<=?
ORDER BY time LIMIT 5000
Splitting the bulk content out of the main table is something we considered too but we didn't find any detail on whether that would solve our timeout problem. If there is a reference for using this approach, it would be of interest to us to avoid any assumptions on how we would approach it.

A question: Is the probability of a timeout directly linked to a longer seek time in reading through a partition's contents? If that is the case, splitting the partition keys into a separate table would be straightforward.

Regards,
Jason

      From: Jack Krupansky <ja...@gmail.com>
 To: user@cassandra.apache.org; Jason Kania <ja...@ymail.com> 
 Sent: Friday, March 11, 2016 6:22 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
   
Thanks for the additional information, but there is still not enough color on the queries and too much focus on a premature data model.
Is this 5000 readings for a single sensor of a single sensor unit, or for all sensors of a specified unit, or... both?
I presume you want "next" and "previous" 5000 readings as well as first and last, but... you will have to confirm that.
One technique is to store the bulk of your raw sensor data in a separate table and then simply store the PK of that data in your time series. That way you can have a much wider row of time series (number of rows) without hitting a bulk size issue for the partition. But... I don't want to jump to solutions until we have a firmer handle on the query side of the fence.
-- Jack Krupansky
On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <ja...@ymail.com> wrote:

Jack,
Thanks for the response.
We are targeting our database design to 10000 sensor units and each sensor unit has 32 sensors. We are seeing about 700 events per day per sensor, each providing about 2K of data. Based on keeping each partition to about 10 Mb (based on readings we saw on performance), we chose to break our partitions on a weekly basis. This is possibly finer than we need as we were seeing timeouts only once a single partition was about 150 MB in size.

When pulling in data, we will typically need to pull 1 to 4 months of data for our analysis and will use only the sensorUnitId and sensorId to uniquely identify the data source with the timeShard value used to break up our partitions. We have handling to sequentially scan based on our "timeShard" value, but don't have a good handle on the determination of the "timeShard" portion of the partition key at read time. The data starts coming in when a subscriber starts using our system and finishes when they discontinue service or put the service on hold temporarily.

When I talk about hotspots, it isn't the time series data that is the concern, it is with respect to storing the maximum and minimum timeShard values in another table for subsequent lookup or the cost of running the current implementation of SELECT DISTINCT. We need to run queries such as getting the first or last 5000 sensor readings when we don't know the time frame at which they occurred so cannot directly supply the timeShard portion of our partition key.

I appreciate your input,
Thanks,
Jason

      From: Jack Krupansky <ja...@gmail.com>
 To: "user@cassandra.apache.org" <us...@cassandra.apache.org> 
 Sent: Friday, March 11, 2016 4:45 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
   
I'll stay away from advising on a specific schema per se, but I'll stick to the advice that you need to make sure that your queries are depending solely on the columns of the primary key or relatively short slices/scans, rather than run the risk of very long scans or having to process multiple partitions for a single query. That's canned to some extent, but still essential.
Of course we generally wish to avoid hotspots, but with time series they are unavoidable. I mean, sure you could place successive events at separate partitions, but then you can't do any kind of scanning/slicing.
But, events for separate sensors are not true hotspots in the traditional sense - unless you have only a single sensor/unit.
After considering your queries, the next step is to consider the cardinality of your data - how many sensors, how many units, rate of events, etc. That will feed back into queries as well, such as how big a slice or scan might be, as well as sizing of partitions.
So, how many sensor units do you expect, how many sensors per unit, and expected rate of events per sensor?
Try not to jump too quickly to specific solutions - there really is a method to understanding all of this other stuff upfront.
-- Jack Krupansky
On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <ja...@ymail.com> wrote:

Jack,
Thanks for the response. I don't think I provided enough information and used the wrong terminology, as your response is more the canned advice in response to Cassandra antipatterns.
To make this clearer, this is what we are doing:
create table sensorReadings (
sensorUnitId int,
sensorId int,
time timestamp,
timeShard int,
readings blob,
primary key((sensorUnitId, sensorId, timeShard), time));
where timeShard is a combination of year and week of year
For known time range based queries, this works great. However, the specific problem is in knowing the maximum and minimum timeShard values when we want to select the entire range of data. Our understanding is that if we update another related table with the maximum and minimum timeShard value for a given sensorUnitId and sensorId combination, we will create a hotspot and lots of tombstones. If we SELECT DISTINCT, we get a huge list of partition keys for the table because we cannot reduce the scope with a where clause.

If there is a recommended pattern that solves this, we haven't come across it.

I hope this makes the problem clearer.
Thanks,
Jason

      From: Jack Krupansky <ja...@gmail.com>
 To: user@cassandra.apache.org; Jason Kania <ja...@ymail.com> 
 Sent: Thursday, March 10, 2016 10:42 AM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
   
There is an effort underway to support wider rows: https://issues.apache.org/jira/browse/CASSANDRA-9754

This won't help you now though. Even with that improvement you still may need a more optimal data model since large-scale scanning/filtering is always a very bad idea with Cassandra.
The data modeling methodology for Cassandra dictates that queries drive the data model and that each form of query requires a separate table ("query table.") Materialized views can automate that process for a lot of cases, but in any case it does sound as if some of your queries do require additional tables.
As a general proposition, Cassandra should not be used for heavy filtering - query tables with the filtering criteria baked into the PK is the way to go.

-- Jack Krupansky
On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <ja...@ymail.com> wrote:

Hi,
We have sensor input that creates very wide rows and operations on these rows have started to time out regularly. We have been trying to find a solution to dividing wide rows but keep hitting limitations that move the problem around instead of solving it.
We have a partition key consisting of a sensorUnitId and a sensorId and use a time field to access each column in the row. We tried adding a time based entry, timeShardId, to the partition key that consists of the year and week of year during which the reading was taken. This works for a number of queries but for scanning all the readings against a particular sensorUnitId and sensorId combination, we seem to be stuck.
We won't know the range of valid values of the timeShardId for a given sensorUnitId and sensorId combination so would have to write to an additional table to track the valid timeShardId. We suspect this would create tombstone accumulation problems given the number of updates required to the same row so haven't tried this option.

Alternatively, we hit a different bottleneck in the form of SELECT DISTINCT in trying to directly access the partition keys. Since SELECT DISTINCT does not allow for a where clause to filter on the partition key values, we have to filter several hundred thousand partition keys just to find those related to the relevant sensorUnitId and sensorId. This problem will only grow worse for us.

Are there any other approaches that can be suggested? We have been looking around, but haven't found any references beyond the initial suggestion to add some sort of shard id to the partition key to handle wide rows.
Thanks,
Jason


Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Jack Krupansky <ja...@gmail.com>.
Thanks, that level of query detail gives us a better picture to focus on. I'll
think through this some more over the weekend.

Also, these queries focus on raw, bulk retrieval of sensor data readings,
but do you have reading-based queries, such as range of an actual sensor
reading?

-- Jack Krupansky

On Fri, Mar 11, 2016 at 7:08 PM, Jason Kania <ja...@ymail.com> wrote:

> The 5000 readings mentioned would be against a single sensor on a single
> sensor unit.
>
> The scope of the queries on this table is intended to be fairly simple.
> Here are some example queries, without 'sharding', that we would perform on
> this table:
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=?
> ORDER BY time DESC LIMIT 5000
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=?
> ORDER BY time LIMIT 5000
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=? AND
> classification=?
> ORDER BY time DESC LIMIT 5000
>
> where 'classification' is a secondary index that we expect to add.
>
> In some cases, we have to revisit all values too so a complete table scan
> is needed:
>
> SELECT "time","readings" FROM "sensorReadings"
>
> Getting the "next" and "previous" 5000 readings is also something we do,
> but is manageable from our standpoint as we can look at the range-end
> timestamps that are returned and use those in the subsequent queries.
>
> SELECT "time","readings" FROM "sensorReadings"
> WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=? AND time<=?
> ORDER BY time LIMIT 5000
>
> Splitting the bulk content out of the main table is something we
> considered too but we didn't find any detail on whether that would solve
> our timeout problem. If there is a reference for using this approach, it
> would be of interest to us to avoid any assumptions on how we would
> approach it.
>
> A question: Is the probability of a timeout directly linked to a longer
> seek time in reading through a partition's contents? If that is the case,
> splitting the partition keys into a separate table would be straightforward.
>
> Regards,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <ja...@gmail.com>
> *To:* user@cassandra.apache.org; Jason Kania <ja...@ymail.com>
> *Sent:* Friday, March 11, 2016 6:22 PM
>
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> Thanks for the additional information, but there is still not enough color
> on the queries and too much focus on a premature data model.
>
> Is this 5000 readings for a single sensor of a single sensor unit, or for
> all sensors of a specified unit, or... both?
>
> I presume you want "next" and "previous" 5000 readings as well as first
> and last, but... you will have to confirm that.
>
> One technique is to store the bulk of your raw sensor data in a separate
> table and then simply store the PK of that data in your time series. That
> way you can have a much wider row of time series (number of rows) without
> hitting a bulk size issue for the partition. But... I don't want to jump to
> solutions until we have a firmer handle on the query side of the fence.
>
> -- Jack Krupansky
>
> On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <ja...@ymail.com>
> wrote:
>
> Jack,
>
> Thanks for the response.
>
> We are targeting our database design to 10000 sensor units and each sensor
> unit has 32 sensors. We are seeing about 700 events per day per sensor,
> each providing about 2K of data. Based on keeping each partition to about
> 10 Mb (based on readings we saw on performance), we chose to break our
> partitions on a weekly basis. This is possibly finer than we need as we
> were seeing timeouts only once a single partition was about 150 MB in size.
>
> When pulling in data, we will typically need to pull 1 to 4 months of data
> for our analysis and will use only the sensorUnitId and sensorId to
> uniquely identify the data source with the timeShard value used to break up
> our partitions. We have handling to sequentially scan based on our
> "timeShard" value, but don't have a good handle on the determination of the
> "timeShard" portion of the partition key at read time. The data starts
> coming in when a subscriber starts using our system and finishes when they
> discontinue service or put the service on hold temporarily.
>
> When I talk about hotspots, it isn't the time series data that is the
> concern, it is with respect to storing the maximum and minimum timeShard
> values in another table for subsequent lookup or the cost of running the
> current implementation of SELECT DISTINCT. We need to run queries such as
> getting the first or last 5000 sensor readings when we don't know the time
> frame at which they occurred so cannot directly supply the timeShard
> portion of our partition key.
>
> I appreciate your input,
>
> Thanks,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <ja...@gmail.com>
> *To:* "user@cassandra.apache.org" <us...@cassandra.apache.org>
> *Sent:* Friday, March 11, 2016 4:45 PM
>
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> I'll stay away from advising on a specific schema per se, but I'll stick
> to the advice that you need to make sure that your queries are depending
> solely on the columns of the primary key or relatively short slices/scans,
> rather than run the risk of very long scans or having to process multiple
> partitions for a single query. That's canned to some extent, but still
> essential.
>
> Of course we generally wish to avoid hotspots, but with time series they
> are unavoidable. I mean, sure you could place successive events at separate
> partitions, but then you can't do any kind of scanning/slicing.
>
> But, events for separate sensors are not true hotspots in the traditional
> sense - unless you have only a single sensor/unit.
>
> After considering your queries, the next step is to consider the
> cardinality of your data - how many sensors, how many units, rate of
> events, etc. That will feed back into queries as well, such as how big a
> slice or scan might be, as well as sizing of partitions.
>
> So, how many sensor units do you expect, how many sensors per unit, and
> expected rate of events per sensor?
>
> Try not to jump too quickly to specific solutions - there really is a
> method to understanding all of this other stuff upfront.
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <ja...@ymail.com>
> wrote:
>
> Jack,
>
> Thanks for the response. I don't think I provided enough information and
> used the wrong terminology, as your response is more the canned advice in
> response to Cassandra antipatterns.
>
> To make this clearer, this is what we are doing:
>
> create table sensorReadings (
> sensorUnitId int,
> sensorId int,
> time timestamp,
> timeShard int,
> readings blob,
> primary key((sensorUnitId, sensorId, timeShard), time));
>
> where timeShard is a combination of year and week of year
>
> For known time range based queries, this works great. However, the
> specific problem is in knowing the maximum and minimum timeShard values
> when we want to select the entire range of data. Our understanding is that
> if we update another related table with the maximum and minimum timeShard
> value for a given sensorUnitId and sensorId combination, we will create a
> hotspot and lots of tombstones. If we SELECT DISTINCT, we get a huge list
> of partition keys for the table because we cannot reduce the scope with a
> where clause.
>
> If there is a recommended pattern that solves this, we haven't come across
> it.
>
> I hope this makes the problem clearer.
>
> Thanks,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <ja...@gmail.com>
> *To:* user@cassandra.apache.org; Jason Kania <ja...@ymail.com>
> *Sent:* Thursday, March 10, 2016 10:42 AM
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> There is an effort underway to support wider rows:
> https://issues.apache.org/jira/browse/CASSANDRA-9754
>
> This won't help you now though. Even with that improvement you still may
> need a more optimal data model since large-scale scanning/filtering is
> always a very bad idea with Cassandra.
>
> The data modeling methodology for Cassandra dictates that queries drive
> the data model and that each form of query requires a separate table
> ("query table.") Materialized views can automate that process for a lot of
> cases, but in any case it does sound as if some of your queries do require
> additional tables.
>
> As a general proposition, Cassandra should not be used for heavy filtering
> - query tables with the filtering criteria baked into the PK is the way to
> go.
>
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <ja...@ymail.com>
> wrote:
>
> Hi,
>
> We have sensor input that creates very wide rows and operations on these
> rows have started to time out regularly. We have been trying to find a
> solution to dividing wide rows but keep hitting limitations that move the
> problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId and
> use a time field to access each column in the row. We tried adding a time
> based entry, timeShardId, to the partition key that consists of the year
> and week of year during which the reading was taken. This works for a
> number of queries but for scanning all the readings against a particular
> sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given
> sensorUnitId and sensorId combination so would have to write to an
> additional table to track the valid timeShardId. We suspect this would
> create tombstone accumulation problems given the number of updates required
> to the same row so haven't tried this option.
>
> Alternatively, we hit a different bottleneck in the form of SELECT
> DISTINCT in trying to directly access the partition keys. Since SELECT
> DISTINCT does not allow for a where clause to filter on the partition key
> values, we have to filter several hundred thousand partition keys just to
> find those related to the relevant sensorUnitId and sensorId. This problem
> will only grow worse for us.
>
> Are there any other approaches that can be suggested? We have been looking
> around, but haven't found any references beyond the initial suggestion to
> add some sort of shard id to the partition key to handle wide rows.
>
> Thanks,
>
> Jason
>

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Jason Kania <ja...@ymail.com>.
The 5000 readings mentioned would be against a single sensor on a single sensor unit.

The scope of the queries on this table is intended to be fairly simple. Here are some example queries, without 'sharding', that we would perform on this table:

SELECT "time","readings" FROM "sensorReadings"
WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=?
ORDER BY time DESC LIMIT 5000

SELECT "time","readings" FROM "sensorReadings"
WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=?
ORDER BY time LIMIT 5000

SELECT "time","readings" FROM "sensorReadings"
WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time<=? AND classification=?
ORDER BY time DESC LIMIT 5000
where 'classification' is a secondary index that we expect to add.
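
For comparison, once the weekly timeShard is part of the partition key the
same kind of query also has to name the shard, which is exactly the value
that is hard to determine for the first or last readings (sketch, with an
example shard value):

SELECT "time","readings" FROM "sensorReadings"
WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND "timeShard"=201610 AND time<=?
ORDER BY time DESC LIMIT 5000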

In some cases, we have to revisit all values too so a complete table scan is needed:
SELECT "time","readings" FROM "sensorReadings"
Getting the "next" and "previous" 5000 readings is also something we do, but is manageable from our standpoint as we can look at the range-end timestamps that are returned and use those in the subsequent queries.

SELECT "time","readings" FROM "sensorReadings"
WHERE "sensorUnitId"=5123 AND "sensorId"=17 AND time>=? AND time<=?
ORDER BY time LIMIT 5000
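
A minimal sketch of that paging pattern with the Python driver (pre-sharding
schema as in the queries above; the keyspace name and helper are
illustrative):

from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('sensors')  # keyspace name assumed
next_page_query = session.prepare(
    'SELECT time, readings FROM "sensorReadings" '
    'WHERE "sensorUnitId" = ? AND "sensorId" = ? AND time > ? '
    'ORDER BY time LIMIT 5000')

def next_page(sensor_unit_id, sensor_id, last_time):
    # Use the range-end timestamp of the previous batch as the next lower bound.
    rows = list(session.execute(next_page_query, (sensor_unit_id, sensor_id, last_time)))
    return rows, (rows[-1].time if rows else last_time)
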
Splitting the bulk content out of the main table is something we considered too but we didn't find any detail on whether that would solve our timeout problem. If there is a reference for using this approach, it would be of interest to us to avoid any assumptions on how we would approach it.

A question: Is the probability of a timeout directly linked to a longer seek time in reading through a partition's contents? If that is the case, splitting the partition keys into a separate table would be straightforward.

Regards,
Jason

      From: Jack Krupansky <ja...@gmail.com>
 To: user@cassandra.apache.org; Jason Kania <ja...@ymail.com> 
 Sent: Friday, March 11, 2016 6:22 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
   
Thanks for the additional information, but there is still not enough color on the queries and too much focus on a premature data model.
Is this 5000 readings for a single sensor of a single sensor unit, or for all sensors of a specified unit, or... both?
I presume you want "next" and "previous" 5000 readings as well as first and last, but... you will have to confirm that.
One technique is to store the bulk of your raw sensor data in a separate table and then simply store the PK of that data in your time series. That way you can have a much wider row of time series (number of rows) without hitting a bulk size issue for the partition. But... I don't want to jump to solutions until we have a firmer handle on the query side of the fence.
-- Jack Krupansky
On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <ja...@ymail.com> wrote:

Jack,
Thanks for the response.
We are targeting our database design to 10000 sensor units and each sensor unit has 32 sensors. We are seeing about 700 events per day per sensor, each providing about 2K of data. Based on keeping each partition to about 10 Mb (based on readings we saw on performance), we chose to break our partitions on a weekly basis. This is possibly finer than we need as we were seeing timeouts only once a single partition was about 150 MB in size.

When pulling in data, we will typically need to pull 1 to 4 months of data for our analysis and will use only the sensorUnitId and sensorId to uniquely identify the data source with the timeShard value used to break up our partitions. We have handling to sequentially scan based on our "timeShard" value, but don't have a good handle on the determination of the "timeShard" portion of the partition key at read time. The data starts coming in when a subscriber starts using our system and finishes when they discontinue service or put the service on hold temporarily.

When I talk about hotspots, it isn't the time series data that is the concern, it is with respect to storing the maximum and minimum timeShard values in another table for subsequent lookup or the cost of running the current implementation of SELECT DISTINCT. We need to run queries such as getting the first or last 5000 sensor readings when we don't know the time frame at which they occurred so cannot directly supply the timeShard portion of our partition key.

I appreciate your input,
Thanks,
Jason

      From: Jack Krupansky <ja...@gmail.com>
 To: "user@cassandra.apache.org" <us...@cassandra.apache.org> 
 Sent: Friday, March 11, 2016 4:45 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
   
I'll stay away from advising on a specific schema per se, but I'll stick to the advice that you need to make sure that your queries are depending solely on the columns of the primary key or relatively short slices/scans, rather than run the risk of very long scans or having to process multiple partitions for a single query. That's canned to some extent, but still essential.
Of course we generally wish to avoid hotspots, but with time series they are unavoidable. I mean, sure you could place successive events at separate partitions, but then you can't do any kind of scanning/slicing.
But, events for separate sensors are not true hotspots in the traditional sense - unless you have only a single sensor/unit.
After considering your queries, the next step is to consider the cardinality of your data - how many sensors, how many units, rate of events, etc. That will feed back into queries as well, such as how big a slice or scan might be, as well as sizing of partitions.
So, how many sensor units do you expect, how many sensors per unit, and expected rate of events per sensor?
Try not to jump too quickly to specific solutions - there really is a method to understanding all of this other stuff upfront.
-- Jack Krupansky
On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <ja...@ymail.com> wrote:

Jack,
Thanks for the response. I don't think I provided enough information and used the wrong terminology, as your response is more the canned advice in response to Cassandra antipatterns.
To make this clearer, this is what we are doing:
create table sensorReadings (
sensorUnitId int,
sensorId int,
time timestamp,
timeShard int,
readings blob,
primary key((sensorUnitId, sensorId, timeShard), time));
where timeShard is a combination of year and week of year
For known time range based queries, this works great. However, the specific problem is in knowing the maximum and minimum timeShard values when we want to select the entire range of data. Our understanding is that if we update another related table with the maximum and minimum timeShard value for a given sensorUnitId and sensorId combination, we will create a hotspot and lots of tombstones. If we SELECT DISTINCT, we get a huge list of partition keys for the table because we cannot reduce the scope with a where clause.

If there is a recommended pattern that solves this, we haven't come across it.

I hope this makes the problem clearer.
Thanks,
Jason

      From: Jack Krupansky <ja...@gmail.com>
 To: user@cassandra.apache.org; Jason Kania <ja...@ymail.com> 
 Sent: Thursday, March 10, 2016 10:42 AM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
   
There is an effort underway to support wider rows: https://issues.apache.org/jira/browse/CASSANDRA-9754

This won't help you now though. Even with that improvement you still may need a more optimal data model since large-scale scanning/filtering is always a very bad idea with Cassandra.
The data modeling methodology for Cassandra dictates that queries drive the data model and that each form of query requires a separate table ("query table.") Materialized view can automate that process for a lot of cases, but in any case it does sound as if some of your queries do require additional tables.
As a general proposition, Cassandra should not be used for heavy filtering - query tables with the filtering criteria baked into the PK is the way to go.

-- Jack Krupansky
On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <ja...@ymail.com> wrote:

Hi,
We have sensor input that creates very wide rows and operations on these rows have started to time out regularly. We have been trying to find a solution to dividing wide rows but keep hitting limitations that move the problem around instead of solving it.
We have a partition key consisting of a sensorUnitId and a sensorId and use a time field to access each column in the row. We tried adding a time based entry, timeShardId, to the partition key that consists of the year and week of year during which the reading was taken. This works for a number of queries but for scanning all the readings against a particular sensorUnitId and sensorId combination, we seem to be stuck.
We won't know the range of valid values of the timeShardId for a given sensorUnitId and sensorId combination so would have to write to an additional table to track the valid timeShardId. We suspect this would create tombstone accumulation problems given the number of updates required to the same row so haven't tried this option.

Alternatively, we hit a different bottleneck in the form of SELECT DISTINCT in trying to directly access the partition keys. Since SELECT DISTINCT does not allow for a where clause to filter on the partition key values, we have to filter several hundred thousand partition keys just to find those related to the relevant sensorUnitId and sensorId. This problem will only grow worse for us.

Are there any other approaches that can be suggested? We have been looking around, but haven't found any references beyond the initial suggestion to add some sort of shard id to the partition key to handle wide rows.
Thanks,
Jason




   



   



   

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Jason Kania <ja...@ymail.com>.
Hi Carlos,
Thanks for the suggestions.
We are having partition size issues and that was why we started to do custom sharding/partition division based on time. As you mentioned, we are having problems with identification. It's the identification of the shard range that we need to understand, and our data doesn't necessarily run up to the current time. My worry with storing that last shard id in another table is that we would update the same row in that table all the time, creating tombstones.
It is good to know that returning empty partitions is not that costly, as that is a concern when we don't know where to start and end.
Thanks,
Jason


      From: Carlos Alonso <in...@mrcalonso.com>
 To: "user@cassandra.apache.org" <us...@cassandra.apache.org> 
 Sent: Friday, March 11, 2016 7:24 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
  
Hi Jason,
If I understand correctly you have no problems with the size of your partitions or transactional queries but with the 'identification' of them when having to do analytical queries.
I'd then suggest two options:
1. Keep using Cassandra and store the first 'bucket' of each sensor in a separate table to use as the starting point of your full scan queries. Then issue async queries incrementing the bucket until today (logical end of the data). Cassandra is very efficient at returning empty partitions, so querying on empty buckets is normally fine.
2. Periodically offload your 'historic' data to another storage more appropriate for analytics (Parquet + S3) and query it using Spark.
Hope it helps
On Saturday, 12 March 2016, Jack Krupansky <ja...@gmail.com> wrote:

Thanks for the additional information, but there is still not enough color on the queries and too much focus on a premature data model.
Is this 5000 readings for a single sensor of a single sensor unit, or for all sensors of a specified unit, or... both?
I presume you want "next" and "previous" 5000 readings as well as first and last, but... you will have to confirm that.
One technique is to store the bulk of your raw sensor data in a separate table and then simply store the PK of that data in your time series. That way you can have a much wider row of time series (number of rows) without hitting a bulk size issue for the partition. But... I don't want to jump to solutions until we have a firmer handle on the query side of the fence.
-- Jack Krupansky
On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <ja...@ymail.com> wrote:

Jack,
Thanks for the response.
We are targeting our database design to 10000 sensor units and each sensor unit has 32 sensors. We are seeing about 700 events per day per sensor, each providing about 2K of data. Based on keeping each partition to about 10 Mb (based on readings we saw on performance), we chose to break our partitions on a weekly basis. This is possibly finer than we need as we were seeing timeouts only once a single partition was about 150Mb in size

When pulling in data, we will typically need to pull 1 to 4 months of data for our analysis and will use only the sensorUnitId and sensorId to uniquely identify the data source with the timeShard value used to break up our partitions. We have handling to sequentially scan based on our "timeShard" value, but don't have a good handle on the determination of the "timeShard" portion of the partition key at read time. The data starts coming in when a subscriber starts using our system and finishes when they discontinue service or put the service on hold temporarily.

When I talk about hotspots, it isn't the time series data that is the concern, it is with respect to storing the maximum and minimum timeShard values in another table for subsequent lookup or the cost of running the current implementation of SELECT DISTINCT. We need to run queries such as getting the first or last 5000 sensor readings when we don't know the time frame at which they occurred so cannot directly supply the timeShard portion of our partition key.

I appreciate your input,
Thanks,
Jason

      From: Jack Krupansky <ja...@gmail.com>
 To: "user@cassandra.apache.org" <us...@cassandra.apache.org> 
 Sent: Friday, March 11, 2016 4:45 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
   
I'll stay away from advising on a specific schema per se, but I'll stick to the advice that you need to make sure that your queries are depending solely on the columns of the primary key or relatively short slices/scans, rather than run the risk of very long scans or having to process multiple partitions for a single query. That's canned to some extent, but still essential.
Of course we generally wish to avoid hotspots, but with time series they are unavoidable. I mean, sure you could place successive events at separate partitions, but then you can't do any kind of scanning/slicing.
But, events for separate sensors are not true hotspots in the traditional sense - unless you have only a single sensor/unit.
After considering your queries, the next step is to consider the cardinality of your data - how many sensors, how many units, rate of events, etc. That will feedback into queries as well, such as how big a slice or scan might be, as well as sizing of partitions.
So, how many sensor units do you expect, how many sensors per unit, and expected rate of events per sensor?
Try not to jump too quickly to specific solutions - there really is a method to understanding all of this other stuff upfront.
-- Jack Krupansky
On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <ja...@ymail.com> wrote:

Jack,
Thanks for the response. I don't think I provided enough information and used the wrong terminology, as your response is more the canned advice given in response to Cassandra antipatterns.
To make this clearer, this is what we are doing:
create table sensorReadings (
sensorUnitId int,
sensorId int,
time timestamp,
timeShard int,
readings blob,
primary key((sensorUnitId, sensorId, timeShard), time));
where timeShard is a combination of year and week of year
For known time range based queries, this works great. However, the specific problem is in knowing the maximum and minimum timeShard values when we want to select the entire range of data. Our understanding is that if we update another related table with the maximum and minimum timeShard value for a given sensorUnitId and sensorId combination, we will create a hotspot and lots of tombstones. If we SELECT DISTINCT, we get a huge list of partition keys for the table because we cannot reduce the scope with a where clause.

If there is a recommended pattern that solves this, we haven't come across it.

I hope this makes the problem clearer.
Thanks,
Jason

      From: Jack Krupansky <ja...@gmail.com>
 To: user@cassandra.apache.org; Jason Kania <ja...@ymail.com> 
 Sent: Thursday, March 10, 2016 10:42 AM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
   
There is an effort underway to support wider rows: https://issues.apache.org/jira/browse/CASSANDRA-9754

This won't help you now though. Even with that improvement you still may need a more optimal data model since large-scale scanning/filtering is always a very bad idea with Cassandra.
The data modeling methodology for Cassandra dictates that queries drive the data model and that each form of query requires a separate table ("query table.") Materialized view can automate that process for a lot of cases, but in any case it does sound as if some of your queries do require additional tables.
As a general proposition, Cassandra should not be used for heavy filtering - query tables with the filtering criteria baked into the PK is the way to go.

-- Jack Krupansky
On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <ja...@ymail.com> wrote:

Hi,
We have sensor input that creates very wide rows and operations on these rows have started to time out regularly. We have been trying to find a solution to dividing wide rows but keep hitting limitations that move the problem around instead of solving it.
We have a partition key consisting of a sensorUnitId and a sensorId and use a time field to access each column in the row. We tried adding a time based entry, timeShardId, to the partition key that consists of the year and week of year during which the reading was taken. This works for a number of queries but for scanning all the readings against a particular sensorUnitId and sensorId combination, we seem to be stuck.
We won't know the range of valid values of the timeShardId for a given sensorUnitId and sensorId combination so would have to write to an additional table to track the valid timeShardId. We suspect this would create tombstone accumulation problems given the number of updates required to the same row so haven't tried this option.

Alternatively, we hit a different bottleneck in the form of SELECT DISTINCT in trying to directly access the partition keys. Since SELECT DISTINCT does not allow for a where clause to filter on the partition key values, we have to filter several hundred thousand partition keys just to find those related to the relevant sensorUnitId and sensorId. This problem will only grow worse for us.

Are there any other approaches that can be suggested? We have been looking around, but haven't found any references beyond the initial suggestion to add some sort of shard id to the partition key to handle wide rows.
Thanks,
Jason




   



   




-- 
Carlos Alonso | Software Engineer | @calonso



  

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Carlos Alonso <in...@mrcalonso.com>.
Hi Jason,

If I understand correctly you have no problems with the size of your
partitions or transactional queries but with the 'identification' of them
when having to do analytical queries.

I'd then suggest two options:
1. Keep using Cassandra and store the first 'bucket' of each sensor in a
separate table to use as the starting point of your full scan queries. Then
issue async queries incrementing the bucket until today (logical end of the
data). Cassandra is very efficient at returning empty partitions, so
querying on empty buckets is normally fine.

2. Periodically offload your 'historic' data to another storage more
appropriate for analytics (Parquet + S3) and query it using Spark.

Hope it helps
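A rough sketch of option 1 above with the Python driver, assuming a weekly integer bucket encoded as ISO year * 100 + ISO week and a small lookup table holding each sensor's first bucket (the keyspace, table and column names are assumptions for illustration):

from datetime import date, timedelta
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('sensors')  # assumed keyspace

first_bucket = session.prepare(
    "SELECT firstShard FROM sensorFirstBucket "
    "WHERE sensorUnitId = ? AND sensorId = ?")            # assumed lookup table
readings = session.prepare(
    "SELECT time, readings FROM sensorReadings "
    "WHERE sensorUnitId = ? AND sensorId = ? AND timeShard = ?")

def week_shard(d):
    y, w, _ = d.isocalendar()
    return y * 100 + w                                    # e.g. 201610

def buckets_from(start_shard):
    # Walk week by week from the sensor's first bucket up to the current week.
    d = date.fromisocalendar(start_shard // 100, start_shard % 100, 1)  # Python 3.8+
    while d <= date.today():
        yield week_shard(d)
        d += timedelta(days=7)

def full_scan(unit, sensor):
    row = session.execute(first_bucket, (unit, sensor)).one()
    if row is None:
        return
    # Fan out one query per bucket; empty buckets come back quickly.
    # A real implementation would throttle how many futures are in flight.
    futures = [session.execute_async(readings, (unit, sensor, shard))
               for shard in buckets_from(row.firstshard)]
    for f in futures:
        for r in f.result():
            yield r

Since empty buckets return quickly, over-scanning a few weeks at either end is cheap compared to filtering hundreds of thousands of partition keys with SELECT DISTINCT.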

On Saturday, 12 March 2016, Jack Krupansky <ja...@gmail.com> wrote:

> Thanks for the additional information, but there is still not enough color
> on the queries and too much focus on a premature data model.
>
> Is this 5000 readings for a single sensor of a single sensor unit, or for
> all sensors of a specified unit, or... both?
>
> I presume you want "next" and "previous" 5000 readings as well as first
> and last, but... you will have to confirm that.
>
> One technique is to store the bulk of your raw sensor data in a separate
> table and then simply store the PK of that data in your time series. That
> way you can have a much wider row of time series (number of rows) without
> hitting a bulk size issue for the partition. But... I don't want to jump to
> solutions until we have a firmer handle on the query side of the fence.
>
> -- Jack Krupansky
>
> On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <jason.kania@ymail.com> wrote:
>
>> Jack,
>>
>> Thanks for the response.
>>
>> We are targeting our database design to 10000 sensor units and each
>> sensor unit has 32 sensors. We are seeing about 700 events per day per
>> sensor, each providing about 2K of data. Based on keeping each partition to
>> about 10 Mb (based on readings we saw on performance), we chose to break
>> our partitions on a weekly basis. This is possibly finer than we need as we
>> were seeing timeouts only once a single partition was about 150Mb in size
>>
>> When pulling in data, we will typically need to pull 1 to 4 months of
>> data for our analysis and will use only the sensorUnitId and sensorId to
>> uniquely identify the data source with the timeShard value used to break up
>> our partitions. We have handling to sequentially scan based on our
>> "timeShard" value, but don't have a good handle on the determination of the
>> "timeShard" portion of the partition key at read time. The data starts
>> coming in when a subscriber starts using our system and finishes when they
>> discontinue service or put the service on hold temporarily.
>>
>> When I talk about hotspots, it isn't the time series data that is the
>> concern, it is with respect to storing the maximum and minimum timeShard
>> values in another table for subsequent lookup or the cost of running the
>> current implementation of SELECT DISTINCT. We need to run queries such as
>> getting the first or last 5000 sensor readings when we don't know the time
>> frame at which they occurred so cannot directly supply the timeShard
>> portion of our partition key.
>>
>> I appreciate your input,
>>
>> Thanks,
>>
>> Jason
>>
>> ------------------------------
>> *From:* Jack Krupansky <jack.krupansky@gmail.com>
>> *To:* "user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Sent:* Friday, March 11, 2016 4:45 PM
>>
>> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
>> partition key
>>
>> I'll stay away from advising on a specific schema per se, but I'll stick
>> to the advice that you need to make sure that your queries are depending
>> solely on the columns of the primary key or relatively short slices/scans,
>> rather than run the risk of very long scans or having to process multiple
>> partitions for a single query. That's canned to some extent, but still
>> essential.
>>
>> Of course we generally wish to avoid hotspots, but with time series they
>> are unavoidable. I mean, sure you could place successive events at separate
>> partitions, but then you can't do any kind of scanning/slicing.
>>
>> But, events for separate sensors are not true hotspots in the traditional
>> sense - unless you have only a single sensor/unit.
>>
>> After considering your queries, the next step is to consider the
>> cardinality of your data - how many sensors, how many units, rate of
>> events, etc. That will feedback into queries as well, such as how big a
>> slice or scan might be, as well as sizing of partitions.
>>
>> So, how many sensor units do you expect, how many sensors per unit, and
>> expected rate of events per sensor?
>>
>> Try not to jump too quickly to specific solutions - there really is a
>> method to understanding all of this other stuff upfront.
>>
>> -- Jack Krupansky
>>
>> On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <jason.kania@ymail.com> wrote:
>>
>> Jack,
>>
>> Thanks for the response. I don't think I provided enough information and
>> used the wrong terminology as your response is more the canned advice is
>> response to Cassandra antipatterns.
>>
>> To make this clearer, this is what we are doing:
>>
>> create table sensorReadings (
>> sensorUnitId int,
>> sensorId int,
>> time timestamp,
>> timeShard int,
>> readings blob,
>> primary key((sensorUnitId, sensorId, timeShard), time);
>>
>> where timeShard is a combination of year and week of year
>>
>> For known time range based queries, this works great. However, the
>> specific problem is in knowing the maximum and minimum timeShard values
>> when we want to select the entire range of data. Our understanding is that
>> if we update another related table with the maximum and minimum timeShard
>> value for a given sensorUnitId and sensorId combination, we will create a
>> hotspot and lots of tombstones. If we SELECT DISTINCT, we get a huge list
>> of partition keys for the table because we cannot reduce the scope with a
>> where clause.
>>
>> If there is a recommended pattern that solves this, we haven't come
>> across it.
>>
>> I hope makes the problem clearer.
>>
>> Thanks,
>>
>> Jason
>>
>> ------------------------------
>> *From:* Jack Krupansky <jack.krupansky@gmail.com>
>> *To:* user@cassandra.apache.org; Jason Kania <jason.kania@ymail.com>
>> *Sent:* Thursday, March 10, 2016 10:42 AM
>> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
>> partition key
>>
>> There is an effort underway to support wider rows:
>> https://issues.apache.org/jira/browse/CASSANDRA-9754
>>
>> This won't help you now though. Even with that improvement you still may
>> need a more optimal data model since large-scale scanning/filtering is
>> always a very bad idea with Cassandra.
>>
>> The data modeling methodology for Cassandra dictates that queries drive
>> the data model and that each form of query requires a separate table
>> ("query table.") Materialized view can automate that process for a lot of
>> cases, but in any case it does sound as if some of your queries do require
>> additional tables.
>>
>> As a general proposition, Cassandra should not be used for heavy
>> filtering - query tables with the filtering criteria baked into the PK is
>> the way to go.
>>
>>
>> -- Jack Krupansky
>>
>> On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <jason.kania@ymail.com> wrote:
>>
>> Hi,
>>
>> We have sensor input that creates very wide rows and operations on these
>> rows have started to timeout regulary. We have been trying to find a
>> solution to dividing wide rows but keep hitting limitations that move the
>> problem around instead of solving it.
>>
>> We have a partition key consisting of a sensorUnitId and a sensorId and
>> use a time field to access each column in the row. We tried adding a time
>> based entry, timeShardId, to the partition key that consists of the year
>> and week of year during which the reading was taken. This works for a
>> number of queries but for scanning all the readings against a particular
>> sensorUnitId and sensorId combination, we seem to be stuck.
>>
>> We won't know the range of valid values of the timeShardId for a given
>> sensorUnitId and sensorId combination so would have to write to an
>> additional table to track the valid timeShardId. We suspect this would
>> create tombstone accumulation problems given the number of updates required
>> to the same row so haven't tried this option.
>>
>> Alternatively, we hit a different bottleneck in the form of SELECT
>> DISTINCT in trying to directly access the partition keys. Since SELECT
>> DISTINCT does not allow for a where clause to filter on the partition key
>> values, we have to filter several hundred thousand partition keys just to
>> find those related to the relevant sensorUnitId and sensorId. This problem
>> will only grow worse for us.
>>
>> Are there any other approaches that can be suggested? We have been
>> looking around, but haven't found any references beyond the initial
>> suggestion to add some sort of shard id to the partition key to handle wide
>> rows.
>>
>> Thanks,
>>
>> Jason
>>
>>
>>
>>
>>
>>
>>
>>
>

-- 
Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Jack Krupansky <ja...@gmail.com>.
Thanks for the additional information, but there is still not enough color
on the queries and too much focus on a premature data model.

Is this 5000 readings for a single sensor of a single sensor unit, or for
all sensors of a specified unit, or... both?

I presume you want "next" and "previous" 5000 readings as well as first and
last, but... you will have to confirm that.

One technique is to store the bulk of your raw sensor data in a separate
table and then simply store the PK of that data in your time series. That
way you can have a much wider row of time series (number of rows) without
hitting a bulk size issue for the partition. But... I don't want to jump to
solutions until we have a firmer handle on the query side of the fence.

-- Jack Krupansky
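One possible reading of that two-table technique, sketched with made-up names: a narrow time-series table that stores only a pointer (readingId) per event, plus a payload table keyed by that id. This layout is an assumption for illustration, not something spelled out in the thread.

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('sensors')  # assumed keyspace

# Narrow time series: one small row per reading, cheap to slice by time.
session.execute("""
CREATE TABLE IF NOT EXISTS sensorEvents (
    sensorUnitId int,
    sensorId int,
    timeShard int,
    time timestamp,
    readingId timeuuid,
    PRIMARY KEY ((sensorUnitId, sensorId, timeShard), time)
)""")

# Bulky payloads live elsewhere, fetched by id only when actually needed.
session.execute("""
CREATE TABLE IF NOT EXISTS sensorReadingBlobs (
    readingId timeuuid PRIMARY KEY,
    readings blob
)""")

Slicing or paging the narrow sensorEvents partitions stays cheap even for a few thousand rows, and the ~2K blobs are only fetched from sensorReadingBlobs for the readings the application actually needs.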

On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <ja...@ymail.com> wrote:

> Jack,
>
> Thanks for the response.
>
> We are targeting our database design to 10000 sensor units and each sensor
> unit has 32 sensors. We are seeing about 700 events per day per sensor,
> each providing about 2K of data. Based on keeping each partition to about
> 10 Mb (based on readings we saw on performance), we chose to break our
> partitions on a weekly basis. This is possibly finer than we need as we
> were seeing timeouts only once a single partition was about 150Mb in size
>
> When pulling in data, we will typically need to pull 1 to 4 months of data
> for our analysis and will use only the sensorUnitId and sensorId to
> uniquely identify the data source with the timeShard value used to break up
> our partitions. We have handling to sequentially scan based on our
> "timeShard" value, but don't have a good handle on the determination of the
> "timeShard" portion of the partition key at read time. The data starts
> coming in when a subscriber starts using our system and finishes when they
> discontinue service or put the service on hold temporarily.
>
> When I talk about hotspots, it isn't the time series data that is the
> concern, it is with respect to storing the maximum and minimum timeShard
> values in another table for subsequent lookup or the cost of running the
> current implementation of SELECT DISTINCT. We need to run queries such as
> getting the first or last 5000 sensor readings when we don't know the time
> frame at which they occurred so cannot directly supply the timeShard
> portion of our partition key.
>
> I appreciate your input,
>
> Thanks,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <ja...@gmail.com>
> *To:* "user@cassandra.apache.org" <us...@cassandra.apache.org>
> *Sent:* Friday, March 11, 2016 4:45 PM
>
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> I'll stay away from advising on a specific schema per se, but I'll stick
> to the advice that you need to make sure that your queries are depending
> solely on the columns of the primary key or relatively short slices/scans,
> rather than run the risk of very long scans or having to process multiple
> partitions for a single query. That's canned to some extent, but still
> essential.
>
> Of course we generally wish to avoid hotspots, but with time series they
> are unavoidable. I mean, sure you could place successive events at separate
> partitions, but then you can't do any kind of scanning/slicing.
>
> But, events for separate sensors are not true hotspots in the traditional
> sense - unless you have only a single sensor/unit.
>
> After considering your queries, the next step is to consider the
> cardinality of your data - how many sensors, how many units, rate of
> events, etc. That will feedback into queries as well, such as how big a
> slice or scan might be, as well as sizing of partitions.
>
> So, how many sensor units do you expect, how many sensors per unit, and
> expected rate of events per sensor?
>
> Try not to jump too quickly to specific solutions - there really is a
> method to understanding all of this other stuff upfront.
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <ja...@ymail.com>
> wrote:
>
> Jack,
>
> Thanks for the response. I don't think I provided enough information and
> used the wrong terminology as your response is more the canned advice is
> response to Cassandra antipatterns.
>
> To make this clearer, this is what we are doing:
>
> create table sensorReadings (
> sensorUnitId int,
> sensorId int,
> time timestamp,
> timeShard int,
> readings blob,
> primary key((sensorUnitId, sensorId, timeShard), time);
>
> where timeShard is a combination of year and week of year
>
> For known time range based queries, this works great. However, the
> specific problem is in knowing the maximum and minimum timeShard values
> when we want to select the entire range of data. Our understanding is that
> if we update another related table with the maximum and minimum timeShard
> value for a given sensorUnitId and sensorId combination, we will create a
> hotspot and lots of tombstones. If we SELECT DISTINCT, we get a huge list
> of partition keys for the table because we cannot reduce the scope with a
> where clause.
>
> If there is a recommended pattern that solves this, we haven't come across
> it.
>
> I hope makes the problem clearer.
>
> Thanks,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <ja...@gmail.com>
> *To:* user@cassandra.apache.org; Jason Kania <ja...@ymail.com>
> *Sent:* Thursday, March 10, 2016 10:42 AM
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> There is an effort underway to support wider rows:
> https://issues.apache.org/jira/browse/CASSANDRA-9754
>
> This won't help you now though. Even with that improvement you still may
> need a more optimal data model since large-scale scanning/filtering is
> always a very bad idea with Cassandra.
>
> The data modeling methodology for Cassandra dictates that queries drive
> the data model and that each form of query requires a separate table
> ("query table.") Materialized view can automate that process for a lot of
> cases, but in any case it does sound as if some of your queries do require
> additional tables.
>
> As a general proposition, Cassandra should not be used for heavy filtering
> - query tables with the filtering criteria baked into the PK is the way to
> go.
>
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <ja...@ymail.com>
> wrote:
>
> Hi,
>
> We have sensor input that creates very wide rows and operations on these
> rows have started to timeout regulary. We have been trying to find a
> solution to dividing wide rows but keep hitting limitations that move the
> problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId and
> use a time field to access each column in the row. We tried adding a time
> based entry, timeShardId, to the partition key that consists of the year
> and week of year during which the reading was taken. This works for a
> number of queries but for scanning all the readings against a particular
> sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given
> sensorUnitId and sensorId combination so would have to write to an
> additional table to track the valid timeShardId. We suspect this would
> create tombstone accumulation problems given the number of updates required
> to the same row so haven't tried this option.
>
> Alternatively, we hit a different bottleneck in the form of SELECT
> DISTINCT in trying to directly access the partition keys. Since SELECT
> DISTINCT does not allow for a where clause to filter on the partition key
> values, we have to filter several hundred thousand partition keys just to
> find those related to the relevant sensorUnitId and sensorId. This problem
> will only grow worse for us.
>
> Are there any other approaches that can be suggested? We have been looking
> around, but haven't found any references beyond the initial suggestion to
> add some sort of shard id to the partition key to handle wide rows.
>
> Thanks,
>
> Jason
>
>
>
>
>
>
>
>

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Jason Kania <ja...@ymail.com>.
Jack,
Thanks for the response.
We are targeting our database design to 10000 sensor units and each sensor unit has 32 sensors. We are seeing about 700 events per day per sensor, each providing about 2K of data. Based on keeping each partition to about 10 MB (based on readings we saw on performance), we chose to break our partitions on a weekly basis. This is possibly finer than we need, as we were seeing timeouts only once a single partition was about 150 MB in size.
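(Those numbers are self-consistent: roughly 700 events/day x 7 days x 2 KB is about 9.8 MB per sensor per weekly partition, right around the 10 MB target, while a 150 MB partition corresponds to roughly 15 weeks of data.)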

When pulling in data, we will typically need to pull 1 to 4 months of data for our analysis, using only the sensorUnitId and sensorId to uniquely identify the data source, with the timeShard value used to break up our partitions. We have handling to sequentially scan based on our "timeShard" value, but we don't have a good way to determine the "timeShard" portion of the partition key at read time. The data starts coming in when a subscriber starts using our system and finishes when they discontinue service or put the service on hold temporarily.

When I talk about hotspots, it isn't the time series data that is the concern; it is with respect to storing the maximum and minimum timeShard values in another table for subsequent lookup, or the cost of running the current implementation of SELECT DISTINCT. We need to run queries such as getting the first or last 5000 sensor readings when we don't know the time frame in which they occurred, so we cannot directly supply the timeShard portion of our partition key.

I appreciate your input,
Thanks,
Jason

      From: Jack Krupansky <ja...@gmail.com>
 To: "user@cassandra.apache.org" <us...@cassandra.apache.org> 
 Sent: Friday, March 11, 2016 4:45 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
   
I'll stay away from advising on a specific schema per se, but I'll stick to the advice that you need to make sure that your queries are depending solely on the columns of the primary key or relatively short slices/scans, rather than run the risk of very long scans or having to process multiple partitions for a single query. That's canned to some extent, but still essential.
Of course we generally wish to avoid hotspots, but with time series they are unavoidable. I mean, sure you could place successive events at separate partitions, but then you can't do any kind of scanning/slicing.
But, events for separate sensors are not true hotspots in the traditional sense - unless you have only a single sensor/unit.
After considering your queries, the next step is to consider the cardinality of your data - how many sensors, how many units, rate of events, etc. That will feedback into queries as well, such as how big a slice or scan might be, as well as sizing of partitions.
So, how many sensor units do you expect, how many sensors per unit, and expected rate of events per sensor?
Try not to jump too quickly to specific solutions - there really is a method to understanding all of this other stuff upfront.
-- Jack Krupansky
On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <ja...@ymail.com> wrote:

Jack,
Thanks for the response. I don't think I provided enough information and used the wrong terminology, as your response is more the canned advice given in response to Cassandra antipatterns.
To make this clearer, this is what we are doing:
create table sensorReadings (
sensorUnitId int,
sensorId int,
time timestamp,
timeShard int,
readings blob,
primary key((sensorUnitId, sensorId, timeShard), time));
where timeShard is a combination of year and week of year
For known time range based queries, this works great. However, the specific problem is in knowing the maximum and minimum timeShard values when we want to select the entire range of data. Our understanding is that if we update another related table with the maximum and minimum timeShard value for a given sensorUnitId and sensorId combination, we will create a hotspot and lots of tombstones. If we SELECT DISTINCT, we get a huge list of partition keys for the table because we cannot reduce the scope with a where clause.

If there is a recommended pattern that solves this, we haven't come across it.

I hope this makes the problem clearer.
Thanks,
Jason

      From: Jack Krupansky <ja...@gmail.com>
 To: user@cassandra.apache.org; Jason Kania <ja...@ymail.com> 
 Sent: Thursday, March 10, 2016 10:42 AM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
   
There is an effort underway to support wider rows: https://issues.apache.org/jira/browse/CASSANDRA-9754

This won't help you now though. Even with that improvement you still may need a more optimal data model since large-scale scanning/filtering is always a very bad idea with Cassandra.
The data modeling methodology for Cassandra dictates that queries drive the data model and that each form of query requires a separate table ("query table.") Materialized view can automate that process for a lot of cases, but in any case it does sound as if some of your queries do require additional tables.
As a general proposition, Cassandra should not be used for heavy filtering - query tables with the filtering criteria baked into the PK is the way to go.

-- Jack Krupansky
On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <ja...@ymail.com> wrote:

Hi,
We have sensor input that creates very wide rows and operations on these rows have started to time out regularly. We have been trying to find a solution to dividing wide rows but keep hitting limitations that move the problem around instead of solving it.
We have a partition key consisting of a sensorUnitId and a sensorId and use a time field to access each column in the row. We tried adding a time based entry, timeShardId, to the partition key that consists of the year and week of year during which the reading was taken. This works for a number of queries but for scanning all the readings against a particular sensorUnitId and sensorId combination, we seem to be stuck.
We won't know the range of valid values of the timeShardId for a given sensorUnitId and sensorId combination so would have to write to an additional table to track the valid timeShardId. We suspect this would create tombstone accumulation problems given the number of updates required to the same row so haven't tried this option.

Alternatively, we hit a different bottleneck in the form of SELECT DISTINCT in trying to directly access the partition keys. Since SELECT DISTINCT does not allow for a where clause to filter on the partition key values, we have to filter several hundred thousand partition keys just to find those related to the relevant sensorUnitId and sensorId. This problem will only grow worse for us.

Are there any other approaches that can be suggested? We have been looking around, but haven't found any references beyond the initial suggestion to add some sort of shard id to the partition key to handle wide rows.
Thanks,
Jason




   



   

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Jack Krupansky <ja...@gmail.com>.
I'll stay away from advising on a specific schema per se, but I'll stick to
the advice that you need to make sure that your queries are depending
solely on the columns of the primary key or relatively short slices/scans,
rather than run the risk of very long scans or having to process multiple
partitions for a single query. That's canned to some extent, but still
essential.

Of course we generally wish to avoid hotspots, but with time series they
are unavoidable. I mean, sure you could place successive events at separate
partitions, but then you can't do any kind of scanning/slicing.

But, events for separate sensors are not true hotspots in the traditional
sense - unless you have only a single sensor/unit.

After considering your queries, the next step is to consider the
cardinality of your data - how many sensors, how many units, rate of
events, etc. That will feed back into queries as well, such as how big a
slice or scan might be, as well as sizing of partitions.

So, how many sensor units do you expect, how many sensors per unit, and
expected rate of events per sensor?

Try not to jump too quickly to specific solutions - there really is a
method to understanding all of this other stuff upfront.

-- Jack Krupansky

On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <ja...@ymail.com> wrote:

> Jack,
>
> Thanks for the response. I don't think I provided enough information and
> used the wrong terminology as your response is more the canned advice is
> response to Cassandra antipatterns.
>
> To make this clearer, this is what we are doing:
>
> create table sensorReadings (
> sensorUnitId int,
> sensorId int,
> time timestamp,
> timeShard int,
> readings blob,
> primary key((sensorUnitId, sensorId, timeShard), time);
>
> where timeShard is a combination of year and week of year
>
> For known time range based queries, this works great. However, the
> specific problem is in knowing the maximum and minimum timeShard values
> when we want to select the entire range of data. Our understanding is that
> if we update another related table with the maximum and minimum timeShard
> value for a given sensorUnitId and sensorId combination, we will create a
> hotspot and lots of tombstones. If we SELECT DISTINCT, we get a huge list
> of partition keys for the table because we cannot reduce the scope with a
> where clause.
>
> If there is a recommended pattern that solves this, we haven't come across
> it.
>
> I hope makes the problem clearer.
>
> Thanks,
>
> Jason
>
> ------------------------------
> *From:* Jack Krupansky <ja...@gmail.com>
> *To:* user@cassandra.apache.org; Jason Kania <ja...@ymail.com>
> *Sent:* Thursday, March 10, 2016 10:42 AM
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> There is an effort underway to support wider rows:
> https://issues.apache.org/jira/browse/CASSANDRA-9754
>
> This won't help you now though. Even with that improvement you still may
> need a more optimal data model since large-scale scanning/filtering is
> always a very bad idea with Cassandra.
>
> The data modeling methodology for Cassandra dictates that queries drive
> the data model and that each form of query requires a separate table
> ("query table.") Materialized view can automate that process for a lot of
> cases, but in any case it does sound as if some of your queries do require
> additional tables.
>
> As a general proposition, Cassandra should not be used for heavy filtering
> - query tables with the filtering criteria baked into the PK is the way to
> go.
>
>
> -- Jack Krupansky
>
> On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <ja...@ymail.com>
> wrote:
>
> Hi,
>
> We have sensor input that creates very wide rows and operations on these
> rows have started to timeout regulary. We have been trying to find a
> solution to dividing wide rows but keep hitting limitations that move the
> problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId and
> use a time field to access each column in the row. We tried adding a time
> based entry, timeShardId, to the partition key that consists of the year
> and week of year during which the reading was taken. This works for a
> number of queries but for scanning all the readings against a particular
> sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given
> sensorUnitId and sensorId combination so would have to write to an
> additional table to track the valid timeShardId. We suspect this would
> create tombstone accumulation problems given the number of updates required
> to the same row so haven't tried this option.
>
> Alternatively, we hit a different bottleneck in the form of SELECT
> DISTINCT in trying to directly access the partition keys. Since SELECT
> DISTINCT does not allow for a where clause to filter on the partition key
> values, we have to filter several hundred thousand partition keys just to
> find those related to the relevant sensorUnitId and sensorId. This problem
> will only grow worse for us.
>
> Are there any other approaches that can be suggested? We have been looking
> around, but haven't found any references beyond the initial suggestion to
> add some sort of shard id to the partition key to handle wide rows.
>
> Thanks,
>
> Jason
>
>
>
>
>

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Jason Kania <ja...@ymail.com>.
Jack,
Thanks for the response. I don't think I provided enough information and used the wrong terminology, as your response is more the canned advice given in response to Cassandra antipatterns.
To make this clearer, this is what we are doing:
create table sensorReadings (
sensorUnitId int,
sensorId int,
time timestamp,
timeShard int,
readings blob,
primary key((sensorUnitId, sensorId, timeShard), time));
where timeShard is a combination of year and week of year
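For concreteness, one way such a shard value might be computed (the exact encoding, ISO year * 100 + ISO week, is an assumption; the thread only says it combines year and week of year):

from datetime import datetime

def time_shard(ts):
    # e.g. time_shard(datetime(2016, 3, 10)) == 201610 (10th ISO week of 2016)
    iso_year, iso_week, _ = ts.isocalendar()
    return iso_year * 100 + iso_week

The same function has to be applied at write time and at read time so that a known time range maps deterministically onto partition keys.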
For known time range based queries, this works great. However, the specific problem is in knowing the maximum and minimum timeShard values when we want to select the entire range of data. Our understanding is that if we update another related table with the maximum and minimum timeShard value for a given sensorUnitId and sensorId combination, we will create a hotspot and lots of tombstones. If we SELECT DISTINCT, we get a huge list of partition keys for the table because we cannot reduce the scope with a where clause.

If there is a recommended pattern that solves this, we haven't come across it.

I hope this makes the problem clearer.
Thanks,
Jason

      From: Jack Krupansky <ja...@gmail.com>
 To: user@cassandra.apache.org; Jason Kania <ja...@ymail.com> 
 Sent: Thursday, March 10, 2016 10:42 AM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
   
There is an effort underway to support wider rows: https://issues.apache.org/jira/browse/CASSANDRA-9754

This won't help you now though. Even with that improvement you still may need a more optimal data model since large-scale scanning/filtering is always a very bad idea with Cassandra.
The data modeling methodology for Cassandra dictates that queries drive the data model and that each form of query requires a separate table ("query table.") Materialized view can automate that process for a lot of cases, but in any case it does sound as if some of your queries do require additional tables.
As a general proposition, Cassandra should not be used for heavy filtering - query tables with the filtering criteria baked into the PK is the way to go.

-- Jack Krupansky
On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <ja...@ymail.com> wrote:

Hi,
We have sensor input that creates very wide rows and operations on these rows have started to time out regularly. We have been trying to find a solution to dividing wide rows but keep hitting limitations that move the problem around instead of solving it.
We have a partition key consisting of a sensorUnitId and a sensorId and use a time field to access each column in the row. We tried adding a time based entry, timeShardId, to the partition key that consists of the year and week of year during which the reading was taken. This works for a number of queries but for scanning all the readings against a particular sensorUnitId and sensorId combination, we seem to be stuck.
We won't know the range of valid values of the timeShardId for a given sensorUnitId and sensorId combination so would have to write to an additional table to track the valid timeShardId. We suspect this would create tombstone accumulation problems given the number of updates required to the same row so haven't tried this option.

Alternatively, we hit a different bottleneck in the form of SELECT DISTINCT in trying to directly access the partition keys. Since SELECT DISTINCT does not allow for a where clause to filter on the partition key values, we have to filter several hundred thousand partition keys just to find those related to the relevant sensorUnitId and sensorId. This problem will only grow worse for us.

Are there any other approaches that can be suggested? We have been looking around, but haven't found any references beyond the initial suggestion to add some sort of shard id to the partition key to handle wide rows.
Thanks,
Jason




   

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Jack Krupansky <ja...@gmail.com>.
There is an effort underway to support wider rows:
https://issues.apache.org/jira/browse/CASSANDRA-9754

This won't help you now though. Even with that improvement you still may
need a more optimal data model since large-scale scanning/filtering is
always a very bad idea with Cassandra.

The data modeling methodology for Cassandra dictates that queries drive the
data model and that each form of query requires a separate table ("query
table.") Materialized view can automate that process for a lot of cases,
but in any case it does sound as if some of your queries do require
additional tables.

As a general proposition, Cassandra should not be used for heavy filtering
- query tables with the filtering criteria baked into the PK is the way to
go.


-- Jack Krupansky

On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <ja...@ymail.com> wrote:

> Hi,
>
> We have sensor input that creates very wide rows and operations on these
> rows have started to timeout regulary. We have been trying to find a
> solution to dividing wide rows but keep hitting limitations that move the
> problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId and
> use a time field to access each column in the row. We tried adding a time
> based entry, timeShardId, to the partition key that consists of the year
> and week of year during which the reading was taken. This works for a
> number of queries but for scanning all the readings against a particular
> sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given
> sensorUnitId and sensorId combination so would have to write to an
> additional table to track the valid timeShardId. We suspect this would
> create tombstone accumulation problems given the number of updates required
> to the same row so haven't tried this option.
>
> Alternatively, we hit a different bottleneck in the form of SELECT
> DISTINCT in trying to directly access the partition keys. Since SELECT
> DISTINCT does not allow for a where clause to filter on the partition key
> values, we have to filter several hundred thousand partition keys just to
> find those related to the relevant sensorUnitId and sensorId. This problem
> will only grow worse for us.
>
> Are there any other approaches that can be suggested? We have been looking
> around, but haven't found any references beyond the initial suggestion to
> add some sort of shard id to the partition key to handle wide rows.
>
> Thanks,
>
> Jason
>

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Oops sorry, you wrote below that the shard is what I was suggesting.  I
didn't fully understand the problem you had.  I'll think about it a little
bit and come up w/ something.

On Thu, Mar 10, 2016 at 9:47 AM Jonathan Haddad <jo...@jonhaddad.com> wrote:

> My advice was to use the date that the reading was recorded as part of the
> Partition key instead of some arbitrary shard id.  Then you don't have to
> look anything up in a different table.
>
>
>
> create table sensorReadings (
> sensorUnitId int,
> sensorId int,
> date_recorded date,
> time timestamp,
> timeShard int,
> readings blob,
> primary key((sensorUnitId, sensorId, date_recorded), time);
>
>
> On Thu, Mar 10, 2016 at 9:29 AM Jason Kania <ja...@ymail.com> wrote:
>
>> Hi Jonathan,
>>
>> Thanks for the response. To make this clearer, this is what we are doing:
>>
>> create table sensorReadings (
>> sensorUnitId int,
>> sensorId int,
>> time timestamp,
>> timeShard int,
>> readings blob,
>> primary key((sensorUnitId, sensorId, timeShard), time);
>>
>> where timeShard is a combination of year and week of year
>>
>> This works exactly as you mentioned when we know what time range we are
>> querying.
>>
>> The problem is that for those cases where we want to run through all the
>> readings for all timestamps, we don't know the first and last timeShard
>> value to use to constrain the query or iterate over each shard. Our
>> understanding is that updating another table with the maximum or minimum
>> timeShard values on every write to the above table would mean pounding a
>> single row with updates and running SELECT DISTINCT pulls all partition
>> keys.
>>
>> Hopefully this is clearer.
>>
>> Again, any suggestions would be appreciated.
>>
>> Thanks,
>>
>> Jason
>>
>> ------------------------------
>> *From:* Jonathan Haddad <jo...@jonhaddad.com>
>> *To:* user@cassandra.apache.org; Jason Kania <ja...@ymail.com>
>> *Sent:* Thursday, March 10, 2016 11:21 AM
>> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
>> partition key
>>
>> Have you considered making the date (or week, or whatever, some time
>> component) part of your partition key?
>>
>> something like:
>>
>> create table sensordata (
>> sensor_id int,
>> day date,
>> ts datetime,
>> reading int,
>> primary key((sensor_id, day), ts);
>>
>> Then if you know you need data by a particular date range, just issue
>> multiple async queries for each day you need.
>>
>> On Thu, Mar 10, 2016 at 5:57 AM Jason Kania <ja...@ymail.com>
>> wrote:
>>
>> Hi,
>>
>> We have sensor input that creates very wide rows and operations on these
>> rows have started to timeout regulary. We have been trying to find a
>> solution to dividing wide rows but keep hitting limitations that move the
>> problem around instead of solving it.
>>
>> We have a partition key consisting of a sensorUnitId and a sensorId and
>> use a time field to access each column in the row. We tried adding a time
>> based entry, timeShardId, to the partition key that consists of the year
>> and week of year during which the reading was taken. This works for a
>> number of queries but for scanning all the readings against a particular
>> sensorUnitId and sensorId combination, we seem to be stuck.
>>
>> We won't know the range of valid values of the timeShardId for a given
>> sensorUnitId and sensorId combination so would have to write to an
>> additional table to track the valid timeShardId. We suspect this would
>> create tombstone accumulation problems given the number of updates required
>> to the same row so haven't tried this option.
>>
>> Alternatively, we hit a different bottleneck in the form of SELECT
>> DISTINCT in trying to directly access the partition keys. Since SELECT
>> DISTINCT does not allow for a where clause to filter on the partition key
>> values, we have to filter several hundred thousand partition keys just to
>> find those related to the relevant sensorUnitId and sensorId. This problem
>> will only grow worse for us.
>>
>> Are there any other approaches that can be suggested? We have been
>> looking around, but haven't found any references beyond the initial
>> suggestion to add some sort of shard id to the partition key to handle wide
>> rows.
>>
>> Thanks,
>>
>> Jason
>>
>>
>>
>>

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
My advice was to use the date that the reading was recorded as part of the
Partition key instead of some arbitrary shard id.  Then you don't have to
look anything up in a different table.


create table sensorReadings (
sensorUnitId int,
sensorId int,
date_recorded date,
time timestamp,
timeShard int,
readings blob,
primary key((sensorUnitId, sensorId, date_recorded), time));
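A sketch of what issuing multiple async queries, one per day, might look like against that table with the Python driver (the keyspace name and selected columns are assumptions):

from datetime import date, timedelta
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('sensors')  # assumed keyspace

day_query = session.prepare(
    "SELECT time, readings FROM sensorReadings "
    "WHERE sensorUnitId = ? AND sensorId = ? AND date_recorded = ?")

def read_range(unit, sensor, start_day, end_day):
    # One partition per (sensor unit, sensor, day); fan the per-day reads out
    # asynchronously and stitch the results back together in day order.
    days = []
    d = start_day
    while d <= end_day:
        days.append(d)
        d += timedelta(days=1)
    futures = [session.execute_async(day_query, (unit, sensor, d)) for d in days]
    for f in futures:
        for row in f.result():
            yield row

# e.g. list(read_range(1, 7, date(2016, 1, 1), date(2016, 3, 10)))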


On Thu, Mar 10, 2016 at 9:29 AM Jason Kania <ja...@ymail.com> wrote:

> Hi Jonathan,
>
> Thanks for the response. To make this clearer, this is what we are doing:
>
> create table sensorReadings (
> sensorUnitId int,
> sensorId int,
> time timestamp,
> timeShard int,
> readings blob,
> primary key((sensorUnitId, sensorId, timeShard), time);
>
> where timeShard is a combination of year and week of year
>
> This works exactly as you mentioned when we know what time range we are
> querying.
>
> The problem is that for those cases where we want to run through all the
> readings for all timestamps, we don't know the first and last timeShard
> value to use to constrain the query or iterate over each shard. Our
> understanding is that updating another table with the maximum or minimum
> timeShard values on every write to the above table would mean pounding a
> single row with updates and running SELECT DISTINCT pulls all partition
> keys.
>
> Hopefully this is clearer.
>
> Again, any suggestions would be appreciated.
>
> Thanks,
>
> Jason
>
> ------------------------------
> *From:* Jonathan Haddad <jo...@jonhaddad.com>
> *To:* user@cassandra.apache.org; Jason Kania <ja...@ymail.com>
> *Sent:* Thursday, March 10, 2016 11:21 AM
> *Subject:* Re: Strategy for dividing wide rows beyond just adding to the
> partition key
>
> Have you considered making the date (or week, or whatever, some time
> component) part of your partition key?
>
> something like:
>
> create table sensordata (
> sensor_id int,
> day date,
> ts datetime,
> reading int,
> primary key((sensor_id, day), ts);
>
> Then if you know you need data by a particular date range, just issue
> multiple async queries for each day you need.
>
> On Thu, Mar 10, 2016 at 5:57 AM Jason Kania <ja...@ymail.com> wrote:
>
> Hi,
>
> We have sensor input that creates very wide rows and operations on these
> rows have started to timeout regulary. We have been trying to find a
> solution to dividing wide rows but keep hitting limitations that move the
> problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId and
> use a time field to access each column in the row. We tried adding a time
> based entry, timeShardId, to the partition key that consists of the year
> and week of year during which the reading was taken. This works for a
> number of queries but for scanning all the readings against a particular
> sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given
> sensorUnitId and sensorId combination so would have to write to an
> additional table to track the valid timeShardId. We suspect this would
> create tombstone accumulation problems given the number of updates required
> to the same row so haven't tried this option.
>
> Alternatively, we hit a different bottleneck in the form of SELECT
> DISTINCT in trying to directly access the partition keys. Since SELECT
> DISTINCT does not allow for a where clause to filter on the partition key
> values, we have to filter several hundred thousand partition keys just to
> find those related to the relevant sensorUnitId and sensorId. This problem
> will only grow worse for us.
>
> Are there any other approaches that can be suggested? We have been looking
> around, but haven't found any references beyond the initial suggestion to
> add some sort of shard id to the partition key to handle wide rows.
>
> Thanks,
>
> Jason
>
>
>
>

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Jason Kania <ja...@ymail.com>.
Hi Jonathan,

Thanks for the response. To make this clearer, this is what we are doing:
create table sensorReadings (
sensorUnitId int,
sensorId int,
time timestamp,
timeShard int,
readings blob,
primary key((sensorUnitId, sensorId, timeShard), time)
);

where timeShard is a combination of year and week of year.
This works exactly as you mentioned when we know what time range we are querying.
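For concreteness, a rough sketch of that access pattern with the Python driver (the exact shard encoding, the keyspace name and the contact point below are assumptions for illustration only):

from datetime import date, timedelta
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])        # placeholder contact point
session = cluster.connect('telemetry')  # placeholder keyspace

def time_shard(d):
    # Assumption for illustration: timeShard = ISO year * 100 + ISO week number.
    iso_year, iso_week, _ = d.isocalendar()
    return iso_year * 100 + iso_week

select_shard = session.prepare(
    "SELECT time, readings FROM sensorReadings "
    "WHERE sensorUnitId = ? AND sensorId = ? AND timeShard = ?")

def read_range(sensor_unit_id, sensor_id, start, end):
    # Enumerate the week shards covered by the known [start, end] range,
    # then issue one async query per shard and stream the rows back.
    shards = []
    d = start
    while d <= end:
        shards.append(time_shard(d))
        d += timedelta(days=7)
    if time_shard(end) not in shards:
        shards.append(time_shard(end))
    futures = [session.execute_async(select_shard, (sensor_unit_id, sensor_id, s))
               for s in shards]
    for future in futures:
        for row in future.result():
            yield row

for row in read_range(1, 42, date(2016, 1, 1), date(2016, 3, 10)):
    print(row.time, len(row.readings))

The shards here come straight from the caller's start and end dates, which is exactly the piece we don't have when we want everything for a sensor.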

The problem is that, for those cases where we want to run through all the readings across all timestamps, we don't know the first and last timeShard values to use to constrain the query or to iterate over each shard. Our understanding is that updating another table with the maximum or minimum timeShard values on every write to the above table would mean pounding a single row with updates, and that running SELECT DISTINCT pulls back all partition keys.

Hopefully this is clearer.
Again, any suggestions would be appreciated.

Thanks,
Jason

      From: Jonathan Haddad <jo...@jonhaddad.com>
 To: user@cassandra.apache.org; Jason Kania <ja...@ymail.com> 
 Sent: Thursday, March 10, 2016 11:21 AM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
   
Have you considered making the date (or week, or whatever, some time component) part of your partition key?
something like:
create table sensordata (
sensor_id int,
day date,
ts timestamp,
reading int,
primary key((sensor_id, day), ts)
);
Then if you know you need data by a particular date range, just issue multiple async queries for each day you need.
On Thu, Mar 10, 2016 at 5:57 AM Jason Kania <ja...@ymail.com> wrote:

Hi,
We have sensor input that creates very wide rows and operations on these rows have started to time out regularly. We have been trying to find a solution to dividing wide rows but keep hitting limitations that move the problem around instead of solving it.
We have a partition key consisting of a sensorUnitId and a sensorId and use a time field to access each column in the row. We tried adding a time based entry, timeShardId, to the partition key that consists of the year and week of year during which the reading was taken. This works for a number of queries but for scanning all the readings against a particular sensorUnitId and sensorId combination, we seem to be stuck.
We won't know the range of valid values of the timeShardId for a given sensorUnitId and sensorId combination so would have to write to an additional table to track the valid timeShardId. We suspect this would create tombstone accumulation problems given the number of updates required to the same row so haven't tried this option.

Alternatively, we hit a different bottleneck in the form of SELECT DISTINCT in trying to directly access the partition keys. Since SELECT DISTINCT does not allow for a where clause to filter on the partition key values, we have to filter several hundred thousand partition keys just to find those related to the relevant sensorUnitId and sensorId. This problem will only grow worse for us.

Are there any other approaches that can be suggested? We have been looking around, but haven't found any references beyond the initial suggestion to add some sort of shard id to the partition key to handle wide rows.
Thanks,
Jason



   

Re: Strategy for dividing wide rows beyond just adding to the partition key

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Have you considered making the date (or week, or whatever, some time
component) part of your partition key?

something like:

create table sensordata (
sensor_id int,
day date,
ts timestamp,
reading int,
primary key((sensor_id, day), ts)
);

Then if you know you need data by a particular date range, just issue
multiple async queries for each day you need.
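
Something along these lines, as a rough sketch with the Python driver (the keyspace name and contact point are placeholders):

from datetime import date, timedelta
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])      # placeholder contact point
session = cluster.connect('sensors')  # placeholder keyspace

per_day = session.prepare(
    "SELECT ts, reading FROM sensordata WHERE sensor_id = ? AND day = ?")

def readings_between(sensor_id, start, end):
    # One partition per (sensor_id, day): fire one async query per day in the
    # range and read the results back as the futures complete.
    days = [start + timedelta(days=n) for n in range((end - start).days + 1)]
    futures = [session.execute_async(per_day, (sensor_id, d)) for d in days]
    for future in futures:
        for row in future.result():
            yield row

for row in readings_between(17, date(2016, 3, 1), date(2016, 3, 10)):
    print(row.ts, row.reading)

Each day is its own partition, so every one of these queries is a single-partition read rather than a scan.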

On Thu, Mar 10, 2016 at 5:57 AM Jason Kania <ja...@ymail.com> wrote:

> Hi,
>
> We have sensor input that creates very wide rows and operations on these
> rows have started to time out regularly. We have been trying to find a
> solution to dividing wide rows but keep hitting limitations that move the
> problem around instead of solving it.
>
> We have a partition key consisting of a sensorUnitId and a sensorId and
> use a time field to access each column in the row. We tried adding a time
> based entry, timeShardId, to the partition key that consists of the year
> and week of year during which the reading was taken. This works for a
> number of queries but for scanning all the readings against a particular
> sensorUnitId and sensorId combination, we seem to be stuck.
>
> We won't know the range of valid values of the timeShardId for a given
> sensorUnitId and sensorId combination so would have to write to an
> additional table to track the valid timeShardId. We suspect this would
> create tombstone accumulation problems given the number of updates required
> to the same row so haven't tried this option.
>
> Alternatively, we hit a different bottleneck in the form of SELECT
> DISTINCT in trying to directly access the partition keys. Since SELECT
> DISTINCT does not allow for a where clause to filter on the partition key
> values, we have to filter several hundred thousand partition keys just to
> find those related to the relevant sensorUnitId and sensorId. This problem
> will only grow worse for us.
>
> Are there any other approaches that can be suggested? We have been looking
> around, but haven't found any references beyond the initial suggestion to
> add some sort of shard id to the partition key to handle wide rows.
>
> Thanks,
>
> Jason
>