Posted to dev@cassandra.apache.org by Jeff Jirsa <jj...@gmail.com> on 2019/02/01 01:50:34 UTC

Re: SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

In my original TWCS talk a few years back, I suggested that people make the partitions match the time window to avoid exactly what you’re describing. I added that to the talk because my first team that used TWCS (the team for which I built TWCS) had a data model not unlike yours, and the read-every-sstable thing turns out not to work that well if you have lots of windows (or very large partitions). If you do this, you can fan out a bunch of async reads for the first few days and ask for more as you need to fill the page - this means the reads are more distributed, too, which is an extra bonus when you have noisy partitions.
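
A sketch of what that looks like (table, column, and bucket names are
illustrative, not prescriptive):

    -- Partition key includes a time bucket matched to the TWCS window:
    CREATE TABLE task_history (
        user_id uuid,
        bucket  int,        -- e.g. day number: 2019031 = Jan 31, 2019
        task_id timeuuid,
        payload text,
        PRIMARY KEY ((user_id, bucket), task_id)
    ) WITH CLUSTERING ORDER BY (task_id DESC)
      AND compaction = {'class': 'TimeWindowCompactionStrategy',
                        'compaction_window_unit': 'DAYS',
                        'compaction_window_size': '1'};

    -- The client fans out one async query per recent bucket, asking
    -- older buckets only as needed to fill the page:
    SELECT * FROM task_history WHERE user_id = ? AND bucket = 2019031;
    SELECT * FROM task_history WHERE user_id = ? AND bucket = 2019030;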

In 3.0 and newer (I think; don’t quote me on the specific version), the sstable metadata has the min and max clustering values, which helps exclude sstables from the read path quite well if everything in the table is using timestamp clustering columns. I know there was some issue with this and range tombstones (RTs) recently, so I’m not sure of the current state, but it's worth considering that this may be much better on 3.0+
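
For instance, even with the original schema (user partition, timeuuid
clustering; names illustrative), a bounded slice gives the read path a
window to compare against each sstable's min/max clustering values:

    SELECT * FROM task_history
    WHERE user_id = ?
      AND task_id > maxTimeuuid('2019-01-30 00:00+0000')
      AND task_id < minTimeuuid('2019-02-01 00:00+0000');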



-- 
Jeff Jirsa


> On Jan 31, 2019, at 1:56 PM, Carl Mueller <ca...@smartthings.com.invalid> wrote:
> 
> Situation:
> 
> We use TWCS for a task history table (the partition is the user, the column
> key is the timeuuid of the task; TWCS is used due to tombstone TTLs that
> rotate out the tasks every month or so).
> 
> However, suppose we want to get a "slice" of tasks (say, tasks from the
> last two days), while we are using TWCS sstable windows of 12 hours.
> 
> The problem is that this is a frequent user, and they have tasks in ALL
> the sstables that TWCS has organized into time buckets.
> 
> So Cassandra first has to read in, say, 80 sstables to reconstruct the
> row; only THEN can it exclude/slice on the column key.
> 
> Question:
> 
> Or am I wrong that the read path needs to grab all relevant sstables
> before applying column key slicing, and this exclusion is already
> possible? Admittedly we are on 2.1 for this table (we are in the process
> of upgrading, now that we have an automated upgrade program that seems to
> work pretty well).
> 
> If my assumption is correct, then the compaction strategy knows, as it
> writes the sstables, which bucket it is putting them in (and could encode
> that in the sstable metadata?). Given my assumption that slicing requires
> reconstructing the whole row: if we had a perfect infinite-monkey coding
> team that could build whatever we wanted within reason, could we provide
> special hooks that exclude sstables based on that metadata, knowing the
> metadata indicates which column keys each sstable includes or excludes?
> 
> Goal:
> 
> The overall goal would be to support excluding sstables from the read
> path, in case we had compaction strategies hand-tailored for other queries.
> Essentially we would be doing a first-pass bucket-sort exclusion, with the
> sstable metadata marking the buckets. This might aid support for superwide
> rows and paging through column keys, if we allowed the table creator to
> specify bucketing as flushing occurs. In general, query performance appears
> to degrade quickly with the number of sstables required for a lookup.
> 
> I still don't know the code nearly well enough to write patches, but based
> on my reading of the custom compaction strategies and the basic read path,
> it seems this would be a useful extension for advanced users.
> 
> The fallback would be a set of tables serving as buckets; we would span
> the buckets with queries when one bucket runs out, and rotate the tables.
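>
> Roughly, in CQL (bucket naming is just illustrative):
>
>     -- one table per time bucket, rotated by the application:
>     CREATE TABLE task_history_2019_01 (
>         user_id uuid,
>         task_id timeuuid,
>         payload text,
>         PRIMARY KEY (user_id, task_id)
>     );
>     -- ...task_history_2019_02, task_history_2019_03, and so on;
>     -- reads start at the newest bucket and fall through to older
>     -- ones until the page fills; expired buckets are dropped whole.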



Re: SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

Posted by Jeff Jirsa <jj...@gmail.com>.
FWIW you can skip 2.2 and go 2.1 -> 3.11. I would wait for 3.11.4 though.



On Fri, Feb 1, 2019 at 12:53 PM Carl Mueller
<ca...@smartthings.com.invalid> wrote:

> Interesting. Now that we have semi-automated upgrades, we are going to
> hopefully get everything to 3.11.x once we get the intermediate hop to 2.2.

Re: SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

Posted by Jeff Jirsa <jj...@gmail.com>.
Iterate over all of the possible time buckets.
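
That is, for the "all events for app_id" case, walk the buckets
newest-to-oldest, one (possibly async) query per bucket (names
illustrative):

    SELECT * FROM events WHERE app_id = ? AND day = 2019032;
    SELECT * FROM events WHERE app_id = ? AND day = 2019031;
    -- ...down to the oldest bucket still inside the retention window.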


On Fri, Feb 1, 2019 at 1:36 PM Carl Mueller
<ca...@smartthings.com.invalid> wrote:

> I'd still need a "all events for app_id" query. We have seconds-level
> events :-(

Re: SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
I'd still need an "all events for app_id" query. We have seconds-level
events :-(


On Fri, Feb 1, 2019 at 3:02 PM Jeff Jirsa <jj...@gmail.com> wrote:

> Huh? No, you'd have a composite partition key of app_id + timestamp
> ROUNDED/CEIL/FLOOR to some time window, and both would be used for the
> hash/key.
>
> And you don't need any extra table, because app_id is known and the
> timestamp can be calculated (e.g., 4 digits of year + 3 digits for day of
> year makes today 2019032).

Re: SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

Posted by Jeff Jirsa <jj...@gmail.com>.
On Fri, Feb 1, 2019 at 12:58 PM Carl Mueller
<ca...@smartthings.com.invalid> wrote:

> Jeff: so the partition key with timestamp would then need a separate index
> table to track the app_id -> partition keys. That isn't horrible, but it
> also ties into another desire of mine: some way to make the replica mapping
> match locally between the index table and the data table:
>
> So in the composite partition key for the TWCS table, you'd have app_id +
> timestamp, BUT ONLY THE app_id GENERATES the hash/key.
>
>
Huh? No, you'd have a composite partition key of app_id + timestamp
ROUNDED/CEIL/FLOOR to some time window, and both would be used for the
hash/key.

And you don't need any extra table, because app_id is known and the
timestamp can be calculated (e.g., 4 digits of year + 3 digits for day of
year makes today 2019032).
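
Sketch (illustrative names):

    CREATE TABLE events (
        app_id     uuid,
        day        int,       -- year * 1000 + day-of-year, e.g. 2019032
        event_time timeuuid,
        payload    text,
        PRIMARY KEY ((app_id, day), event_time)
    );
    -- both app_id and day feed the partition hash; the client derives
    -- day from the query's time range, so no index table is required.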




Re: SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
Jeff: so the partition key with timestamp would then need a separate index
table to track the app_id -> partition keys. That isn't horrible, but it
also ties into another desire of mine: some way to make the replica mapping
match locally between the index table and the data table:

So in the composite partition key for the TWCS table, you'd have app_id +
timestamp, BUT ONLY THE app_id GENERATES the hash/key.

Thus it would match with the index table that is just partition key app_id,
column key timestamp.

And then theoretically a node-local "join" could be done without an
additional query hop, and batched updates would be more easily atomic to a
single node.

Now how we would communicate all that in CQL/etc: who knows. Hm. Maybe
materialized views cover this, but I haven't tracked that since we don't
have versions that support them and they got "deprecated".



Re: SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

Posted by Carl Mueller <ca...@smartthings.com.INVALID>.
Interesting. Now that we have semi-automated upgrades, we are going to
hopefully get everything to 3.11.x once we get the intermediate hop to 2.2.

I'm thinking we could also use sstable metadata markings + custom
compactors for things like multiple customers on the same table. So you
could sequester each customer's data in their own sstables, and then
queries could effectively be narrowed to only the sstables containing
that customer. Maybe the min and max would cover that; I'd have to look at
the details.

On Thu, Jan 31, 2019 at 8:11 PM Jonathan Haddad <jo...@jonhaddad.com> wrote:

> In addition to what Jeff mentioned, there was an optimization in 3.4 that
> can significantly reduce the number of sstables accessed when a LIMIT
> clause was used.  This can be a pretty big win with TWCS.

Re: SSTable exclusion from read path based on sstable metadata marked by custom compaction strategies

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
In addition to what Jeff mentioned, there was an optimization in 3.4 that
can significantly reduce the number of sstables accessed when a LIMIT
clause was used.  This can be a pretty big win with TWCS.

http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html
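
For example (schema names illustrative), on a table clustered
newest-first, a query like

    SELECT * FROM task_history WHERE user_id = ? LIMIT 50;

can be satisfied from the most recent sstables alone when they hold
enough rows, so the older sstables never need to be touched.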


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade