Posted to dev@cassandra.apache.org by Arthur Kushka <ar...@gmail.com> on 2018/01/13 12:58:07 UTC

Getting partition min/max timestamp

Hi folks,

Currently, I am working on a custom CQL operator that should return the max
timestamp for a given partition.

I don't think scanning the partition for that kind of data is a good idea.
Instead, I am thinking about adding metadata to the partition: I want to
store minTimestamp and maxTimestamp for every partition, as is already done
in Memtables. These timestamps would be updated on each mutation operation,
which is quite cheap compared to a full scan.

I am quite new to the Cassandra codebase and would like some critique and
ideas. Maybe this kind of data is already stored somewhere, or you have
better ideas. Is my assumption right?

Best,
Artur

Re: Getting partition min/max timestamp

Posted by Jeremiah Jordan <je...@datastax.com>.
Finding the max timestamp of a partition is an aggregation.  Doing that calculation purely on the replica (whether pre-calculated or not) is problematic for any CL > ONE in the face of missing deletions or updates, as the contents of the partition on a given replica differ from what they would be once merged on the coordinator.
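
The problem Jeremiah describes can be sketched in plain Python (a toy model of
last-write-wins cells and tombstones, not Cassandra code): a replica that
missed a deletion reports a precomputed "max live timestamp" that no longer
holds once the coordinator merges all replicas.

```python
# Toy model (illustrative only, not Cassandra code): each replica stores
# cells as {key: (value, write_ts)} plus tombstones as {key: delete_ts}.

def live_cells(cells, tombstones):
    """Apply last-write-wins: a cell survives only if no newer tombstone."""
    return {k: (v, ts) for k, (v, ts) in cells.items()
            if tombstones.get(k, -1) < ts}

def max_live_ts(cells, tombstones):
    """Max timestamp over live cells; the per-replica 'precomputed' answer."""
    live = live_cells(cells, tombstones)
    return max((ts for _, ts in live.values()), default=None)

# Replica A saw both the write at ts=10 and its deletion at ts=20.
replica_a = ({"c1": ("x", 10), "c2": ("y", 5)}, {"c1": 20})
# Replica B missed the deletion, so its local max live timestamp is 10.
replica_b = ({"c1": ("x", 10), "c2": ("y", 5)}, {})

# Coordinator merge: union of cells (LWW) plus the newest tombstones.
merged_cells = {**replica_b[0], **replica_a[0]}
merged_tombs = {**replica_b[1], **replica_a[1]}

print(max_live_ts(*replica_b))                   # 10 -- replica-local answer
print(max_live_ts(merged_cells, merged_tombs))   # 5  -- answer after merge
```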


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: Getting partition min/max timestamp

Posted by Brian Hess <br...@gmail.com>.
Jeremiah, this might be the exception, since the value being aggregated is
exactly the value that determines liveness of the data, and more so since
the aggregation requested is the *max* of the timestamp, given that
Cassandra is last-write-wins (that is, it looks at the maximum timestamp).
So you could actually record the timestamp of the last mutation on each
partition and have an aggregation you can read at consistency levels
greater than ONE.

That said, it will be the timestamp of the last mutation.  That is, it has
to include tombstones, range tombstones, partition tombstones, etc.  It is
quite a bit harder to record the timestamp of the last "live" value in the
partition.

Minimum timestamp is quite a bit harder, especially in the face of Time To
Live operations.  Once the "oldest" timestamped mutation ages off, it is
essentially a full partition scan to find the new minimum timestamp.  It is
also difficult to break the tie if two replicas come back with different
minimum timestamps.  The issue is that if some other replica has deleted
the mutation holding the minimum timestamp, you would want to discard that
timestamp value, but the only way to do so is a second lookup to determine
which value corresponds to the minimum timestamp and check whether it is
still live.  If it is not live, how will you determine the actual minimum?
Also, assume this is the case for the arbitrarily large *N* minimum
timestamps on the replica(s).

TL;DR: while maximum might be doable, minimum does fall into the category
that Jeremiah calls out (it's hard to do aggregations on an eventually
consistent system).
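
Brian's point about the max being readable at CL > ONE can be sketched in
plain Python (illustrative only, not Cassandra code): if the per-partition
"last mutation timestamp" counts every mutation, tombstones included, then
max() commutes with the replica merge, so taking the max over per-replica
answers equals computing it on the merged partition.

```python
# Each replica holds a list of (kind, ts) mutations; tombstones count too.

def max_mutation_ts(mutations):
    """Timestamp of the last mutation seen by a replica, deletes included."""
    return max((ts for _, ts in mutations), default=None)

replica_a = [("cell", 10), ("tombstone", 20)]   # saw the delete at ts=20
replica_b = [("cell", 10), ("cell", 5)]         # missed the delete

# Coordinator aggregates the per-replica answers...
coordinator_answer = max(max_mutation_ts(replica_a), max_mutation_ts(replica_b))

# ...which matches the answer computed on the fully merged partition,
# because max over a union equals the max of the per-part maxima.
merged_answer = max_mutation_ts(replica_a + replica_b)

print(coordinator_answer, merged_answer)   # 20 20
```

The same does not hold for the minimum once mutations age off or are
deleted, which is exactly the asymmetry Brian describes.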


Re: Getting partition min/max timestamp

Posted by Benedict Elliott Smith <be...@apache.org>.
It's a long time since I looked at the code, but I'm pretty sure that
comment is explaining why we translate *no* timestamp to *epoch*, to save
space when serializing the encoding stats.  Not stipulating that the data
may be inaccurate.

However, being such a long time since I looked, I forgot we still only
apparently store these stats per sstable.  It's not actually immediately
clear to me if storing them per partition would help tremendously (wrt
compression, as this data was intended) given you would expect a great deal
of correlation between partitions.  But they would also be extremely cheap
to persist per partition, so only a modestly positive impact on compression
would be needed to justify (or permit) them.

I don't think this use case would probably drive development, but if you
were to write the patch and demonstrate it approximately maintained present
data sizes, it's likely such a patch would be accepted.


Re: Getting partition min/max timestamp

Posted by "arhelmus@gmail.com" <ar...@gmail.com>.
First of all, thanks for all the ideas.

Benedict Elliott Smith: in code comments I found a note that the data in EncodingStats can be wrong, so I'm not sure it's a good idea to use it for accurate results. As I understand it, incorrect data is not a problem for its current use case, but it is for mine. Currently, I have added fields to every AtomicBTreePartition, which I update in the addAllWithSizeDelta call; but I now realize I should also think about the case of data removal.

I don't really care about TTLs at the moment, but it is a case I should think about, thanks.

Jeremiah Jordan: thanks for the note, but I don't really get what you mean about on-replica aggregation optimizations. Can you please explain it to me?



Re: Getting partition min/max timestamp

Posted by Benedict Elliott Smith <be...@apache.org>.
(Obviously, not to detract from the points that Jon and Jeremiah make, i.e.
that if TTLs or tombstones are involved the metadata we have, or can add,
is going to be worthless in most cases anyway)


Re: Getting partition min/max timestamp

Posted by Benedict Elliott Smith <be...@apache.org>.
We already store the minimum timestamp in the EncodingStats of each
partition, to support more efficient encoding of atom timestamps.  This
just isn't exposed beyond UnfilteredRowIterator, though it probably could
be.

Storing the max alongside would still require justification, though its
cost would actually be fairly nominal (probably only a few bytes; it
depends on how far apart min/max are).

I'm not sure (IMO) that even a fairly nominal cost could be justified
unless there were widespread benefit though, which I'm not sure this would
provide.  Maintaining a patched variant of your own that stores this
probably wouldn't be too hard, though.

In the meantime, exposing and utilising the minimum timestamp from
EncodingStats is probably a good place to start to explore the viability of
the approach.
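
The encoding benefit Benedict mentions can be sketched in plain Python (an
assumption-laden illustration of the general technique, not the actual
Cassandra serialization code): with a per-partition minimum stored, each
atom's timestamp can be written as a varint delta from that minimum instead
of a raw 8-byte long, and a max alongside would cost only the varint of
(max - min), i.e. "a few bytes".

```python
def vint_size(n: int) -> int:
    """Bytes needed for an unsigned LEB128-style varint (7 bits per byte)."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size

min_ts = 1_515_854_400_000_000                  # partition minimum (microseconds)
timestamps = [min_ts, min_ts + 120, min_ts + 90_000_000]

raw_bytes = 8 * len(timestamps)                 # fixed-width 64-bit longs
delta_bytes = sum(vint_size(t - min_ts) for t in timestamps)

# Extra cost of also persisting the partition max as a delta from the min:
max_cost = vint_size(max(timestamps) - min_ts)

print(raw_bytes, delta_bytes, max_cost)   # 24 6 4
```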

On 14 January 2018 at 15:34, Jeremiah Jordan <je...@datastax.com> wrote:

> Don’t forget about deleted and missing data. The bane of all on-replica
> aggregation optimizations.
>
> > On Jan 14, 2018, at 12:07 AM, Jeff Jirsa <jj...@gmail.com> wrote:
> >
> >
> > You’re right, it’s not stored in metadata now. Adding this to metadata
> > isn’t hard; it’s just hard to do it right so it’s useful to people with
> > other data models (besides yours) and can make it upstream (if that’s
> > your goal). In particular, the worst possible case is a table with no
> > clustering key and a single non-partition-key column. In that case,
> > storing these two extra long timestamps may be 2-3x more storage than
> > without, which would be a huge regression, so you’d have to have a way
> > to turn that feature off.
> >
> >
> > Worth mentioning that there are ways to do this without altering
> > Cassandra: consider using static columns that represent the min and max
> > timestamps. Create them both as ints or longs and write them on all
> > inserts/updates (as part of a batch, if needed). The only thing you’ll
> > have to do is find a way for “min timestamp” to work: you can set the
> > min-timestamp column with an explicit “USING TIMESTAMP” of 2^31 - NOW,
> > so that future writes won’t overwrite those values. That gives you
> > first-write-wins behavior for that column, which gives you an effective
> > min timestamp for the partition as a whole.
> >
> > --
> > Jeff Jirsa
> >
> >
> >> On Jan 13, 2018, at 4:58 AM, Arthur Kushka <ar...@gmail.com> wrote:
> >>
> >> Hi folks,
> >>
> >> Currently, I am working on a custom CQL operator that should return
> >> the max timestamp for some partition.
> >>
> >> I don't think that scanning a partition for that kind of data is a good
> >> idea. Instead, I am thinking about adding metadata to the partition. I
> >> want to store minTimestamp and maxTimestamp for every partition, as is
> >> already done in Memtables. Those timestamps would be updated on each
> >> mutation operation, which is quite cheap in comparison to a full scan.
> >>
> >> I am quite new to the Cassandra codebase and want to get some critique
> >> and ideas; maybe that kind of data is already stored somewhere, or you
> >> have better ideas. Is my assumption right?
> >>
> >> Best,
> >> Artur
> >
>

Re: Getting partition min/max timestamp

Posted by Jeremiah Jordan <je...@datastax.com>.
Don’t forget about deleted and missing data: the bane of all on-replica aggregation optimizations.
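To make this hazard concrete, here is a small sketch (Python, purely
illustrative; `merged_max_ts` and the tuple encoding are invented for this
example) of how a replica that missed a deletion reports a max timestamp
that the coordinator-merged data does not support:

```python
# Sketch: why a per-replica precomputed max timestamp breaks at CL > ONE
# when one replica missed a deletion.

def merged_max_ts(*replicas):
    # Coordinator-style merge: per key, the version with the newest write
    # timestamp wins; a winning tombstone shadows the row entirely, and
    # the max is taken only over rows that survive the merge.
    keys = set().union(*replicas)
    live_ts = []
    for k in keys:
        versions = [r[k] for r in replicas if k in r]
        ts, deleted = max(versions)  # (write_ts, is_tombstone), newest wins
        if not deleted:
            live_ts.append(ts)
    return max(live_ts, default=None)

# Replica A saw the deletion of row 'x' at ts=20; replica B missed it.
a = {'x': (20, True),  'y': (5, False)}
b = {'x': (10, False), 'y': (5, False)}

assert merged_max_ts(a, b) == 5   # correct merged answer
local_max_b = max(ts for ts, dead in b.values() if not dead)
assert local_max_b == 10          # replica B's local precomputed max lies
```

The local value on replica B is simply wrong once the tombstone on replica
A is taken into account, which is why the aggregation has to happen over
merged data rather than on each replica independently.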

> On Jan 14, 2018, at 12:07 AM, Jeff Jirsa <jj...@gmail.com> wrote:
> 
> 
> You’re right, it’s not stored in metadata now. Adding this to metadata isn’t hard; it’s just hard to do it right so it’s useful to people with other data models (besides yours) and can make it upstream (if that’s your goal). In particular, the worst possible case is a table with no clustering key and a single non-partition-key column. In that case, storing these two extra long timestamps may be 2-3x more storage than without, which would be a huge regression, so you’d have to have a way to turn that feature off.
> 
> 
> Worth mentioning that there are ways to do this without altering Cassandra: consider using static columns that represent the min and max timestamps. Create them both as ints or longs and write them on all inserts/updates (as part of a batch, if needed). The only thing you’ll have to do is find a way for “min timestamp” to work: you can set the min-timestamp column with an explicit “USING TIMESTAMP” of 2^31 - NOW, so that future writes won’t overwrite those values. That gives you first-write-wins behavior for that column, which gives you an effective min timestamp for the partition as a whole.
> 
> -- 
> Jeff Jirsa
> 
> 
>> On Jan 13, 2018, at 4:58 AM, Arthur Kushka <ar...@gmail.com> wrote:
>> 
>> Hi folks,
>> 
>> Currently, I am working on a custom CQL operator that should return the
>> max timestamp for some partition.
>> 
>> I don't think that scanning a partition for that kind of data is a good
>> idea. Instead, I am thinking about adding metadata to the partition. I
>> want to store minTimestamp and maxTimestamp for every partition, as is
>> already done in Memtables. Those timestamps would be updated on each
>> mutation operation, which is quite cheap in comparison to a full scan.
>> 
>> I am quite new to the Cassandra codebase and want to get some critique
>> and ideas; maybe that kind of data is already stored somewhere, or you
>> have better ideas. Is my assumption right?
>> 
>> Best,
>> Artur
> 
> 



Re: Getting partition min/max timestamp

Posted by Jeff Jirsa <jj...@gmail.com>.
You’re right, it’s not stored in metadata now. Adding this to metadata isn’t hard; it’s just hard to do it right so it’s useful to people with other data models (besides yours) and can make it upstream (if that’s your goal). In particular, the worst possible case is a table with no clustering key and a single non-partition-key column. In that case, storing these two extra long timestamps may be 2-3x more storage than without, which would be a huge regression, so you’d have to have a way to turn that feature off.


Worth mentioning that there are ways to do this without altering Cassandra: consider using static columns that represent the min and max timestamps. Create them both as ints or longs and write them on all inserts/updates (as part of a batch, if needed). The only thing you’ll have to do is find a way for “min timestamp” to work: you can set the min-timestamp column with an explicit “USING TIMESTAMP” of 2^31 - NOW, so that future writes won’t overwrite those values. That gives you first-write-wins behavior for that column, which gives you an effective min timestamp for the partition as a whole.
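A sketch of why the inverted explicit timestamp gives first-write-wins
under Cassandra's last-write-wins cell reconciliation (Python, purely
illustrative; `Cell` and `merge` are stand-ins for the server-side logic,
not actual Cassandra classes):

```python
# Sketch: last-write-wins cell resolution, and how an inverted explicit
# write timestamp (2^31 - now) turns it into first-write-wins.

MAX_TS = 2**31

class Cell:
    def __init__(self, value, write_ts):
        self.value = value        # payload: the wall-clock time we record
        self.write_ts = write_ts  # Cassandra's conflict-resolution timestamp

def merge(existing, incoming):
    # Cassandra reconciliation keeps the cell with the highest write
    # timestamp, so a larger write_ts always wins.
    if existing is None or incoming.write_ts > existing.write_ts:
        return incoming
    return existing

# The "min timestamp" static column: each write uses
# USING TIMESTAMP (2^31 - now), so the EARLIEST write carries the
# HIGHEST write timestamp and can never be overwritten later.
cell = None
for now in (100, 200, 300):        # three successive mutations
    cell = merge(cell, Cell(value=now, write_ts=MAX_TS - now))

assert cell.value == 100           # the first write's wall-clock time survives
```

In CQL terms this would be a normal INSERT/UPDATE on the static column with
an explicit `USING TIMESTAMP`, batched alongside the real write.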

-- 
Jeff Jirsa


> On Jan 13, 2018, at 4:58 AM, Arthur Kushka <ar...@gmail.com> wrote:
> 
> Hi folks,
> 
> Currently, I am working on a custom CQL operator that should return the
> max timestamp for some partition.
> 
> I don't think that scanning a partition for that kind of data is a good
> idea. Instead, I am thinking about adding metadata to the partition. I
> want to store minTimestamp and maxTimestamp for every partition, as is
> already done in Memtables. Those timestamps would be updated on each
> mutation operation, which is quite cheap in comparison to a full scan.
> 
> I am quite new to the Cassandra codebase and want to get some critique
> and ideas; maybe that kind of data is already stored somewhere, or you
> have better ideas. Is my assumption right?
> 
> Best,
> Artur



Re: Getting partition min/max timestamp

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Do you need to support TTLs? That might be a bit of an issue.
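The TTL concern can be sketched as follows (Python, purely illustrative;
`live_max_ts` and the tuple encoding are invented for this example): a
precomputed max becomes stale the moment its source cell expires, with no
mutation ever arriving to trigger an update:

```python
# Sketch: a TTL'd cell can hold the partition's max write timestamp, then
# expire on its own, leaving any precomputed max silently stale.

def live_max_ts(cells, now):
    # cells: list of (write_ts, expires_at or None for no TTL)
    live = [ts for ts, exp in cells if exp is None or exp > now]
    return max(live, default=None)

cells = [(100, None), (200, 250)]  # the newest write carries a TTL

precomputed_max = 200              # recorded at write time
assert live_max_ts(cells, now=240) == 200  # still valid before expiry
assert live_max_ts(cells, now=260) == 100  # stale once the cell expires
```

Any per-partition min/max metadata would therefore need either to account
for expiration times or to be documented as an upper bound rather than an
exact answer.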
On Sat, Jan 13, 2018 at 12:41 PM Arthur Kushka <ar...@gmail.com> wrote:

> Hi folks,
>
> Currently, I am working on a custom CQL operator that should return the
> max timestamp for some partition.
> 
> I don't think that scanning a partition for that kind of data is a good
> idea. Instead, I am thinking about adding metadata to the partition. I
> want to store minTimestamp and maxTimestamp for every partition, as is
> already done in Memtables. Those timestamps would be updated on each
> mutation operation, which is quite cheap in comparison to a full scan.
> 
> I am quite new to the Cassandra codebase and want to get some critique
> and ideas; maybe that kind of data is already stored somewhere, or you
> have better ideas. Is my assumption right?
> 
> Best,
> Artur
>