Posted to dev@samza.apache.org by Boris Shkolnik <bo...@gmail.com> on 2016/02/04 02:35:06 UTC

Re: ChangeLog Question for TTL rocksDB stores

As Jacob mentioned, there is no direct relationship between the RocksDB TTL
(internal to RocksDB) and the changelog (done by Samza).
A problem may arise if the store is restored from the changelog: the log
will still contain the expired entries, and they will be re-inserted with a
NEW timestamp (and, as Yi mentioned, there is currently no TTL on Kafka-based
changelogs).
But since it is not an error per se, SAMZA-862
<https://issues.apache.org/jira/browse/SAMZA-862> has changed this message
to be a warning instead of an error.
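
The restore behaviour described above can be illustrated with a toy
simulation. This is only a sketch under stated assumptions, not Samza or
RocksDB code: the names TtlStore, LoggedStore, restore and demo, and the
explicit `now` argument, are hypothetical and exist only for this example.
In particular, real RocksDB drops expired entries during compaction, whereas
this toy store drops them lazily on read to keep the example short.

```python
class TtlStore:
    """Toy stand-in for a TTL store: an entry expires `ttl` time units
    after it was inserted. Expiry is internal to the store."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._data = {}  # key -> (value, insert_time)

    def put(self, key, value, now):
        self._data[key] = (value, now)

    def delete(self, key):
        self._data.pop(key, None)

    def get(self, key, now):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, inserted = entry
        if now - inserted >= self.ttl:
            # The entry silently disappears; no caller is notified.
            del self._data[key]
            return None
        return value


class LoggedStore:
    """Hypothetical wrapper that records explicit put/delete calls in a
    changelog. It never sees TTL expirations, only the calls it intercepts."""

    def __init__(self, store, changelog):
        self.store = store
        self.changelog = changelog  # list of (key, value); value None = delete

    def put(self, key, value, now):
        self.store.put(key, value, now)
        self.changelog.append((key, value))

    def delete(self, key):
        self.store.delete(key)
        self.changelog.append((key, None))

    def get(self, key, now):
        return self.store.get(key, now)


def restore(changelog, ttl, now):
    """Rebuild a store by replaying the changelog. The log carries no
    timestamps, so every entry is inserted with the restore time."""
    store = TtlStore(ttl)
    for key, value in changelog:
        if value is None:
            store.delete(key)
        else:
            store.put(key, value, now)
    return store


def demo():
    changelog = []
    logged = LoggedStore(TtlStore(ttl=10), changelog)
    logged.put("session-1", "a", now=0)

    # At time 20 the entry has expired inside the store...
    expired_value = logged.get("session-1", now=20)

    # ...but the changelog still holds the original put, so a restore at
    # time 30 brings the entry back with a fresh insertion time.
    restored = restore(changelog, ttl=10, now=30)
    restored_value = restored.get("session-1", now=31)
    return expired_value, restored_value, len(changelog)
```

In this sketch the expiration leaves no trace in the changelog, so the
expired entry is readable again after the restore and its TTL clock starts
over from the restore time.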

On Thu, Jan 28, 2016 at 11:51 AM, Yi Pan <ni...@gmail.com> wrote:

> Hi, David,
>
> The "compaction" mentioned together with TTL refers to RocksDB's
> compaction, not to the Kafka-based changelog topic. Currently, TTL is not
> applied to the Kafka-based changelog topic. SAMZA-677 has been opened for
> this.
>
> -Yi
>
> On Thu, Jan 28, 2016 at 11:36 AM, David Garcia
> <dgarcia@homeaway.com.invalid> wrote:
>
> > Ok, that makes sense.  I had assumed that the changelog was supported
> > because the docs mention that TTL is enforced upon "compaction" (I had
> > assumed compaction of the DB changelog).  Which topic does the TTL policy
> > listen for the compaction of (since compaction policies of topics can
> > differ)?
> >
> > -David
> >
> > On 1/27/16, 8:46 PM, "Jacob Maes" <ja...@gmail.com> wrote:
> >
> > >Here's my understanding. The others can correct me if I'm mistaken.
> > >
> > >Samza provides the changelog functionality by intercepting RocksDB "put"
> > >and "delete" operations. However, TTL is managed by RocksDB internally and
> > >there aren't any hooks exposed in the RocksDB JNI. So there are 2 problems
> > >that arise with TTL and change logging:
> > >1. Samza doesn't know when an entry expires, so it can't delete the expired
> > >entry from the changelog.
> > >2. The changelog currently has no concept of entry age/timestamp, so when
> > >the changelog is restored, it's unknown whether some subset (or all) of the
> > >entries should be immediately expired.
> > >
> > >These issues aren't insurmountable, but they weren't pursued for the
> > >initial implementation. Perhaps because there was a shortage of use cases
> > >that needed both TTL and changelogging, but I'm not sure.
> > >
> > >-Jake
> > >
> > >On Wed, Jan 27, 2016 at 6:19 PM, David Garcia
> > ><dg...@homeaway.com.invalid>
> > >wrote:
> > >
> > >> So, I saw this very scary message:
> > >>
> > >>
> > >> ERROR - e.kv.RocksDbKeyValueStore$ - sessionJoinStore is a TTL based
> > >> store, changelog is not supported for TTL based stores, use at your own
> > >> discretion
> > >>
> > >> A few questions:
> > >>
> > >> 1.) Does this mean that this store is NOT backed by the changelog?
> > >>
> > >> 2.) Provided that the store IS backed by a changelog, do the TTL
> > >> expirations commit removals to the changelog (i.e. nulls), presumably
> > >> upon compaction?
> > >>
> > >> 3.) Can I please get a bit more detail on how TTL affects a changelog
> > >> store?
> > >>
> > >>
> > >> -David
> > >>
> >
> >
>

Re: ChangeLog Question for TTL rocksDB stores

Posted by Boris Shkolnik <bo...@gmail.com>.
That's the reason for the warning! In my understanding it may happen.
Events deleted by RocksDB TTL may be added back if restored from the
changelog.

Boris.

On Wed, Feb 3, 2016 at 5:46 PM, David Garcia <dg...@homeaway.com.invalid>
wrote:

> Boris, thank you for the clarification.  But just to make sure I
> understand, is it correct to say that entries deleted by the TTL-policy in
> rocksDB will NOT be logged in the change-log?  My job processes a lot of
> data and saves a large portion of it to RocksDB (for reference later…but
> subject to a retention policy).  I need to ensure that rocksDB doesn’t
> grow uncontrollably.  If the TTL isn’t reflected in the changelog, then
> it’s quite possible that job restart will push too many messages into
> rocksDB.  Thx again for the help!
>
> -David
>

Re: ChangeLog Question for TTL rocksDB stores

Posted by Tao Feng <fe...@gmail.com>.
Hi David,

My understanding is that the Samza changelog can still contain entries that
were deleted by the RocksDB TTL. You could refer to SAMZA-677 for more
information.

Thanks,
-Tao

On Wed, Feb 3, 2016 at 5:46 PM, David Garcia <dg...@homeaway.com.invalid>
wrote:

> Boris, thank you for the clarification.  But just to make sure I
> understand, is it correct to say that entries deleted by the TTL-policy in
> rocksDB will NOT be logged in the change-log?  My job processes a lot of
> data and saves a large portion of it to RocksDB (for reference later…but
> subject to a retention policy).  I need to ensure that rocksDB doesn’t
> grow uncontrollably.  If the TTL isn’t reflected in the changelog, then
> it’s quite possible that job restart will push too many messages into
> rocksDB.  Thx again for the help!
>
> -David
>

Re: ChangeLog Question for TTL rocksDB stores

Posted by David Garcia <dg...@homeaway.com.INVALID>.
Boris, thank you for the clarification.  But just to make sure I
understand, is it correct to say that entries deleted by the TTL-policy in
rocksDB will NOT be logged in the change-log?  My job processes a lot of
data and saves a large portion of it to RocksDB (for reference later…but
subject to a retention policy).  I need to ensure that rocksDB doesn’t
grow uncontrollably.  If the TTL isn’t reflected in the changelog, then
it’s quite possible that job restart will push too many messages into
rocksDB.  Thx again for the help!

-David
