Posted to dev@cassandra.apache.org by Jacek Lewandowski <le...@gmail.com> on 2021/10/22 08:24:25 UTC

[DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

I'd like to start a discussion about the SSTable format API proposal (CEP-17).

Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056
CEP: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API

Thanks,
Jacek

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org


Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by David Capwell <dc...@apple.com.INVALID>.
Inline

> On Nov 1, 2021, at 9:23 AM, Branimir Lambov <bl...@apache.org> wrote:
> 
> As Jacek is not a committer, this proposal needs a shepherd. I would be
> happy to take this role.
> 
>> to me the interfaces has to be at the SSTable level, which then expose
> readers/writers, but also has to expose the other things we do outside of
> those paths
> 
> Could you give some detail on what these things are? Are they something
> different from what the standalone Cassandra tools (scrub/verify/upgrade)
> are currently doing? Obviously, any pluggability proposal will have to
> provide a solution to these, and it would be helpful to know what needs to
> be done beyond making sure the bundled tools work correctly (which includes
> iterating indexes; format-specific operations (e.g. index summary
> redistribution) are excluded as they are to be handled by the individual
> format).

Looking closer at compaction and repair, I had forgotten that they were changed in CASSANDRA-15861 to go through the reader interface rather than mutating the files directly (that was a concurrency bug).  I was thinking of the logic which is now org.apache.cassandra.io.sstable.format.SSTableReader#mutateLevelAndReload and org.apache.cassandra.io.sstable.format.SSTableReader#mutateRepairedAndReload, so I believe compaction/repair may be OK with reader/writer; ignore those examples.

Checking usages of the descriptor, you find examples like:

org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader#read - which calls: writer.descriptor.getMetadataSerializer().mutate(writer.descriptor, description, transform);
org.apache.cassandra.tools.Util#metadataFromSSTable - which is used by sstablemetadata tool
org.apache.cassandra.io.sstable.KeyIterator#KeyIterator - directly loads primary index from descriptor: new In(new File(desc.filenameFor(Component.PRIMARY_INDEX)));

All of the examples I see could be rewritten to use reader/writer, so relying on reader/writer as the main interfaces would work.
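
As a rough illustration of that rewrite, here is a minimal Java sketch of a descriptor-level call (like the metadata mutation in the streaming example) expressed as a writer-owned method instead. Every name in it (FormatWriter, mutateMetadata, StatsSketch, InMemoryWriter) is hypothetical and picked only for illustration; it is not the CEP-17 API and not existing Cassandra code.

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch only: none of these types exist in Cassandra under these names.
class MetadataThroughWriterSketch {

    /** Immutable stand-in for the stats metadata kept alongside -Data.db. */
    static final class StatsSketch {
        final int level;
        final long repairedAt;

        StatsSketch(int level, long repairedAt) {
            this.level = level;
            this.repairedAt = repairedAt;
        }
    }

    /** Assumed writer-side contract: callers pass a transform and never open component files. */
    interface FormatWriter {
        void mutateMetadata(UnaryOperator<StatsSketch> transform);
        StatsSketch metadata();
    }

    /** Toy implementation that holds metadata in memory; a real format would persist it. */
    static final class InMemoryWriter implements FormatWriter {
        private StatsSketch stats = new StatsSketch(0, 0L);

        @Override
        public synchronized void mutateMetadata(UnaryOperator<StatsSketch> transform) {
            stats = transform.apply(stats);
        }

        @Override
        public synchronized StatsSketch metadata() {
            return stats;
        }
    }

    public static void main(String[] args) {
        FormatWriter writer = new InMemoryWriter();
        // The caller states what should change; how it is stored is up to the format.
        writer.mutateMetadata(s -> new StatsSketch(3, s.repairedAt));
        System.out.println("level=" + writer.metadata().level);
    }
}
```

The point of the shape is that the caller never learns which component file holds the metadata, so a different format is free to store it elsewhere.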

> 
> There is another problem in the current code alluded to in the question, in
> the fact that "SSTableReader" (tied to the sstable format and ready for
> querying data (i.e. with open data files and bloom filters loaded in
> memory)) is the only concept that the code uses to work with sstables. As I
> understand it, this proposal does not aim to solve that problem, only to
> make sure that we can properly read and write sstables of a given format,
> including in streaming and standalone tools. In other words, to provide the
> machinery to convert sstable descriptors into sstable readers and writers.
> 
> I see this as an expansion of CASSANDRA-7443 and cleanup of any changes
> that came after it and broke the intended capability.
> 
> Regards,
> Branimir




Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by Branimir Lambov <bl...@apache.org>.
As Jacek is not a committer, this proposal needs a shepherd. I would be
happy to take this role.

> to me the interfaces has to be at the SSTable level, which then expose
readers/writers, but also has to expose the other things we do outside of
those paths

Could you give some detail on what these things are? Are they something
different from what the standalone Cassandra tools (scrub/verify/upgrade)
are currently doing? Obviously, any pluggability proposal will have to
provide a solution to these, and it would be helpful to know what needs to
be done beyond making sure the bundled tools work correctly (which includes
iterating indexes; format-specific operations (e.g. index summary
redistribution) are excluded as they are to be handled by the individual
format).

There is another problem in the current code alluded to in the question, in
the fact that "SSTableReader" (tied to the sstable format and ready for
querying data (i.e. with open data files and bloom filters loaded in
memory)) is the only concept that the code uses to work with sstables. As I
understand it, this proposal does not aim to solve that problem, only to
make sure that we can properly read and write sstables of a given format,
including in streaming and standalone tools. In other words, to provide the
machinery to convert sstable descriptors into sstable readers and writers.
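
One way to picture that machinery is a small registry that maps the format named in a descriptor to a per-format factory. The Java below is a sketch under stated assumptions: DescriptorSketch, FormatSketch, ReaderSketch, WriterSketch and ToyFormat are hypothetical stand-ins, not Cassandra's Descriptor or SSTableFormat types, and the real proposal may be shaped quite differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: these are not Cassandra's Descriptor or SSTableFormat types.
class FormatRegistrySketch {

    /** Stand-in for a descriptor, reduced to the fields this sketch needs. */
    static final class DescriptorSketch {
        final String formatName;   // e.g. "big"
        final String baseFilename; // filename prefix shared by all components

        DescriptorSketch(String formatName, String baseFilename) {
            this.formatName = formatName;
            this.baseFilename = baseFilename;
        }
    }

    interface ReaderSketch { String describe(); }
    interface WriterSketch { String describe(); }

    /** Assumed per-format factory: the only place readers and writers get created. */
    interface FormatSketch {
        ReaderSketch openReader(DescriptorSketch descriptor);
        WriterSketch openWriter(DescriptorSketch descriptor);
    }

    private final Map<String, FormatSketch> formats = new ConcurrentHashMap<>();

    /** Registers a format once; a second registration under the same name is rejected. */
    void register(String name, FormatSketch format) {
        if (formats.putIfAbsent(name, format) != null)
            throw new IllegalStateException("Format already registered: " + name);
    }

    /** Resolves the factory for a descriptor, failing loudly on unknown formats. */
    FormatSketch formatFor(DescriptorSketch descriptor) {
        FormatSketch format = formats.get(descriptor.formatName);
        if (format == null)
            throw new IllegalArgumentException("Unknown sstable format: " + descriptor.formatName);
        return format;
    }

    /** Toy format used below; it only reports what it would open. */
    static final class ToyFormat implements FormatSketch {
        @Override
        public ReaderSketch openReader(DescriptorSketch d) {
            return () -> "reader for " + d.baseFilename;
        }

        @Override
        public WriterSketch openWriter(DescriptorSketch d) {
            return () -> "writer for " + d.baseFilename;
        }
    }

    public static void main(String[] args) {
        FormatRegistrySketch registry = new FormatRegistrySketch();
        registry.register("big", new ToyFormat());
        DescriptorSketch descriptor = new DescriptorSketch("big", "nb-1-big");
        System.out.println(registry.formatFor(descriptor).openReader(descriptor).describe());
    }
}
```

Streaming and the standalone tools would then ask the registry for a reader or writer rather than constructing a format-specific class themselves.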

I see this as an expansion of CASSANDRA-7443 and cleanup of any changes
that came after it and broke the intended capability.

Regards,
Branimir


Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by David Capwell <dc...@apple.com.INVALID>.
Sorry about that; used -1/+1 to show preference, not binding action

> On Oct 28, 2021, at 5:50 AM, benedict@apache.org wrote:
> 
>> I am -1 here, for the reasons listed above; the problem (in my eye) is not reader/writer but higher level at the actual SSTable.  If we plug out read/write but still allow direct file access, then these abstractions fail to provide the goals of the CEP.
> 
> Be careful dropping -1s, as your -1s here are binding. I realise this isn’t a vote thread, but the effect is the same. IMO we should try to express our preferences and defer to the collective opinion where possible. True -1s should very rarely appear.



Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by "benedict@apache.org" <be...@apache.org>.
> I am -1 here, for the reasons listed above; the problem (in my eye) is not reader/writer but higher level at the actual SSTable.  If we plug out read/write but still allow direct file access, then these abstractions fail to provide the goals of the CEP.

Be careful dropping -1s, as your -1s here are binding. I realise this isn’t a vote thread, but the effect is the same. IMO we should try to express our preferences and defer to the collective opinion where possible. True -1s should very rarely appear.


From: David Capwell <dc...@apple.com.INVALID>
Date: Wednesday, 27 October 2021 at 15:33
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
Reading the CEP, I don’t see any mention of the systems which access SSTables, such as streaming (there is a small callout to zero-copy streaming with ZeroCopyBigTableWriter) and repair.  If you are abstracting out BigTableReader, then you are not dealing with the implementation assumptions that users of SSTables make (such as direct mutation of auxiliary files outside of -Data.db).

> Audience
>        • Cassandra developers who wish to see SSTableReader and SSTableWriter more modular than they are today,

This statement relates to the above comment: many parts of the code do not use Reader/Writer but instead use direct format knowledge to apply changes to the file format (normally outside of -Data.db). To me, the interfaces have to be at the SSTable level, which then exposes readers/writers but also has to expose the other things we do outside of those paths.

>        • move the metrics related to sstable format out from TableMetrics class and make them tied to certain sstable implementation

I am curious about this comment; are you going to stop exposing this information?

>        • have a single factory for creating both readers and writers for particular implementation of sstable and use it consistently - no direct creation of any reader / writer

I am -1 here, for the reasons listed above; the problem (in my eye) is not reader/writer but higher level at the actual SSTable.  If we plug out read/write but still allow direct file access, then these abstractions fail to provide the goals of the CEP.

I am +1 to the intent of the CEP.

One last comment, which I have also made in the other modularity thread: backwards compatibility and maintenance. It is not clear right now which Java interfaces may not break and how we can maintain and extend such interfaces in the future.  If the goal is to allow 3rd parties to plug in and offer new SSTable formats, are we as a project OK with having a minor release make a binary- or source-incompatible change?  If not, how do we detect this?  Until this problem is solved, I do not think we should add any such interfaces.
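
On the detection question, one possible approach (a sketch, not something the CEP proposes) is to snapshot the method signatures of each plug-in facing interface and compare the snapshot between releases. The Java below uses only standard reflection; SSTableFormatApi and its methods are hypothetical placeholders, and dedicated tools such as japicmp or Revapi cover far more cases (fields, generics, default methods) than this does.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Sketch of a signature snapshot check; the interface below is a made-up placeholder.
class ApiSnapshotSketch {

    /** Hypothetical plug-in facing interface whose shape we want to keep stable. */
    public interface SSTableFormatApi {
        String name();
        long estimateKeys(String table);
    }

    /** Renders every public method as "returnType name(paramType,...)", sorted for stable diffs. */
    static Set<String> signaturesOf(Class<?> api) {
        Set<String> signatures = new TreeSet<>();
        for (Method method : api.getMethods()) {
            String params = Arrays.stream(method.getParameterTypes())
                                  .map(Class::getName)
                                  .collect(Collectors.joining(","));
            signatures.add(method.getReturnType().getName() + " " + method.getName() + "(" + params + ")");
        }
        return signatures;
    }

    /** Signatures present in the baseline but gone now: these break existing callers. */
    static Set<String> removedSince(Set<String> baseline, Set<String> current) {
        Set<String> removed = new TreeSet<>(baseline);
        removed.removeAll(current);
        return removed;
    }

    /** Signatures that are new: abstract additions break existing third-party implementations. */
    static Set<String> addedSince(Set<String> baseline, Set<String> current) {
        Set<String> added = new TreeSet<>(current);
        added.removeAll(baseline);
        return added;
    }

    public static void main(String[] args) {
        Set<String> current = signaturesOf(SSTableFormatApi.class);
        // In a real check the baseline would be loaded from a file committed with the last release.
        Set<String> baseline = new TreeSet<>(current);
        baseline.add("void flush()"); // pretend the previous release also had flush()
        System.out.println("removed: " + removedSince(baseline, current));
        System.out.println("added: " + addedSince(baseline, current));
    }
}
```

A test that fails whenever either set is non-empty would at least turn an accidental break in a minor release into a deliberate decision.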

> On Oct 22, 2021, at 7:23 AM, Jeremiah Jordan <je...@gmail.com> wrote:
>
> Hi Stefan,
> That idea is not related to this CEP which is about the file formats of the
> sstables, not file system access.  But you should take a look at the work
> recently committed in https://issues.apache.org/jira/browse/CASSANDRA-16926
> to switch to using java.nio.file.Path for file access.  This should allow
> the use of a file system provider to access files which could be the basis
> for work to load the files from S3.
>
> -Jeremiah
>
> On Fri, Oct 22, 2021 at 4:07 AM Stefan Miklosovic <
> stefan.miklosovic@instaclustr.com> wrote:
>
>> One point I would like to add to this; I was already looking into how
>> to extend this but what I saw in SSTableReader was that it is very
>> much "file system oriented". There was not any possibility to actually
>> hook something like that there. I think what importing does is that it
>> will use SSTableReader / Writer stuff so I think that the modification
>> of these classes to accommodate this idea would be necessary.
>>
>> On Fri, 22 Oct 2021 at 11:02, Stefan Miklosovic
>> <st...@instaclustr.com> wrote:
>>>
>>> Hi Jacek,
>>>
>>> Thanks for taking the lead on this.
>>>
>>> There was importing of SSTables introduced in 4.0 via
>>> StorageService#importNewSSTables. The "problem" with this is that
>>> SSTables need to be physically located at disk so Cassandra can read
>>> them. If a backup is taken and SSTables are uploaded to, for example,
>>> S3 bucket, then upon restore, all these SSTables need to be downloaded
>>> first and then imported. What about downloading them / importing them
>>> directly from S3? Or any custom source for that matter? Importing of
>>> SSTables is a very nice feature in 4.0, we do not need to copy / hard
>>> link / refresh, it is all handled internally.
>>>
>>> I am not sure if your work is related to this idea but I would
>>> appreciate it if this is pluggable as well for the sake of simplicity
>>> and effectiveness as we would not have to download all sstables before
>>> importing them.
>>>
>>> If it is not related, feel free to skip that completely and I guess I
>>> would have to try to push that forward myself.
>>>
>>> Regards
>>>
>>>
>>> On Fri, 22 Oct 2021 at 10:24, Jacek Lewandowski
>>> <le...@gmail.com> wrote:
>>>>
>>>> I'd like to start a discussion about SSTable format API proposal
>> (CEP-17)
>>>>
>>>> Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056
>>>> CEP:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
>>>>
>>>> Thanks,
>>>> Jacek



Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by Branimir Lambov <bl...@apache.org>.
Looks like the discussion has settled down. I am moving forward with putting
this proposal to a vote.

Regards,
Branimir

On Mon, Nov 15, 2021 at 7:28 PM David Capwell <dc...@apple.com.invalid>
wrote:

> Works for me

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by David Capwell <dc...@apple.com.INVALID>.
Works for me

> On Nov 15, 2021, at 4:21 AM, Jacek Lewandowski <le...@gmail.com> wrote:
> 
> I'd put it another way - the scope is to make it possible to provide a new
> implementation of the sstable format without the necessity to refactor
> Cassandra code. It implies a contract about the responsibilities of an sstable
> format implementation so that the rest of the code can rely on that, and
> only on that, and does not make assumptions beyond that. But it does not
> claim that the created interfaces will not change, even with a minor version
> release. When those interfaces have been around for some time, we can start a
> separate discussion about whether we want to put some guarantees on them.
> 
> - - -- --- ----- -------- -------------
> Jacek Lewandowski
> 
> 
> On Wed, Nov 10, 2021 at 9:01 PM David Capwell <dc...@apple.com.invalid>
> wrote:
> 
>> If this gets descoped to test only (can break all interfaces in a minor)
>> then my support concerns are no longer valid; I am cool with the CEP scoped
>> only to improving testing
>> 
>>> On Nov 10, 2021, at 11:20 AM, Jacek Lewandowski <lewandowski.jacek@gmail.com> wrote:
>>>
>>> For the other ticket (schema update handler interface) I was also proposing
>>> a kind of @DeveloperApi annotation as seen in other projects but similarly
>>> to this thread there were different opinions and no conclusion. After
>>> reading the comments I must agree that perhaps it is way too early to mark
>>> this interface as stable. Perhaps it was too far-fetched to say it would be
>>> for people who wished to replace the SSTable format. My focus is
>>> primarily on cleaning up the code (modularization and clean contracts) and
>>> making it possible to introduce a new format in the future while allowing
>>> us to maintain the old format (no "if then else" approach)
>>> 
>>> - - -- --- ----- -------- -------------
>>> Jacek Lewandowski
>>> 
>>> 
>>> On Wed, Nov 10, 2021 at 12:53 AM benedict@apache.org <benedict@apache.org> wrote:
>>>
>>>>> I may be wrong here, but the CEP directly calls out making this api
>>>>> public for people who wish to replace the SSTable format
>>>>
>>>> I don’t think this implies API stability. For starters, it doesn’t
>>>> stipulate that these implementations will be supported out of tree (the
>>>> only one I’m aware of, so far as I understand, is intended to be incubated
>>>> in tree), nor does an API for external usage have to be stable. It’s fine
>>>> to create an API and tell users it’s unstable, and that they should closely
>>>> monitor patch version changes if they use it.
>>>>
>>>> That said, norms may be changing around what can go into patch releases
>>>> anyhow, so this may be a lot of noise about nothing. If all new development
>>>> goes into trunk, then it’s all moot. But I don’t think we can make hard
>>>> assumptions about that today, as historically these sorts of intentions
>>>> haven’t lasted.
>>>>
>>>> I’m fairly against the idea of introducing hard restrictions on this, and
>>>> potentially ossifying the codebase. I’m not keen to even consider out of
>>>> tree consumers of these APIs in any way, for compatibility, upgradeability
>>>> or anything. There’s a lot that needs to be done over the coming years to
>>>> improve the internal structure of the project, and unduly entrenching the
>>>> current state of affairs would be a huge potential harm to these efforts to
>>>> modularise the codebase.
>>>> 
>>>> From: David Capwell <dc...@apple.com.INVALID>
>>>> Date: Tuesday, 9 November 2021 at 23:38
>>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>>>> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
>>>>> My understanding is that the only interface that is expected to be
>>>>> stable for external consumers is the secondary index API
>>>>
>>>> I may be wrong here, but the CEP directly calls out making this api public
>>>> for people who wish to replace the SSTable format ("Cassandra developers
>>>> who want to develop and publish different file format implementations."),
>>>> so if we need to support the 2i API, why would we not support the SSTable
>>>> API as well?
>>>>
>>>>> All of the other mentioned APIs are in my opinion for internal usage only
>>>>
>>>> This gets back to my point; it is currently tribal knowledge what needs to
>>>> work and what doesn’t, and unless the broader set of committers knows
>>>> this, the likelihood that any new API will break in a minor is high.
>>>> 
>>>>> On Nov 9, 2021, at 12:13 PM, benedict@apache.org wrote:
>>>>> 
>>>>> I agree that we don’t need to block the CEP on this, and that we should
>>>>> have that discussion. But it’s worth noting that the CEP should not
>>>>> anticipate or depend on any specific outcome of that discussion.
>>>>>
>>>>> Since it is somewhat relevant for this discussion, my view is that no
>>>>> interface should be assumed to be stable without the prior explicit
>>>>> agreement of the community.
>>>>>
>>>>> My understanding is that the only interface that is expected to be
>>>>> stable for external consumers is the secondary index API. Perhaps also
>>>>> snitches? But also perhaps not, as the difficulty of upgrading these at the
>>>>> same time is pretty low for custom snitches. All of the other mentioned
>>>>> APIs are in my opinion for internal usage only, so users should not assume
>>>>> compile time compatibility across any release, and I am certain we have
>>>>> never tried to maintain this. This still facilitates forks of course, by
>>>>> localising the compatibility work.
>>>>> 
>>>>> 
>>>>> From: Jeremiah D Jordan <je...@gmail.com>
>>>>> Date: Tuesday, 9 November 2021 at 19:43
>>>>> To: Cassandra DEV <de...@cassandra.apache.org>
>>>>> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
>>>>> I would love to have this discussion and set up annotations or similar to
>>>>> formalize things.  I just do not think we need to hold up any CEPs to do
>>>>> so.  That discussion should possibly be a CEP of its own proposing how we
>>>>> want to formalize interfaces?  I would be happy to go through and try to
>>>>> put together something for that, or since you feel so strongly about it
>>>>> maybe you want to, David?  At the very least it should get its own DISCUSS
>>>>> thread and then be written up in the wiki.
>>>>> 
>>>>> -Jeremiah
>>>>> 
>>>>>> On Nov 9, 2021, at 1:06 PM, Joshua McKenzie <jm...@apache.org> wrote:
>>>>>>
>>>>>>> trunk -> anything goes, not trunk -> try not to change these interfaces
>>>>>>
>>>>>> Have we ever clarified what "these interfaces" are? Was just talking to
>>>>>> David and I realized I didn't even JavaDoc CommitLogReadHandler as _being
>>>>>> designed_ for external usage. /sigh
>>>>>>
>>>>>> I think it'd be valuable for us to go through the codebase and annotate
>>>>>> interfaces as intended to be exposed to 3rd parties; this has bothered me
>>>>>> for years. Especially as we come up on a large number of new cleanups,
>>>>>> refactorings, and potentially genericizing some subsystems into APIs
>>>>>> (CEP-18 descendants).
>>>>>> 
>>>>>> 
>>>>>> On Tue, Nov 9, 2021 at 2:01 PM David Capwell <dcapwell@apple.com.invalid>
>>>>>> wrote:
>>>>>>
>>>>>>>> We already have many interfaces similar to these for Compaction
>>>>>>>> Strategy, Indexing, Query Handler.
>>>>>>>
>>>>>>> Today-I-Learned QueryHandler is not allowed to be touched in a minor… good
>>>>>>> to know…
>>>>>>>
>>>>>>>> not trunk -> try not to change these interfaces
>>>>>>>
>>>>>>> Outside of MBeans, I honestly do not know what interfaces fall into this
>>>>>>> group; and for MBeans we have tests which block breaking changes.  The
>>>>>>> point I am making is that not everyone is aware of the rules, so having
>>>>>>> something in place to help enforce such rules should be thought about; if
>>>>>>> we want to add pluggable hooks with the intent that external parties can
>>>>>>> leverage such hooks, we should also add to the scope the maintenance of
>>>>>>> these interfaces (we should not assume “tribal knowledge” will work).
>>>>>>>
>>>>>>> I am not trying to ask for something large or something requiring a ton of
>>>>>>> work, I am just asking that this gets thought about during the project so
>>>>>>> it doesn’t get neglected.  This could be as simple as an annotation like
>>>>>>> @ExposedTo3rdParties (Hadoop does this to show an interface is exposed and
>>>>>>> must be maintained), or it could be something like split directories
>>>>>>> (src/java = private, src/java-exposed = public); I am trying not to dictate
>>>>>>> an implementation, only trying to make sure we are set up to support the CEP
>>>>>>> after the work is done.
>>>>>>> 
>>>>>>> 
>>>>>>>> On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan <jeremiah.jordan@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> We already have many interfaces similar to these for Compaction
>>>>>>>> Strategy, Indexing, Query Handler.  I would hope that committers are already
>>>>>>>> following a policy along the lines of trunk -> anything goes, not trunk ->
>>>>>>>> try not to change these interfaces.  I would expect that to be the same
>>>>>>>> policy for any new internal interfaces that are added.  But given we
>>>>>>>> already have many such interfaces, I see no reason to block adding more of
>>>>>>>> them while change policies are discussed.
>>>>>>>> 
>>>>>>>> -Jeremiah
>>>>>>>> 
>>>>>>>>> On Nov 9, 2021, at 10:44 AM, David Capwell <dc...@apple.com.INVALID>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I still have one outstanding comment, but this is a comment for several
>>>>>>>>> of the CEPs being worked on
>>>>>>>>>
>>>>>>>>>> And last comment, which I have also made in the other modularity
>>>>>>>>>> thread… backwards compatibility and maintenance. It is not clear right now
>>>>>>>>>> what Java interfaces may not break and how we can maintain and extend such
>>>>>>>>>> interfaces in the future.  If the goal is to allow 3rd parties to plug in
>>>>>>>>>> and offer new SSTable formats, are we as a project ok with having a minor
>>>>>>>>>> release do a binary or source non-compatible change?  If not, how do we
>>>>>>>>>> detect this?  Until this problem is solved, I do not think we should add
>>>>>>>>>> any such interfaces.
>>>>>>>>>
>>>>>>>>> I would love some clarity on this.  Specifically, if we assume a patch
>>>>>>>>> author/reviewers are not familiar with the impact of changes to these
>>>>>>>>> interfaces, what happens?  Do we have tools to block this? Do we require
>>>>>>>>> 3rd party authors to create massive shims to deal with every patch level
>>>>>>>>> version out there?  I would love more clarity on how we maintain these new
>>>>>>>>> pluggable interfaces.
>>>>>>>>> 
>>>>>>>>>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov <bl...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Does anyone have any further comments or questions on the proposal, or are
>>>>>>>>>> we ready to move forward to a vote?
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Branimir
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell <dc...@apple.com.invalid>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>>> I apologize I did not mention those things explicitly. All the places
>>>>>>>>>>>> where sstable files are accessed directly would have to be refactored.
>>>>>>>>>>>
>>>>>>>>>>> Works for me
>>>>>>>>>>>
>>>>>>>>>>>> Speaking about the implementation, one idea I was thinking about was that
>>>>>>>>>>>> the factories for formats are registered using Java's native service
>>>>>>>>>>>> loader.
>>>>>>>>>>> 
>>>>>>>>>>> I am a fan of ServiceLoader as a means of plugging in.
>>>>>>>>>>> 
>>>>>>>>>>>> I hope this explains a bit
>>>>>>>>>>> 
>>>>>>>>>>> Yep; thanks!
>>>>>>>>>>> 
>>>>>>>>>>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <lewandowski.jacek@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> David,
>>>>>>>>>>>>
>>>>>>>>>>>> I apologize I did not mention those things explicitly. All the places
>>>>>>>>>>>> where sstable files are accessed directly would have to be refactored.
>>>>>>>>>>>>
>>>>>>>>>>>> Regarding TableMetrics - currently it includes many metrics, some of them
>>>>>>>>>>>> are unrelated to sstables at all, but there are metrics which are specific
>>>>>>>>>>>> to the current sstable format, like metrics related to index summaries or
>>>>>>>>>>>> bloom filters. The created gauges query certain methods on the sstable reader -
>>>>>>>>>>>> I think the only common metrics for sstables we can leave in TableMetrics
>>>>>>>>>>>> are those for which there are query methods in the generic sstable interface.
>>>>>>>>>>>> Other metrics, specific to a certain sstable format, should be registered
>>>>>>>>>>>> by the implementation itself.
>>>>>>>>>>>>
>>>>>>>>>>>> Speaking about the implementation, one idea I was thinking about was that
>>>>>>>>>>>> the factories for formats are registered using Java's native service
>>>>>>>>>>>> loader. This way we could get the list of all the factories on the
>>>>>>>>>>>> classpath and call some method, like `registerMetrics`, during system
>>>>>>>>>>>> initialization. That could also be implemented in a static initializer in the
>>>>>>>>>>>> factory, but it would make it less obvious for the implementors where such
>>>>>>>>>>>> initialization should be done.
>>>>>>>>>>>> 
>>>>>>>>>>>> I hope this explains a bit
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Jacek
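
A minimal sketch of the service-loader idea from the quoted message above, for readers of the archive. It assumes a hypothetical `FormatFactory` interface with a `registerMetrics` hook and a plain list standing in for a metrics registry; none of these names are taken from CEP-17 or the Cassandra codebase. Providers would be listed in a META-INF/services file named after the interface, and startup code would call the hook explicitly on each discovered factory:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical provider interface; the name and methods are illustrative, not CEP-17's API.
interface FormatFactory {
    String name();

    // Each format adds the names of its own format-specific metrics to the shared registry.
    void registerMetrics(List<String> registry);
}

// Stand-in for a format that has bloom filters, so it registers a bloom-filter metric.
// It is passed in directly below; for real discovery it would be a public class
// listed in the services file.
final class ExampleFormatFactory implements FormatFactory {
    @Override
    public String name() { return "example"; }

    @Override
    public void registerMetrics(List<String> registry) {
        registry.add(name() + ".bloom_filter_false_positives");
    }
}

final class FormatBootstrap {
    // Collects every factory named in META-INF/services/<fully qualified interface name>
    // on the classpath. With no such file present the result is an empty list.
    static List<FormatFactory> discover() {
        List<FormatFactory> found = new ArrayList<>();
        for (FormatFactory factory : ServiceLoader.load(FormatFactory.class)) {
            found.add(factory);
        }
        return found;
    }

    // Intended to be called once during startup: every format gets an explicit hook,
    // instead of relying on static initializers inside the factories.
    static List<String> registerAllMetrics(Iterable<FormatFactory> factories) {
        List<String> registry = new ArrayList<>();
        for (FormatFactory factory : factories) {
            factory.registerMetrics(registry);
        }
        return registry;
    }
}
```

Startup code under these assumptions would call `FormatBootstrap.registerAllMetrics(FormatBootstrap.discover())`. Calling the hook from startup code rather than from a static initializer keeps the initialization point visible to implementors, which is the trade-off the message points out. A provider discovered this way should be a public class with a public no-argument constructor so that ServiceLoader can instantiate it.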


Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by Jacek Lewandowski <le...@gmail.com>.
I'd put it another way: the scope is to make it possible to provide a new
implementation of the sstable format without having to refactor Cassandra
code. That implies a contract describing the responsibilities of an sstable
format implementation, so that the rest of the code can rely on that
contract, and only on it, without making assumptions beyond it. It does not
claim that the created interfaces will stay unchanged, even across a minor
version release. Once those interfaces have been around for some time, we can
start a separate discussion about whether we want to put guarantees on them.
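
To make the idea of such a contract more concrete, here is a minimal sketch,
not the CEP's actual API. It assumes the ServiceLoader-based registration
floated earlier in this thread. SSTableFormatFactory, SSTableFormatRegistry,
and the plain list standing in for a metrics registry are hypothetical names
used only for illustration; only `registerMetrics` comes from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical contract for a pluggable format; not Cassandra's real interface.
interface SSTableFormatFactory {
    String name();

    // Lets the format register its own format-specific metrics.
    // A List<String> stands in for a real metrics registry (assumption).
    void registerMetrics(List<String> metricRegistry);
}

// Hypothetical helper that discovers and initializes format factories.
final class SSTableFormatRegistry {
    private SSTableFormatRegistry() {}

    // Finds every factory declared via META-INF/services on the classpath.
    static List<SSTableFormatFactory> discover() {
        List<SSTableFormatFactory> found = new ArrayList<>();
        for (SSTableFormatFactory factory : ServiceLoader.load(SSTableFormatFactory.class)) {
            found.add(factory);
        }
        return found;
    }

    // Called once during system initialization: one explicit hook per factory,
    // instead of relying on static initializers inside each implementation.
    static List<String> initialize(Iterable<? extends SSTableFormatFactory> factories) {
        List<String> metrics = new ArrayList<>();
        for (SSTableFormatFactory factory : factories) {
            factory.registerMetrics(metrics);
        }
        return metrics;
    }
}
```

With a provider listed in a META-INF/services file named after the fully
qualified interface name, discover() would return it; with no such file it
returns an empty list. The rest of the code would then depend only on the
interface, never on a concrete format.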

- - -- --- ----- -------- -------------
Jacek Lewandowski


On Wed, Nov 10, 2021 at 9:01 PM David Capwell <dc...@apple.com.invalid>
wrote:

> If this gets descoped to test only (can break all interfaces in a minor)
> then my support concerns are no longer valid; I am cool with the CEP scoped
> only to improving testing

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by David Capwell <dc...@apple.com.INVALID>.
If this gets descoped to test only (meaning all interfaces can break in a minor), then my support concerns are no longer valid; I am cool with the CEP being scoped only to improving testing.

> On Nov 10, 2021, at 11:20 AM, Jacek Lewandowski <le...@gmail.com> wrote:
> 
> For the other ticket (schema update handler interface) I was also proposing
> a kind of @DeveloperApi annotation as seen in other projects but similarly
> to this thread there were different opinions and no conclusion. After
> reading the comments I must agree that perhaps it is way too early to mark
> this interface as stable. Perhaps it was too far-fetched to say it would be
> for people who wished to replace the SSTable format. My focus is
> primarily on cleaning up the code (modularization and clean contracts) and
> making it possible to introduce a new format in the future while allowing
> us to maintain the old format (no "if then else" approach)
> 
> - - -- --- ----- -------- -------------
> Jacek Lewandowski
> 
> 
> On Wed, Nov 10, 2021 at 12:53 AM benedict@apache.org <be...@apache.org>
> wrote:
> 
>>> I may be wrong here, but the CEP directly calls out making this api
>> public for people who wish to replace the SSTable format
>> 
>> I don’t think this implies API stability. For starters, it doesn’t
>> stipulate that these implementations will be supported out of tree (the
>> only one I’m aware of, so far as I understand, is intended to be incubated
>> in tree), nor does an API for external usage have to be stable. It’s fine
>> to create an API and tell users it’s unstable, and that they should closely
>> monitor patch version changes if they use it.
>> 
>> That said, norms may be changing around what can go into patch releases
>> anyhow, so this may be a lot of noise about nothing. If all new development
>> goes into trunk, then it’s all moot. But I don’t think we can make hard
>> assumptions about that today, as historically these sorts of intentions
>> haven’t lasted.
>> 
>> I’m fairly against the idea of introducing hard restrictions on this, and
>> potentially ossifying the codebase. I’m not keen to even consider out of
>> tree consumers of these APIs in any way, for compatibility, upgradeability
>> or anything. There’s a lot that needs to be done over the coming years to
>> improve the internal structure of the project, and unduly entrenching the
>> current state of affairs would be a huge potential harm of these efforts to
>> modularise the codebase.
>> 
>> From: David Capwell <dc...@apple.com.INVALID>
>> Date: Tuesday, 9 November 2021 at 23:38
>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
>>> My understanding is that the only interface that is expected to be
>> stable for external consumers is the secondary index API
>> 
>> I may be wrong here, but the CEP directly calls out making this api public
>> for people who wish to replace the SSTable format ("Cassandra developers
>> who want to develop and publish different file format implementations."),
>> so if we need to support 2i API, why would we not support SSTable API as
>> well?
>> 
>>> All of the other mentioned APIs are in my opinion for internal usage only
>> 
>> This gets back to my point; it is currently tribal knowledge what needs to
>> work and what doesn’t, and without the broader set of committers knowing
>> this, the likelihood that any new API will break in a minor is high.
>> 
>>> On Nov 9, 2021, at 12:13 PM, benedict@apache.org wrote:
>>> 
>>> I agree that we don’t need to block the CEP on this, and that we should
>> have that discussion. But it’s worth noting that the CEP should not
>> anticipate or depend on any specific outcome of that discussion.
>>> 
>>> Since it is somewhat relevant for this discussion, my view is that no
>> interface should be assumed to be stable without the prior explicit
>> agreement of the community.
>>> 
>>> My understanding is that the only interface that is expected to be
>> stable for external consumers is the secondary index API. Perhaps also
>> snitches? But also perhaps not, as the difficulty of upgrading these at the
>> same time is pretty low for custom snitches. All of the other mentioned
>> APIs are in my opinion for internal usage only, so users should not assume
>> compile time compatibility across any release, and I am certain we have
>> never tried to maintain this. This still facilitates forks of course, by
>> localising the compatibility work.
>>> 
>>> 
>>> From: Jeremiah D Jordan <je...@gmail.com>
>>> Date: Tuesday, 9 November 2021 at 19:43
>>> To: Cassandra DEV <de...@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
>>> I would love to have this discussion and set up annotations or similar to
>> formalize things.  I just do not think we need to hold up any CEPs to do
>> so.  That discussion should possibly be a CEP of its own proposing how we
>> want to formalize interfaces?  I would be happy to go through and try to
>> put together something for that, or since you feel so strongly about it
>> maybe you want to, David?  At the very least it should get its own DISCUSS
>> thread and then be written up in the wiki.
>>> 
>>> -Jeremiah
>>> 
>>>> On Nov 9, 2021, at 1:06 PM, Joshua McKenzie <jm...@apache.org>
>> wrote:
>>>> 
>>>>> 
>>>>> trunk -> anything goes, not trunk -> try not to change these interfaces
>>>> 
>>>> Have we ever clarified what "these interfaces" are? Was just talking to
>>>> David and I realized I didn't even JavaDoc CommitLogReadHandler as
>> _being
>>>> designed_ for external usage. /sigh
>>>> 
>>>> I think it'd be valuable for us to go through the codebase and annotate
>>>> interfaces as intended to be exposed to 3rd parties; this has bothered
>> me
>>>> for years. Especially as we come up on a large number of new cleanups,
>>>> refactorings, and potentially genericizing some subsystems into APIs
>>>> (CEP-18 descendants).
>>>> 
>>>> 
>>>> On Tue, Nov 9, 2021 at 2:01 PM David Capwell <dcapwell@apple.com.invalid
>>> 
>>>> wrote:
>>>> 
>>>>>> We already have many interfaces similar to these for Compaction
>>>>> Strategy, Indexing, Query Handler.
>>>>> 
>>>>> Today-I-Learned QueryHandler is not allowed to be touched in a minor…
>> good
>>>>> to know…
>>>>> 
>>>>>> not trunk -> try not to change these interfaces
>>>>> 
>>>>> Outside of MBeans, I honestly do not know what interfaces fall into
>> this
>>>>> group; and for MBeans we have tests which block breaking changes.  The
>>>>> point I am making is that not everyone is aware of the rules, so having
>>>>> something in place to help enforce such rules should be thought about;
>> if
>>>>> we want to add pluggable hooks with the intent that external parties
>> can
>>>>> leverage such hooks, we should also add to the scope the maintenance of
>>>>> these interfaces (we should not assume “tribal knowledge” will work).
>>>>> 
>>>>> I am not trying to ask for something large or something requiring a
>> ton of
>>>>> work, I am just asking that this gets thought about during the project
>> so
>>>>> it doesn’t get neglected.  This could be as simple as an annotation
>> like
>>>>> @ExposedTo3rdParties (Hadoop does this to show an interface is exposed
>> and
>>>>> must be maintained), or it could be something like split directories
>>>>> (src/java = private, src/java-exposed = public); I am trying not to
>> dictate
>>>>> an implementation, only trying to make sure we are set up to support
>> the CEP
>>>>> after the work is done.
>>>>> 
>>>>> 
>>>>>> On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan <
>> jeremiah.jordan@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> We already have many interfaces similar to these for Compaction
>>>>> Strategy, Indexing, Query Handler.  I would hope that committers are
>> already
>>>>> following a policy along the lines of trunk -> anything goes, not
>> trunk ->
>>>>> try not to change these interfaces.  I would expect that to be the same
>>>>> policy for any new internal interfaces that are added.  But given we
>>>>> already have many such interfaces, I see no reason to block adding
>> more of
>>>>> them while change policies are discussed.
>>>>>> 
>>>>>> -Jeremiah
>>>>>> 
>>>>>>> On Nov 9, 2021, at 10:44 AM, David Capwell
>> <dc...@apple.com.INVALID>
>>>>> wrote:
>>>>>>> 
>>>>>>> I still have one outstanding comment, but this is a comment for
>> several
>>>>> of the CEPs being worked on
>>>>>>> 
>>>>>>>> And last comment, which I have also done in the other modularity
>>>>> thread… backwards compatibility and maintenance. It is not clear right
>> now
>>>>> what java interfaces may not break and how we can maintain and extend
>> such
>>>>> interfaces in the future.  If the goal is to allow 3rd parties to
>> plug in
>>>>> and offer new SSTable formats, are we as a project ok with having a
>> minor
>>>>> release do a binary or source non-compatible change?  If not how do we
>>>>> detect this?  Until this problem is solved, I do not think we should
>> add
>>>>> any such interfaces.
>>>>>>> 
>>>>>>> I would love some clarity on this.  Specifically, if we assume a
>> patch
>>>>> author/reviewers are not familiar with the impact of changes to these
>>>>> interfaces, what happens?  Do we have tools to block this? Do we
>> require
>>>>> 3rd party authors to create massive shims to deal with every patch
>> level
>>>>> version out there?  I would love more clarity on how we maintain these
>> new
>>>>> pluggable interfaces.
>>>>>>> 
>>>>>>>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov <bl...@apache.org>
>>>>> wrote:
>>>>>>>> 
>>>>>>>> Does anyone have any further comments or questions on the proposal,
>> or
>>>>> are
>>>>>>>> we ready to  move forward to a vote?
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Branimir
>>>>>>>> 
>>>>>>>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell
>>>>> <dc...@apple.com.invalid>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>>> I apologize I did not mention those things explicitly. All the
>> places
>>>>>>>>> where
>>>>>>>>>> sstable files are accessed directly would have to be refactored.
>>>>>>>>> 
>>>>>>>>> Works for me
>>>>>>>>> 
>>>>>>>>>> Speaking about the implementation, one idea I was thinking about
>> was
>>>>> that
>>>>>>>>>> the factories for formats are registered using Java's native
>> service
>>>>>>>>>> loader.
>>>>>>>>> 
>>>>>>>>> I am a fan of ServiceLoader as a means of plugging in.
>>>>>>>>> 
>>>>>>>>>> I hope this explains a bit
>>>>>>>>> 
>>>>>>>>> Yep; thanks!
>>>>>>>>> 
>>>>>>>>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
>>>>>>>>> lewandowski.jacek@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> David,
>>>>>>>>>> 
>>>>>>>>>> I apologize I did not mention those things explicitly. All the
>> places
>>>>>>>>> where
>>>>>>>>>> sstable files are accessed directly would have to be refactored.
>>>>>>>>>> 
>>>>>>>>>> Regarding TableMetrics - currently it includes many metrics, some
>> of
>>>>> them
>>>>>>>>>> are unrelated to sstables at all, but there are metrics which are
>>>>>>>>> specific
>>>>>>>>>> to the current sstable format, like metrics related to index
>>>>> summaries or
>>>>>>>>>> bloom filters. The created gauges query certain methods on sstable
>>>>>>>>> reader -
>>>>>>>>>> I think the only common metrics for sstables we can leave in
>>>>> TableMetrics
>>>>>>>>>> are those for which there are query methods in generic sstable
>>>>> interface.
>>>>>>>>>> Other metrics, specific to a certain sstable format, should be
>>>>>>>>> registered
>>>>>>>>>> by the implementation itself.
>>>>>>>>>> 
>>>>>>>>>> Speaking about the implementation, one idea I was thinking about
>> was
>>>>> that
>>>>>>>>>> the factories for formats are registered using Java's native
>> service
>>>>>>>>>> loader. This way we could get the list of all the factories on the
>>>>>>>>>> classpath and call some method, like `registerMetrics` during
>> system
>>>>>>>>>> initialization. That could also be implemented in a static
>> initializer
>>>>> in
>>>>>>>>> the
>>>>>>>>>> factory but it would make it less obvious for the implementors
>> where
>>>>> such
>>>>>>>>>> initialization should be done.
>>>>>>>>>> 
>>>>>>>>>> I hope this explains a bit
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Jacek

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by Jacek Lewandowski <le...@gmail.com>.
For the other ticket (schema update handler interface) I was also proposing
a kind of @DeveloperApi annotation, as seen in other projects, but as in
this thread there were differing opinions and no conclusion. After
reading the comments I must agree that it is perhaps way too early to mark
this interface as stable. Perhaps it was too far-fetched to say it would be
for people who wish to replace the SSTable format. My focus is
primarily on cleaning up the code (modularization and clean contracts) and
making it possible to introduce a new format in the future while allowing
us to maintain the old format (no "if then else" approach).
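
As a rough illustration of the annotation idea, the sketch below marks a type
as an unstable developer API and checks for the marker reflectively. The
annotation is a hypothetical stand-in defined in the snippet itself, not an
existing Cassandra annotation; ExampleSSTableFormat and ApiStability are
likewise made up for the example.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker: "usable by developers, but may change in any release".
@Documented
@Retention(RetentionPolicy.RUNTIME) // kept at runtime so tools and tests can see it
@Target(ElementType.TYPE)
@interface DeveloperApi {
    // Free-form note on when the interface was introduced (illustrative only).
    String since() default "";
}

// Hypothetical interface carrying the marker.
@DeveloperApi(since = "unreleased")
interface ExampleSSTableFormat {
    String name();
}

// Hypothetical helper a test or build check could use to list unstable types.
final class ApiStability {
    private ApiStability() {}

    static boolean isUnstableApi(Class<?> type) {
        return type.isAnnotationPresent(DeveloperApi.class);
    }
}
```

A marker like this only documents intent. Whether it implies any guarantee,
or none at all, is still a policy decision for the project to make.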

- - -- --- ----- -------- -------------
Jacek Lewandowski


On Wed, Nov 10, 2021 at 12:53 AM benedict@apache.org <be...@apache.org>
wrote:

> > I may be wrong here, but the CEP directly calls out making this api
> public for people who wish to replace the SSTable format
>
> I don’t think this implies API stability. For starters, it doesn’t
> stipulate that these implementations will be supported out of tree (the
> only one I’m aware of, so far as I understand, is intended to be incubated
> in tree), nor does an API for external usage have to be stable. It’s fine
> to create an API and tell users it’s unstable, and that they should closely
> monitor patch version changes if they use it.
>
> That said, norms may be changing around what can go into patch releases
> anyhow, so this may be a lot of noise about nothing. If all new development
> goes into trunk, then it’s all moot. But I don’t think we can make hard
> assumptions about that today, as historically these sorts of intentions
> haven’t lasted.
>
> I’m fairly against the idea of introducing hard restrictions on this, and
> potentially ossifying the codebase. I’m not keen to even consider out of
> tree consumers of these APIs in any way, for compatibility, upgradeability
> or anything. There’s a lot that needs to be done over the coming years to
> improve the internal structure of the project, and unduly entrenching the
> current state of affairs would be a huge potential harm of these efforts to
> modularise the codebase.
>
> From: David Capwell <dc...@apple.com.INVALID>
> Date: Tuesday, 9 November 2021 at 23:38
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
> > My understanding is that the only interface that is expected to be
> stable for external consumers is the secondary index API
>
> I may be wrong here, but the CEP directly calls out making this api public
> for people who wish to replace the SSTable format ("Cassandra developers
> who want to develop and publish different file format implementations."),
> so if we need to support 2i API, why would we not support SSTable API as
> well?
>
> > All of the other mentioned APIs are in my opinion for internal usage only
>
> This gets back to my point; it is currently tribal knowledge what needs to
> work and what doesn’t, and without the broader set of committers knowing
> this, the likelihood that any new API will break in a minor is high.
>
> > On Nov 9, 2021, at 12:13 PM, benedict@apache.org wrote:
> >
> > I agree that we don’t need to block the CEP on this, and that we should
> have that discussion. But it’s worth noting that the CEP should not
> anticipate or depend on any specific outcome of that discussion.
> >
> > Since it is somewhat relevant for this discussion, my view is that no
> interface should be assumed to be stable without the prior explicit
> agreement of the community.
> >
> > My understanding is that the only interface that is expected to be
> stable for external consumers is the secondary index API. Perhaps also
> snitches? But also perhaps not, as the difficulty of upgrading these at the
> same time is pretty low for custom snitches. All of the other mentioned
> APIs are in my opinion for internal usage only, so users should not assume
> compile time compatibility across any release, and I am certain we have
> never tried to maintain this. This still facilitates forks of course, by
> localising the compatibility work.
> >
> >
> > From: Jeremiah D Jordan <je...@gmail.com>
> > Date: Tuesday, 9 November 2021 at 19:43
> > To: Cassandra DEV <de...@cassandra.apache.org>
> > Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
> > I would love to have this discussion and set up annotations or similar to
> formalize things.  I just do not think we need to hold up any CEPs to do
> so.  That discussion should possibly be a CEP of its own proposing how we
> want to formalize interfaces?  I would be happy to go through and try to
> put together something for that or, since you feel so strongly about it,
> maybe you want to, David?  At the very least it should get its own DISCUSS
> thread and then be written up in the wiki.
> >
> > -Jeremiah
> >
> >> On Nov 9, 2021, at 1:06 PM, Joshua McKenzie <jm...@apache.org>
> wrote:
> >>
> >>>
> >>> trunk -> anything goes, not trunk -> try not to change these interfaces
> >>
> >> Have we ever clarified what "these interfaces" are? Was just talking to
> >> David and I realized I didn't even JavaDoc CommitLogReadHandler as
> _being
> >> designed_ for external usage. /sigh
> >>
> >> I think it'd be valuable for us to go through the codebase and annotate
> >> interfaces as intended to be exposed to 3rd parties; this has bothered
> me
> >> for years. Especially as we come up on a large number of new cleanups,
> >> refactorings, and potentially genericizing some subsystems into API's
> >> (CEP-18 descendants).
> >>
> >>
> >> On Tue, Nov 9, 2021 at 2:01 PM David Capwell <dcapwell@apple.com.invalid
> >
> >> wrote:
> >>
> >>>> We already have many interfaces similar to these for Compaction
> >>> Strategy, Indexing, Query Handler.
> >>>
> >>> Today-I-Learned QueryHandler is not allowed to be touched in a minor…
> good
> >>> to know…
> >>>
> >>>> not trunk -> try not to change these interfaces
> >>>
> >>> Outside of MBeans, I honestly do not know what interfaces fall into
> this
> >>> group; and for MBeans we have tests which block breaking changes.  The
> >>> point I am making is that not everyone is aware of the rules, so having
> >>> something in place to help enforce such rules should be thought about;
> if
> >>> we want to add pluggable hooks with the intent that external parties
> can
> >>> leverage such hooks, we should also add to the scope the maintenance of
> >>> these interfaces (we should not assume “tribal knowledge” will work).
> >>>
> >>> I am not trying to ask for something large or something requiring a
> ton of
> >>> work, I am just asking that this gets thought about during the project
> so
> >>> it doesn’t get neglected.  This could be as simple as an annotation
> like
> >>> @ExposedTo3rdParties (Hadoop does this to show an interface is exposed
> and
> >>> must be maintained), or it could be something like split directories
> >>> (src/java = private, src/java-exposed = public); I am trying not to
> dictate
> >>> an implementation, only trying to make sure we are set up to support
> the CEP
> >>> after the work is done.
> >>>
> >>>
> >>>> On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan <
> jeremiah.jordan@gmail.com>
> >>> wrote:
> >>>>
> >>>> We already have many interfaces similar to these for Compaction
> >>> Strategy, Indexing, Query Handler.  I would hope that committers are
> already
> >>> following a policy along the lines of trunk -> anything goes, not
> trunk ->
> >>> try not to change these interfaces.  I would expect that to be the same
> >>> policy for any new internal interfaces that are added.  But given we
> >>> already have many such interfaces, I see no reason to block adding
> more of
> >>> them while change policies are discussed.
> >>>>
> >>>> -Jeremiah
> >>>>
> >>>>> On Nov 9, 2021, at 10:44 AM, David Capwell
> <dc...@apple.com.INVALID>
> >>> wrote:
> >>>>>
> >>>>> I still have one outstanding comment, but this is a comment for
> several
> >>> of the CEPs being worked on
> >>>>>
> >>>>>> And last comment, which I have also done in the other modularity
> >>> thread… backwards compatibility and maintenance. It is not clear right
> now
> >>> what java interfaces may not break and how we can maintain and extend
> such
> >>> interfaces in the future.  If the goal is to allow 3rd parties to
> plug in
> >>> and offer new SSTable formats, are we as a project ok with having a
> minor
> >>> release do a binary or source non-compatible change?  If not how do we
> >>> detect this?  Until this problem is solved, I do not think we should
> add
> >>> any such interfaces.
> >>>>>
> >>>>> I would love some clarity on this.  Specifically, if we assume a
> patch
> >>> author/reviewers are not familiar with the impact of changes to these
> >>> interfaces, what happens?  Do we have tools to block this? Do we
> require
> >>> 3rd party authors to create massive shims to deal with every patch
> level
> >>> version out there?  I would love more clarity on how we maintain these
> new
> >>> pluggable interfaces.
> >>>>>
> >>>>>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov <bl...@apache.org>
> >>> wrote:
> >>>>>>
> >>>>>> Does anyone have any further comments or questions on the proposal,
> or
> >>> are
> >>>>>> we ready to  move forward to a vote?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Branimir
> >>>>>>
> >>>>>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell
> >>> <dc...@apple.com.invalid>
> >>>>>> wrote:
> >>>>>>
> >>>>>>>> I apologize I did not mention those things explicitly. All the
> places
> >>>>>>> where
> >>>>>>>> sstable files are accessed directly would have to be refactored.
> >>>>>>>
> >>>>>>> Works for me
> >>>>>>>
> >>>>>>>> Speaking about the implementation, one idea I was thinking about
> was
> >>> that
> >>>>>>>> the factories for formats are registered using Java's native
> service
> >>>>>>>> loader.
> >>>>>>>
> >>>>>>> I am a fan of ServiceLoader as a means of plugging in.
> >>>>>>>
> >>>>>>>> I hope this explains a bit
> >>>>>>>
> >>>>>>> Yep; thanks!
> >>>>>>>
> >>>>>>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
> >>>>>>> lewandowski.jacek@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> David,
> >>>>>>>>
> >>>>>>>> I apologize I did not mention those things explicitly. All the
> places
> >>>>>>> where
> >>>>>>>> sstable files are accessed directly would have to be refactored.
> >>>>>>>>
> >>>>>>>> Regarding TableMetrics - currently it includes many metrics, some
> of
> >>> them
> >>>>>>>> are unrelated to sstables at all, but there are metrics which are
> >>>>>>> specific
> >>>>>>>> to the current sstable format, like metrics related to index
> >>> summaries or
> >>>>>>>> bloom filters. The created gauges query certain methods on sstable
> >>>>>>> reader -
> >>>>>>>> I think the only common metrics for sstables we can leave in
> >>> TableMetrics
> >>>>>>>> are those for which there are query methods in generic sstable
> >>> interface.
> >>>>>>>> Other metrics, specific to a certain sstable format, should be
> >>>>>>> registered
> >>>>>>>> by the implementation itself.
> >>>>>>>>
> >>>>>>>> Speaking about the implementation, one idea I was thinking about
> was
> >>> that
> >>>>>>>> the factories for formats are registered using Java's native
> service
> >>>>>>>> loader. This way we could get the list of all the factories on the
> >>>>>>>> classpath and call some method, like `registerMetrics` during
> system
> >>>>>>>> initialization. That could also be implemented in a static
> initializer
> >>> in
> >>>>>>> the
> >>>>>>>> factory but it would make it less obvious for the implementors
> where
> >>> such
> >>>>>>>> initialization should be done.
> >>>>>>>>
> >>>>>>>> I hope this explains a bit
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Jacek

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by "benedict@apache.org" <be...@apache.org>.
> I may be wrong here, but the CEP directly calls out making this api public for people who wish to replace the SSTable format

I don’t think this implies API stability. For starters, it doesn’t stipulate that these implementations will be supported out of tree (the only one I’m aware of, so far as I understand, is intended to be incubated in tree), nor does an API for external usage have to be stable. It’s fine to create an API and tell users it’s unstable, and that they should closely monitor patch version changes if they use it.

That said, norms may be changing around what can go into patch releases anyhow, so this may be a lot of noise about nothing. If all new development goes into trunk, then it’s all moot. But I don’t think we can make hard assumptions about that today, as historically these sorts of intentions haven’t lasted.

I’m fairly against the idea of introducing hard restrictions on this, and potentially ossifying the codebase. I’m not keen to even consider out of tree consumers of these APIs in any way, for compatibility, upgradeability or anything. There’s a lot that needs to be done over the coming years to improve the internal structure of the project, and unduly entrenching the current state of affairs could seriously harm these efforts to modularise the codebase.

From: David Capwell <dc...@apple.com.INVALID>
Date: Tuesday, 9 November 2021 at 23:38
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
> My understanding is that the only interface that is expected to be stable for external consumers is the secondary index API

I may be wrong here, but the CEP directly calls out making this api public for people who wish to replace the SSTable format ("Cassandra developers who want to develop and publish different file format implementations."), so if we need to support 2i API, why would we not support SSTable API as well?

> All of the other mentioned APIs are in my opinion for internal usage only

This gets back to my point; it is currently tribal knowledge what needs to work and what doesn’t, and without the broader set of committers knowing this, the likelihood that any new API will break in a minor is high.

> On Nov 9, 2021, at 12:13 PM, benedict@apache.org wrote:
>
> I agree that we don’t need to block the CEP on this, and that we should have that discussion. But it’s worth noting that the CEP should not anticipate or depend on any specific outcome of that discussion.
>
> Since it is somewhat relevant for this discussion, my view is that no interface should be assumed to be stable without the prior explicit agreement of the community.
>
> My understanding is that the only interface that is expected to be stable for external consumers is the secondary index API. Perhaps also snitches? But also perhaps not, as the difficulty of upgrading these at the same time is pretty low for custom snitches. All of the other mentioned APIs are in my opinion for internal usage only, so users should not assume compile time compatibility across any release, and I am certain we have never tried to maintain this. This still facilitates forks of course, by localising the compatibility work.
>
>
> From: Jeremiah D Jordan <je...@gmail.com>
> Date: Tuesday, 9 November 2021 at 19:43
> To: Cassandra DEV <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
> I would love to have this discussion and set up annotations or similar to formalize things.  I just do not think we need to hold up any CEPs to do so.  That discussion should possibly be a CEP of its own proposing how we want to formalize interfaces?  I would be happy to go through and try to put together something for that or, since you feel so strongly about it, maybe you want to, David?  At the very least it should get its own DISCUSS thread and then be written up in the wiki.
>
> -Jeremiah
>
>> On Nov 9, 2021, at 1:06 PM, Joshua McKenzie <jm...@apache.org> wrote:
>>
>>>
>>> trunk -> anything goes, not trunk -> try not to change these interfaces
>>
>> Have we ever clarified what "these interfaces" are? Was just talking to
>> David and I realized I didn't even JavaDoc CommitLogReadHandler as _being
>> designed_ for external usage. /sigh
>>
>> I think it'd be valuable for us to go through the codebase and annotate
>> interfaces as intended to be exposed to 3rd parties; this has bothered me
>> for years. Especially as we come up on a large number of new cleanups,
>> refactorings, and potentially genericizing some subsystems into API's
>> (CEP-18 descendants).
>>
>>
>> On Tue, Nov 9, 2021 at 2:01 PM David Capwell <dc...@apple.com.invalid>
>> wrote:
>>
>>>> We already have many interfaces similar to these for Compaction
>>> Strategy, Indexing, Query Handler.
>>>
>>> Today-I-Learned QueryHandler is not allowed to be touched in a minor… good
>>> to know…
>>>
>>>> not trunk -> try not to change these interfaces
>>>
>>> Outside of MBeans, I honestly do not know what interfaces fall into this
>>> group; and for MBeans we have tests which block breaking changes.  The
>>> point I am making is that not everyone is aware of the rules, so having
>>> something in place to help enforce such rules should be thought about; if
>>> we want to add pluggable hooks with the intent that external parties can
>>> leverage such hooks, we should also add to the scope the maintenance of
>>> these interfaces (we should not assume “tribal knowledge” will work).
>>>
>>> I am not trying to ask for something large or something requiring a ton of
>>> work, I am just asking that this gets thought about during the project so
>>> it doesn’t get neglected.  This could be as simple as an annotation like
>>> @ExposedTo3rdParties (Hadoop does this to show an interface is exposed and
>>> must be maintained), or it could be something like split directories
>>> (src/java = private, src/java-exposed = public); I am trying not to dictate
>>> an implementation, only trying to make sure we are set up to support the CEP
>>> after the work is done.
>>>
>>>
>>>> On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan <je...@gmail.com>
>>> wrote:
>>>>
>>>> We already have many interfaces similar to these for Compaction
>>> Strategy, Indexing, Query Handler.  I would hope that committers are already
>>> following a policy along the lines of trunk -> anything goes, not trunk ->
>>> try not to change these interfaces.  I would expect that to be the same
>>> policy for any new internal interfaces that are added.  But given we
>>> already have many such interfaces, I see no reason to block adding more of
>>> them while change policies are discussed.
>>>>
>>>> -Jeremiah
>>>>
>>>>> On Nov 9, 2021, at 10:44 AM, David Capwell <dc...@apple.com.INVALID>
>>> wrote:
>>>>>
>>>>> I still have one outstanding comment, but this is a comment for several
>>> of the CEPs being worked on
>>>>>
>>>>>> And last comment, which I have also done in the other modularity
>>> thread… backwards compatibility and maintenance. It is not clear right now
>>> what java interfaces may not break and how we can maintain and extend such
>>> interfaces in the future.  If the goal is to allow 3rd parties to plug in
>>> and offer new SSTable formats, are we as a project ok with having a minor
>>> release do a binary or source non-compatible change?  If not how do we
>>> detect this?  Until this problem is solved, I do not think we should add
>>> any such interfaces.
>>>>>
>>>>> I would love some clarity on this.  Specifically, if we assume a patch
>>> author/reviewers are not familiar with the impact of changes to these
>>> interfaces, what happens?  Do we have tools to block this? Do we require
>>> 3rd party authors to create massive shims to deal with every patch level
>>> version out there?  I would love more clarity on how we maintain these new
>>> pluggable interfaces.
>>>>>
>>>>>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov <bl...@apache.org>
>>> wrote:
>>>>>>
>>>>>> Does anyone have any further comments or questions on the proposal, or
>>> are
>>>>>> we ready to  move forward to a vote?
>>>>>>
>>>>>> Regards,
>>>>>> Branimir
>>>>>>
>>>>>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell
>>> <dc...@apple.com.invalid>
>>>>>> wrote:
>>>>>>
>>>>>>>> I apologize I did not mention those things explicitly. All the places
>>>>>>> where
>>>>>>>> sstable files are accessed directly would have to be refactored.
>>>>>>>
>>>>>>> Works for me
>>>>>>>
>>>>>>>> Speaking about the implementation, one idea I was thinking about was
>>> that
>>>>>>>> the factories for formats are registered using Java's native service
>>>>>>>> loader.
>>>>>>>
>>>>>>> I am a fan of ServiceLoader as a means of plugging in.
>>>>>>>
>>>>>>>> I hope this explains a bit
>>>>>>>
>>>>>>> Yep; thanks!
>>>>>>>
>>>>>>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
>>>>>>> lewandowski.jacek@gmail.com> wrote:
>>>>>>>>
>>>>>>>> David,
>>>>>>>>
>>>>>>>> I apologize I did not mention those things explicitly. All the places
>>>>>>> where
>>>>>>>> sstable files are accessed directly would have to be refactored.
>>>>>>>>
>>>>>>>> Regarding TableMetrics - currently it includes many metrics, some of
>>> them
>>>>>>>> are unrelated to sstables at all, but there are metrics which are
>>>>>>> specific
>>>>>>>> to the current sstable format, like metrics related to index
>>> summaries or
>>>>>>>> bloom filters. The created gauges query certain methods on sstable
>>>>>>> reader -
>>>>>>>> I think the only common metrics for sstables we can leave in
>>> TableMetrics
>>>>>>>> are those for which there are query methods in generic sstable
>>> interface.
>>>>>>>> Other metrics, specific to a certain sstable format, should be
>>>>>>> registered
>>>>>>>> by the implementation itself.
>>>>>>>>
>>>>>>>> Speaking about the implementation, one idea I was thinking about was
>>> that
>>>>>>>> the factories for formats are registered using Java's native service
>>>>>>>> loader. This way we could get the list of all the factories on the
>>>>>>>> classpath and call some method, like `registerMetrics` during system
>>>>>>>> initialization. That could also be implemented in a static initializer
>>> in
>>>>>>> the
>>>>>>>> factory but it would make it less obvious for the implementors where
>>> such
>>>>>>>> initialization should be done.
>>>>>>>>
>>>>>>>> I hope this explains a bit
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jacek

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by David Capwell <dc...@apple.com.INVALID>.
> My understanding is that the only interface that is expected to be stable for external consumers is the secondary index API

I may be wrong here, but the CEP directly calls out making this api public for people who wish to replace the SSTable format ("Cassandra developers who want to develop and publish different file format implementations."), so if we need to support 2i API, why would we not support SSTable API as well?

> All of the other mentioned APIs are in my opinion for internal usage only

This gets back to my point; it is currently tribal knowledge what needs to work and what doesn’t, and without the broader set of committers knowing this, the likelihood that any new API will break in a minor is high.
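
One way to turn that tribal knowledge into something checkable is the annotation approach suggested in the quoted thread below. The following is only a sketch under assumptions: the annotation name @ExposedTo3rdParties comes from that suggestion, while the Stability enum, the two example interfaces and the ApiStability helper are hypothetical and exist only to show the mechanism.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Small driver printing how each type is classified.
final class ApiStabilityDemo {
    public static void main(String[] args) {
        System.out.println(ApiStability.describe(ExampleIndexApi.class));
        System.out.println(ApiStability.describe(ExampleFormatApi.class));
        System.out.println(ApiStability.describe(String.class));
    }
}

// Hypothetical stability levels; an external API does not have to be stable.
enum Stability { STABLE, UNSTABLE }

// Marks a type as intended for external use and states its stability promise.
// RUNTIME retention lets tests or tooling inspect the marker via reflection.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ExposedTo3rdParties {
    Stability value();
}

// Illustrative interfaces only; they do not correspond to real Cassandra types.
@ExposedTo3rdParties(Stability.STABLE)
interface ExampleIndexApi {}

@ExposedTo3rdParties(Stability.UNSTABLE)
interface ExampleFormatApi {}

final class ApiStability {
    // Reports whether a type is internal or exposed, and with which promise.
    static String describe(Class<?> type) {
        ExposedTo3rdParties marker = type.getAnnotation(ExposedTo3rdParties.class);
        if (marker == null)
            return type.getSimpleName() + ": internal";
        return type.getSimpleName() + ": exposed, " + marker.value();
    }
}
```

A build-time test could walk the annotated types and compare their signatures against the previous release, failing only for those marked STABLE; whether any interface should carry that promise is the policy question under discussion here.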

> On Nov 9, 2021, at 12:13 PM, benedict@apache.org wrote:
> 
> I agree that we don’t need to block the CEP on this, and that we should have that discussion. But it’s worth noting that the CEP should not anticipate or depend on any specific outcome of that discussion.
> 
> Since it is somewhat relevant for this discussion, my view is that no interface should be assumed to be stable without the prior explicit agreement of the community.
> 
> My understanding is that the only interface that is expected to be stable for external consumers is the secondary index API. Perhaps also snitches? But also perhaps not, as the difficulty of upgrading these at the same time is pretty low for custom snitches. All of the other mentioned APIs are in my opinion for internal usage only, so users should not assume compile time compatibility across any release, and I am certain we have never tried to maintain this. This still facilitates forks of course, by localising the compatibility work.
> 
> 
> From: Jeremiah D Jordan <je...@gmail.com>
> Date: Tuesday, 9 November 2021 at 19:43
> To: Cassandra DEV <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
> I would love to have this discussion and set up annotations or similar to formalize things.  I just do not think we need to hold up any CEPs to do so.  That discussion should possibly be a CEP of its own proposing how we want to formalize interfaces?  I would be happy to go through and try to put together something for that or, since you feel so strongly about it, maybe you want to, David?  At the very least it should get its own DISCUSS thread and then be written up in the wiki.
> 
> -Jeremiah
> 
>> On Nov 9, 2021, at 1:06 PM, Joshua McKenzie <jm...@apache.org> wrote:
>> 
>>> 
>>> trunk -> anything goes, not trunk -> try not to change these interfaces
>> 
>> Have we ever clarified what "these interfaces" are? Was just talking to
>> David and I realized I didn't even JavaDoc CommitLogReadHandler as _being
>> designed_ for external usage. /sigh
>> 
>> I think it'd be valuable for us to go through the codebase and annotate
>> interfaces as intended to be exposed to 3rd parties; this has bothered me
>> for years. Especially as we come up on a large number of new cleanups,
>> refactorings, and potentially genericizing some subsystems into API's
>> (CEP-18 descendants).
>> 
>> 
>> On Tue, Nov 9, 2021 at 2:01 PM David Capwell <dc...@apple.com.invalid>
>> wrote:
>> 
>>>> We already have many interfaces similar to these for Compaction
>>> Strategy, Indexing, Query Handler.
>>> 
>>> Today-I-Learned QueryHandler is not allowed to be touched in a minor… good
>>> to know…
>>> 
>>>> not trunk -> try not to change these interfaces
>>> 
>>> Outside of MBeans, I honestly do not know what interfaces fall into this
>>> group; and for MBeans we have tests which block breaking changes.  The
>>> point I am making is that not everyone is aware of the rules, so having
>>> something in place to help enforce such rules should be thought about; if
>>> we want to add pluggable hooks with the intent that external parties can
>>> leverage such hooks, we should also add to the scope the maintenance of
>>> these interfaces (we should not assume “tribal knowledge” will work).
>>> 
>>> I am not trying to ask for something large or something requiring a ton of
>>> work, I am just asking that this gets thought about during the project so
>>> it doesn’t get neglected.  This could be as simple as an annotation like
>>> @ExposedTo3rdParties (Hadoop does this to show an interface is exposed and
>>> must be maintained), or it could be something like split directories
>>> (src/java = private, src/java-exposed = public); I am trying not to dictate
>>> an implementation, only trying to make sure we are set up to support the CEP
>>> after the work is done.
>>> 
>>> 
>>>> On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan <je...@gmail.com>
>>> wrote:
>>>> 
>>>> We already have many interfaces similar to these for Compaction
>>> Strategy, Indexing, Query Handler.  I would hope that committers are already
>>> following a policy along the lines of trunk -> anything goes, not trunk ->
>>> try not to change these interfaces.  I would expect that to be the same
>>> policy for any new internal interfaces that are added.  But given we
>>> already have many such interfaces, I see no reason to block adding more of
>>> them while change policies are discussed.
>>>> 
>>>> -Jeremiah
>>>> 
>>>>> On Nov 9, 2021, at 10:44 AM, David Capwell <dc...@apple.com.INVALID>
>>> wrote:
>>>>> 
>>>>> I still have one outstanding comment, but this is a comment for several
>>> of the CEPs being worked on
>>>>> 
>>>>>> And last comment, which I have also done in the other modularity
>>> thread… backwards compatibility and maintenance. It is not clear right now
>>> what java interfaces may not break and how we can maintain and extend such
>>> interfaces in the future.  If the goal is to allow 3rd parties to plug in
>>> and offer new SSTable formats, are we as a project ok with having a minor
>>> release do a binary or source non-compatible change?  If not how do we
>>> detect this?  Until this problem is solved, I do not think we should add
>>> any such interfaces.
>>>>> 
>>>>> I would love some clarity on this.  Specifically, if we assume a patch
>>> author/reviewers are not familiar with the impact of changes to these
>>> interfaces, what happens?  Do we have tools to block this? Do we require
>>> 3rd party authors to create massive shims to deal with every patch level
>>> version out there?  I would love more clarity on how we maintain these new
>>> pluggable interfaces.
>>>>> 
>>>>>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov <bl...@apache.org>
>>> wrote:
>>>>>> 
>>>>>> Does anyone have any further comments or questions on the proposal, or
>>> are
>>>>>> we ready to  move forward to a vote?
>>>>>> 
>>>>>> Regards,
>>>>>> Branimir
>>>>>> 
>>>>>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell
>>> <dc...@apple.com.invalid>
>>>>>> wrote:
>>>>>> 
>>>>>>>> I apologize I did not mention those things explicitly. All the places
>>>>>>> where
>>>>>>>> sstable files are accessed directly would have to be refactored.
>>>>>>> 
>>>>>>> Works for me
>>>>>>> 
>>>>>>>> Speaking about the implementation, one idea I was thinking about was
>>> that
>>>>>>>> the factories for formats are registered using Java's native service
>>>>>>>> loader.
>>>>>>> 
>>>>>>> I am a fan of ServiceLoader as a means of plugging in.
>>>>>>> 
>>>>>>>> I hope this explains a bit
>>>>>>> 
>>>>>>> Yep; thanks!
>>>>>>> 
>>>>>>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
>>>>>>> lewandowski.jacek@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> David,
>>>>>>>> 
>>>>>>>> I apologize I did not mention those things explicitly. All the places
>>>>>>> where
>>>>>>>> sstable files are accessed directly would have to be refactored.
>>>>>>>> 
>>>>>>>> Regarding TableMetrics - currently it includes many metrics, some of
>>> them
>>>>>>>> are unrelated to sstables at all, but there are metrics which are
>>>>>>> specific
>>>>>>>> to the current sstable format, like metrics related to index
>>> summaries or
>>>>>>>> bloom filters. The created gauges query certain methods on sstable
>>>>>>> reader -
>>>>>>>> I think the only common metrics for sstables we can leave in
>>> TableMetrics
>>>>>>>> are those for which there are query methods in generic sstable
>>> interface.
>>>>>>>> Other metrics, specific to a certain sstable format, should be
>>>>>>> registered
>>>>>>>> by the implementation itself.
>>>>>>>> 
>>>>>>>> Speaking about the implementation, one idea I was thinking about was
>>> that
>>>>>>>> the factories for formats are registered using Java's native service
>>>>>>>> loader. This way we could get the list of all the factories on the
>>>>>>>> classpath and call some method, like `registerMetrics` during system
>>>>>>>> initialization. That could also be implemented in a static initializer
>>> in
>>>>>>> the
>>>>>>>> factory but it would make it less obvious for the implementors where
>>> such
>>>>>>>> initialization should be done.
>>>>>>>> 
>>>>>>>> I hope this explains a bit
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Jacek


Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by "benedict@apache.org" <be...@apache.org>.
I agree that we don’t need to block the CEP on this, and that we should have that discussion. But it’s worth noting that the CEP should not anticipate or depend on any specific outcome of that discussion.

Since it is somewhat relevant for this discussion, my view is that no interface should be assumed to be stable without the prior explicit agreement of the community.

My understanding is that the only interface that is expected to be stable for external consumers is the secondary index API. Perhaps also snitches? But also perhaps not, as the difficulty of upgrading these at the same time is pretty low for custom snitches. All of the other mentioned APIs are in my opinion for internal usage only, so users should not assume compile-time compatibility across any release, and I am certain we have never tried to maintain this. This still facilitates forks of course, by localising the compatibility work.


From: Jeremiah D Jordan <je...@gmail.com>
Date: Tuesday, 9 November 2021 at 19:43
To: Cassandra DEV <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
I would love to have this discussion and set up annotations or similar to formalize things.  I just do not think we need to hold up any CEPs to do so.  That discussion should possibly be a CEP of its own proposing how we want to formalize interfaces?  I would be happy to go through and try to put together something for that, or, since you feel so strongly about it, maybe you want to, David?  At the very least it should get its own DISCUSS thread and then be written up in the wiki.

-Jeremiah

> On Nov 9, 2021, at 1:06 PM, Joshua McKenzie <jm...@apache.org> wrote:
>
>>
>> trunk -> anything goes, not trunk -> try not to change these interfaces
>
> Have we ever clarified what "these interfaces" are? Was just talking to
> David and I realized I didn't even JavaDoc CommitLogReadHandler as _being
> designed_ for external usage. /sigh
>
> I think it'd be valuable for us to go through the codebase and annotate
> interfaces as intended to be exposed to 3rd parties; this has bothered me
> for years. Especially as we come up on a large number of new cleanups,
> refactorings, and potentially genericizing some subsystems into APIs
> (CEP-18 descendants).
>
>
> On Tue, Nov 9, 2021 at 2:01 PM David Capwell <dc...@apple.com.invalid>
> wrote:
>
>>> We already have many interfaces similar to these for Compaction
>> Strategy, Indexing, Query Handler.
>>
>> Today-I-Learned QueryHandler is not allowed to be touched in a minor… good
>> to know…
>>
>>> not trunk -> try not to change these interfaces
>>
>> Outside of MBeans, I honestly do not know what interfaces fall into this
>> group; and for MBeans we have tests which block breaking changes.  The
>> point I am making is that not everyone is aware of the rules, so having
>> something in place to help enforce such rules should be thought about; if
>> we want to add pluggable hooks with the intent that external parties can
>> leverage such hooks, we should also add to the scope the maintenance of
>> these interfaces (we should not assume “tribal knowledge” will work).
>>
>> I am not trying to ask for something large or something requiring a ton of
>> work, I am just asking that this gets thought about during the project so
>> it doesn’t get neglected.  This could be as simple as an annotation like
>> @ExposedTo3rdParties (Hadoop does this to show an interface is exposed and
>> must be maintained), or it could be something like split directories
>> (src/java = private, src/java-exposed = public); I am trying not to dictate
>> an implementation, only trying to make sure we are set up to support the CEP
>> after the work is done.
>>
>>
>>> On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan <je...@gmail.com>
>> wrote:
>>>
>>> We already have many interfaces similar to these for Compaction
>> Strategy, Indexing, Query Handler.  I would hope that committers are already
>> following a policy along the lines of trunk -> anything goes, not trunk ->
>> try not to change these interfaces.  I would expect that to be the same
>> policy for any new internal interfaces that are added.  But given we
>> already have many such interfaces, I see no reason to block adding more of
>> them while change policies are discussed.
>>>
>>> -Jeremiah
>>>
>>>> On Nov 9, 2021, at 10:44 AM, David Capwell <dc...@apple.com.INVALID>
>> wrote:
>>>>
>>>> I still have one outstanding comment, but this is a comment for several
>> of the CEPs being worked on
>>>>
>>>>> And last comment, which I have also done in the other modularity
>> thread… backwards compatibility and maintenance. It is not clear right now
>> what java interfaces may not break and how we can maintain and extend such
>> interfaces in the future.  If the goal is to allow 3rd parties to plug in
>> and offer new SSTable formats, are we as a project ok with having a minor
>> release do a binary or source non-compatible change?  If not how do we
>> detect this?  Until this problem is solved, I do not think we should add
>> any such interfaces.
>>>>
>>>> I would love some clarity on this.  Specifically, if we assume a patch
>> author/reviewers are not familiar with the impact of changes to these
>> interfaces, what happens?  Do we have tools to block this? Do we require
>> 3rd party authors to create massive shims to deal with every patch level
>> version out there?  I would love more clarity on how we maintain these new
>> pluggable interfaces.
>>>>
>>>>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov <bl...@apache.org>
>> wrote:
>>>>>
>>>>> Does anyone have any further comments or questions on the proposal, or
>> are
>>>>> we ready to  move forward to a vote?
>>>>>
>>>>> Regards,
>>>>> Branimir
>>>>>
>>>>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell
>> <dc...@apple.com.invalid>
>>>>> wrote:
>>>>>
>>>>>>> I apologize I did not mention those things explicitly. All the places
>>>>>> where
>>>>>>> sstable files are accessed directly would have to be refactored.
>>>>>>
>>>>>> Works for me
>>>>>>
>>>>>>> Speaking about the implementation, one idea I was thinking about was
>> that
>>>>>>> the factories for formats are registered using Java's native service
>>>>>>> loader.
>>>>>>
>>>>>> I am a fan of ServiceLoader as a means of plugging in.
>>>>>>
>>>>>>> I hope this explains a bit
>>>>>>
>>>>>> Yep; thanks!
>>>>>>
>>>>>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
>>>>>> lewandowski.jacek@gmail.com> wrote:
>>>>>>>
>>>>>>> David,
>>>>>>>
>>>>>>> I apologize I did not mention those things explicitly. All the places
>>>>>> where
>>>>>>> sstable files are accessed directly would have to be refactored.
>>>>>>>
>>>>>>> Regarding TableMetrics - currently it includes many metrics, some of
>> them
>>>>>>> are unrelated to sstables at all, but there are metrics which are
>>>>>> specific
>>>>>>> to the current sstable format, like metrics related to index
>> summaries or
>>>>>>> bloom filters. The created gauges query certain methods on sstable
>>>>>> reader -
>>>>>>> I think the only common metrics for sstables we can leave in
>> TableMetrics
>>>>>>> are those for which there are query methods in generic sstable
>> interface.
>>>>>>> Other metrics, specific to the certain sstable format should be
>>>>>> registered
>>>>>>> by the implementation itself.
>>>>>>>
>>>>>>> Speaking about the implementation, one idea I was thinking about was
>> that
>>>>>>> the factories for formats are registered using Java's native service
>>>>>>> loader. This way we could get the list of all the factories on the
>>>>>>> classpath and call some method, like `registerMetrics` during system
>>>>>>> initialization. That could be also implemented in static initializer
>> in
>>>>>> the
>>>>>>> factory but it would make it less obvious for the implementors where
>> such
>>>>>>> initialization should be done.
>>>>>>>
>>>>>>> I hope this explains a bit
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jacek

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by David Capwell <dc...@apple.com.INVALID>.
> I would be happy to go through and try to put together something for that ...  At the very least it should get its own DISCUSS thread and then be written up in the wiki.

+1. Thanks.

> On Nov 9, 2021, at 11:43 AM, Jeremiah D Jordan <je...@gmail.com> wrote:
> 
> I would love to have this discussion and set up annotations or similar to formalize things.  I just do not think we need to hold up any CEPs to do so.  That discussion should possibly be a CEP of its own proposing how we want to formalize interfaces?  I would be happy to go through and try to put together something for that, or, since you feel so strongly about it, maybe you want to, David?  At the very least it should get its own DISCUSS thread and then be written up in the wiki.
> 
> -Jeremiah
> 
>> On Nov 9, 2021, at 1:06 PM, Joshua McKenzie <jm...@apache.org> wrote:
>> 
>>> 
>>> trunk -> anything goes, not trunk -> try not to change these interfaces
>> 
>> Have we ever clarified what "these interfaces" are? Was just talking to
>> David and I realized I didn't even JavaDoc CommitLogReadHandler as _being
>> designed_ for external usage. /sigh
>> 
>> I think it'd be valuable for us to go through the codebase and annotate
>> interfaces as intended to be exposed to 3rd parties; this has bothered me
>> for years. Especially as we come up on a large number of new cleanups,
>> refactorings, and potentially genericizing some subsystems into APIs
>> (CEP-18 descendants).
>> 
>> 
>> On Tue, Nov 9, 2021 at 2:01 PM David Capwell <dc...@apple.com.invalid>
>> wrote:
>> 
>>>> We already have many interfaces similar to these for Compaction
>>> Strategy, Indexing, Query Handler.
>>> 
>>> Today-I-Learned QueryHandler is not allowed to be touched in a minor… good
>>> to know…
>>> 
>>>> not trunk -> try not to change these interfaces
>>> 
>>> Outside of MBeans, I honestly do not know what interfaces fall into this
>>> group; and for MBeans we have tests which block breaking changes.  The
>>> point I am making is that not everyone is aware of the rules, so having
>>> something in place to help enforce such rules should be thought about; if
>>> we want to add pluggable hooks with the intent that external parties can
>>> leverage such hooks, we should also add to the scope the maintenance of
>>> these interfaces (we should not assume “tribal knowledge” will work).
>>> 
>>> I am not trying to ask for something large or something requiring a ton of
>>> work, I am just asking that this gets thought about during the project so
>>> it doesn’t get neglected.  This could be as simple as an annotation like
>>> @ExposedTo3rdParties (Hadoop does this to show an interface is exposed and
>>> must be maintained), or it could be something like split directories
>>> (src/java = private, src/java-exposed = public); I am trying not to dictate
>>> an implementation, only trying to make sure we are set up to support the CEP
>>> after the work is done.
>>> 
>>> 
>>>> On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan <je...@gmail.com>
>>> wrote:
>>>> 
>>>> We already have many interfaces similar to these for Compaction
>>> Strategy, Indexing, Query Handler.  I would hope that committers are already
>>> following a policy along the lines of trunk -> anything goes, not trunk ->
>>> try not to change these interfaces.  I would expect that to be the same
>>> policy for any new internal interfaces that are added.  But given we
>>> already have many such interfaces, I see no reason to block adding more of
>>> them while change policies are discussed.
>>>> 
>>>> -Jeremiah
>>>> 
>>>>> On Nov 9, 2021, at 10:44 AM, David Capwell <dc...@apple.com.INVALID>
>>> wrote:
>>>>> 
>>>>> I still have one outstanding comment, but this is a comment for several
>>> of the CEPs being worked on
>>>>> 
>>>>>> And last comment, which I have also done in the other modularity
>>> thread… backwards compatibility and maintenance. It is not clear right now
>>> what java interfaces may not break and how we can maintain and extend such
>>> interfaces in the future.  If the goal is to allow 3rd parties to plug in
>>> and offer new SSTable formats, are we as a project ok with having a minor
>>> release do a binary or source non-compatible change?  If not how do we
>>> detect this?  Until this problem is solved, I do not think we should add
>>> any such interfaces.
>>>>> 
>>>>> I would love some clarity on this.  Specifically, if we assume a patch
>>> author/reviewers are not familiar with the impact of changes to these
>>> interfaces, what happens?  Do we have tools to block this? Do we require
>>> 3rd party authors to create massive shims to deal with every patch level
>>> version out there?  I would love more clarity on how we maintain these new
>>> pluggable interfaces.
>>>>> 
>>>>>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov <bl...@apache.org>
>>> wrote:
>>>>>> 
>>>>>> Does anyone have any further comments or questions on the proposal, or
>>> are
>>>>>> we ready to  move forward to a vote?
>>>>>> 
>>>>>> Regards,
>>>>>> Branimir
>>>>>> 
>>>>>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell
>>> <dc...@apple.com.invalid>
>>>>>> wrote:
>>>>>> 
>>>>>>>> I apologize I did not mention those things explicitly. All the places
>>>>>>> where
>>>>>>>> sstable files are accessed directly would have to be refactored.
>>>>>>> 
>>>>>>> Works for me
>>>>>>> 
>>>>>>>> Speaking about the implementation, one idea I was thinking about was
>>> that
>>>>>>>> the factories for formats are registered using Java's native service
>>>>>>>> loader.
>>>>>>> 
>>>>>>> I am a fan of ServiceLoader as a means of plugging in.
>>>>>>> 
>>>>>>>> I hope this explains a bit
>>>>>>> 
>>>>>>> Yep; thanks!
>>>>>>> 
>>>>>>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
>>>>>>> lewandowski.jacek@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> David,
>>>>>>>> 
>>>>>>>> I apologize I did not mention those things explicitly. All the places
>>>>>>> where
>>>>>>>> sstable files are accessed directly would have to be refactored.
>>>>>>>> 
>>>>>>>> Regarding TableMetrics - currently it includes many metrics, some of
>>> them
>>>>>>>> are unrelated to sstables at all, but there are metrics which are
>>>>>>> specific
>>>>>>>> to the current sstable format, like metrics related to index
>>> summaries or
>>>>>>>> bloom filters. The created gauges query certain methods on sstable
>>>>>>> reader -
>>>>>>>> I think the only common metrics for sstables we can leave in
>>> TableMetrics
>>>>>>>> are those for which there are query methods in generic sstable
>>> interface.
>>>>>>>> Other metrics, specific to the certain sstable format should be
>>>>>>> registered
>>>>>>>> by the implementation itself.
>>>>>>>> 
>>>>>>>> Speaking about the implementation, one idea I was thinking about was
>>> that
>>>>>>>> the factories for formats are registered using Java's native service
>>>>>>>> loader. This way we could get the list of all the factories on the
>>>>>>>> classpath and call some method, like `registerMetrics` during system
>>>>>>>> initialization. That could be also implemented in static initializer
>>> in
>>>>>>> the
>>>>>>>> factory but it would make it less obvious for the implementors where
>>> such
>>>>>>>> initialization should be done.
>>>>>>>> 
>>>>>>>> I hope this explains a bit
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Jacek


Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by Jeremiah D Jordan <je...@gmail.com>.
I would love to have this discussion and set up annotations or similar to formalize things.  I just do not think we need to hold up any CEPs to do so.  That discussion should possibly be a CEP of its own proposing how we want to formalize interfaces?  I would be happy to go through and try to put together something for that, or, since you feel so strongly about it, maybe you want to, David?  At the very least it should get its own DISCUSS thread and then be written up in the wiki.

-Jeremiah

> On Nov 9, 2021, at 1:06 PM, Joshua McKenzie <jm...@apache.org> wrote:
> 
>> 
>> trunk -> anything goes, not trunk -> try not to change these interfaces
> 
> Have we ever clarified what "these interfaces" are? Was just talking to
> David and I realized I didn't even JavaDoc CommitLogReadHandler as _being
> designed_ for external usage. /sigh
> 
> I think it'd be valuable for us to go through the codebase and annotate
> interfaces as intended to be exposed to 3rd parties; this has bothered me
> for years. Especially as we come up on a large number of new cleanups,
> refactorings, and potentially genericizing some subsystems into APIs
> (CEP-18 descendants).
> 
> 
> On Tue, Nov 9, 2021 at 2:01 PM David Capwell <dc...@apple.com.invalid>
> wrote:
> 
>>> We already have many interfaces similar to these for Compaction
>> Strategy, Indexing, Query Handler.
>> 
>> Today-I-Learned QueryHandler is not allowed to be touched in a minor… good
>> to know…
>> 
>>> not trunk -> try not to change these interfaces
>> 
>> Outside of MBeans, I honestly do not know what interfaces fall into this
>> group; and for MBeans we have tests which block breaking changes.  The
>> point I am making is that not everyone is aware of the rules, so having
>> something in place to help enforce such rules should be thought about; if
>> we want to add pluggable hooks with the intent that external parties can
>> leverage such hooks, we should also add to the scope the maintenance of
>> these interfaces (we should not assume “tribal knowledge” will work).
>> 
>> I am not trying to ask for something large or something requiring a ton of
>> work, I am just asking that this gets thought about during the project so
>> it doesn’t get neglected.  This could be as simple as an annotation like
>> @ExposedTo3rdParties (Hadoop does this to show an interface is exposed and
>> must be maintained), or it could be something like split directories
>> (src/java = private, src/java-exposed = public); I am trying not to dictate
>> an implementation, only trying to make sure we are set up to support the CEP
>> after the work is done.
>> 
>> 
>>> On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan <je...@gmail.com>
>> wrote:
>>> 
>>> We already have many interfaces similar to these for Compaction
>> Strategy, Indexing, Query Handler.  I would hope that committers are already
>> following a policy along the lines of trunk -> anything goes, not trunk ->
>> try not to change these interfaces.  I would expect that to be the same
>> policy for any new internal interfaces that are added.  But given we
>> already have many such interfaces, I see no reason to block adding more of
>> them while change policies are discussed.
>>> 
>>> -Jeremiah
>>> 
>>>> On Nov 9, 2021, at 10:44 AM, David Capwell <dc...@apple.com.INVALID>
>> wrote:
>>>> 
>>>> I still have one outstanding comment, but this is a comment for several
>> of the CEPs being worked on
>>>> 
>>>>> And last comment, which I have also done in the other modularity
>> thread… backwards compatibility and maintenance. It is not clear right now
>> what java interfaces may not break and how we can maintain and extend such
>> interfaces in the future.  If the goal is to allow 3rd parties to plug in
>> and offer new SSTable formats, are we as a project ok with having a minor
>> release do a binary or source non-compatible change?  If not how do we
>> detect this?  Until this problem is solved, I do not think we should add
>> any such interfaces.
>>>> 
>>>> I would love some clarity on this.  Specifically, if we assume a patch
>> author/reviewers are not familiar with the impact of changes to these
>> interfaces, what happens?  Do we have tools to block this? Do we require
>> 3rd party authors to create massive shims to deal with every patch level
>> version out there?  I would love more clarity on how we maintain these new
>> pluggable interfaces.
>>>> 
>>>>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov <bl...@apache.org>
>> wrote:
>>>>> 
>>>>> Does anyone have any further comments or questions on the proposal, or
>> are
>>>>> we ready to  move forward to a vote?
>>>>> 
>>>>> Regards,
>>>>> Branimir
>>>>> 
>>>>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell
>> <dc...@apple.com.invalid>
>>>>> wrote:
>>>>> 
>>>>>>> I apologize I did not mention those things explicitly. All the places
>>>>>> where
>>>>>>> sstable files are accessed directly would have to be refactored.
>>>>>> 
>>>>>> Works for me
>>>>>> 
>>>>>>> Speaking about the implementation, one idea I was thinking about was
>> that
>>>>>>> the factories for formats are registered using Java's native service
>>>>>>> loader.
>>>>>> 
>>>>>> I am a fan of ServiceLoader as a means of plugging in.
>>>>>> 
>>>>>>> I hope this explains a bit
>>>>>> 
>>>>>> Yep; thanks!
>>>>>> 
>>>>>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
>>>>>> lewandowski.jacek@gmail.com> wrote:
>>>>>>> 
>>>>>>> David,
>>>>>>> 
>>>>>>> I apologize I did not mention those things explicitly. All the places
>>>>>> where
>>>>>>> sstable files are accessed directly would have to be refactored.
>>>>>>> 
>>>>>>> Regarding TableMetrics - currently it includes many metrics, some of
>> them
>>>>>>> are unrelated to sstables at all, but there are metrics which are
>>>>>> specific
>>>>>>> to the current sstable format, like metrics related to index
>> summaries or
>>>>>>> bloom filters. The created gauges query certain methods on sstable
>>>>>> reader -
>>>>>>> I think the only common metrics for sstables we can leave in
>> TableMetrics
>>>>>>> are those for which there are query methods in generic sstable
>> interface.
>>>>>>> Other metrics, specific to the certain sstable format should be
>>>>>> registered
>>>>>>> by the implementation itself.
>>>>>>> 
>>>>>>> Speaking about the implementation, one idea I was thinking about was
>> that
>>>>>>> the factories for formats are registered using Java's native service
>>>>>>> loader. This way we could get the list of all the factories on the
>>>>>>> classpath and call some method, like `registerMetrics` during system
>>>>>>> initialization. That could be also implemented in static initializer
>> in
>>>>>> the
>>>>>>> factory but it would make it less obvious for the implementors where
>> such
>>>>>>> initialization should be done.
>>>>>>> 
>>>>>>> I hope this explains a bit
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Jacek


Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by Joshua McKenzie <jm...@apache.org>.
>
> trunk -> anything goes, not trunk -> try not to change these interfaces

Have we ever clarified what "these interfaces" are? Was just talking to
David and I realized I didn't even JavaDoc CommitLogReadHandler as _being
designed_ for external usage. /sigh

I think it'd be valuable for us to go through the codebase and annotate
interfaces as intended to be exposed to 3rd parties; this has bothered me
for years. Especially as we come up on a large number of new cleanups,
refactorings, and potentially genericizing some subsystems into APIs
(CEP-18 descendants).
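
A minimal sketch of what such an annotation could look like, similar in spirit to Hadoop's `InterfaceAudience` annotations. The annotation name `ExposedToThirdParties`, its `since` attribute, and the two example interfaces are all hypothetical; nothing here is an existing Cassandra API. Runtime retention is assumed so that a test could scan for the marker and fail a build when an annotated interface changes:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker for interfaces intended for use by 3rd parties.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ExposedToThirdParties {
    // Release in which the interface became part of the supported surface.
    String since() default "";
}

// Illustrative plugin interface that opts in to the compatibility promise.
@ExposedToThirdParties(since = "example-release")
interface ExampleFormatPlugin {
    String name();
}

// Illustrative internal interface: unannotated, so free to change.
interface ExampleInternalHelper {
}

final class ApiMarkers {
    private ApiMarkers() {}

    // True when the type carries the marker and is therefore meant to be
    // kept compatible for external implementors.
    static boolean isExposed(Class<?> type) {
        return type.isAnnotationPresent(ExposedToThirdParties.class);
    }
}
```

A compatibility test could then iterate over the annotated types and compare their method signatures against a recorded snapshot, in the same way the existing MBean tests block breaking changes.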


On Tue, Nov 9, 2021 at 2:01 PM David Capwell <dc...@apple.com.invalid>
wrote:

> > We already have many interfaces similar to these for Compaction
> Strategy, Indexing, Query Handler.
>
> Today-I-Learned QueryHandler is not allowed to be touched in a minor… good
> to know…
>
> > not trunk -> try not to change these interfaces
>
> Outside of MBeans, I honestly do not know what interfaces fall into this
> group; and for MBeans we have tests which block breaking changes.  The
> point I am making is that not everyone is aware of the rules, so having
> something in place to help enforce such rules should be thought about; if
> we want to add pluggable hooks with the intent that external parties can
> leverage such hooks, we should also add to the scope the maintenance of
> these interfaces (we should not assume “tribal knowledge” will work).
>
> I am not trying to ask for something large or something requiring a ton of
> work, I am just asking that this gets thought about during the project so
> it doesn’t get neglected.  This could be as simple as an annotation like
> @ExposedTo3rdParties (Hadoop does this to show an interface is exposed and
> must be maintained), or it could be something like split directories
> (src/java = private, src/java-exposed = public); I am trying not to dictate
> an implementation, only trying to make sure we are set up to support the CEP
> after the work is done.
>
>
> > On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan <je...@gmail.com>
> wrote:
> >
> > We already have many interfaces similar to these for Compaction
> Strategy, Indexing, Query Handler.  I would hope that committers are already
> following a policy along the lines of trunk -> anything goes, not trunk ->
> try not to change these interfaces.  I would expect that to be the same
> policy for any new internal interfaces that are added.  But given we
> already have many such interfaces, I see no reason to block adding more of
> them while change policies are discussed.
> >
> > -Jeremiah
> >
> >> On Nov 9, 2021, at 10:44 AM, David Capwell <dc...@apple.com.INVALID>
> wrote:
> >>
> >> I still have one outstanding comment, but this is a comment for several
> of the CEPs being worked on
> >>
> >>> And last comment, which I have also done in the other modularity
> thread… backwards compatibility and maintenance. It is not clear right now
> what java interfaces may not break and how we can maintain and extend such
> interfaces in the future.  If the goal is to allow 3rd parties to plug in
> and offer new SSTable formats, are we as a project ok with having a minor
> release do a binary or source non-compatible change?  If not how do we
> detect this?  Until this problem is solved, I do not think we should add
> any such interfaces.
> >>
> >> I would love some clarity on this.  Specifically, if we assume a patch
> author/reviewers are not familiar with the impact of changes to these
> interfaces, what happens?  Do we have tools to block this? Do we require
> 3rd party authors to create massive shims to deal with every patch level
> version out there?  I would love more clarity on how we maintain these new
> pluggable interfaces.
> >>
> >>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov <bl...@apache.org>
> wrote:
> >>>
> >>> Does anyone have any further comments or questions on the proposal, or
> are
> >>> we ready to  move forward to a vote?
> >>>
> >>> Regards,
> >>> Branimir
> >>>
> >>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell
> <dc...@apple.com.invalid>
> >>> wrote:
> >>>
> >>>>> I apologize I did not mention those things explicitly. All the places
> >>>> where
> >>>>> sstable files are accessed directly would have to be refactored.
> >>>>
> >>>> Works for me
> >>>>
> >>>>> Speaking about the implementation, one idea I was thinking about was
> that
> >>>>> the factories for formats are registered using Java's native service
> >>>>> loader.
> >>>>
> >>>> I am a fan of ServiceLoader as a means of plugging in.
> >>>>
> >>>>> I hope this explains a bit
> >>>>
> >>>> Yep; thanks!
> >>>>
> >>>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
> >>>> lewandowski.jacek@gmail.com> wrote:
> >>>>>
> >>>>> David,
> >>>>>
> >>>>> I apologize I did not mention those things explicitly. All the places
> >>>> where
> >>>>> sstable files are accessed directly would have to be refactored.
> >>>>>
> >>>>> Regarding TableMetrics - currently it includes many metrics, some of
> them
> >>>>> are unrelated to sstables at all, but there are metrics which are
> >>>> specific
> >>>>> to the current sstable format, like metrics related to index
> summaries or
> >>>>> bloom filters. The created gauges query certain methods on sstable
> >>>> reader -
> >>>>> I think the only common metrics for sstables we can leave in
> TableMetrics
> >>>>> are those for which there are query methods in generic sstable
> interface.
> >>>>> Other metrics, specific to the certain sstable format should be
> >>>> registered
> >>>>> by the implementation itself.
> >>>>>
> >>>>> Speaking about the implementation, one idea I was thinking about was
> that
> >>>>> the factories for formats are registered using Java's native service
> >>>>> loader. This way we could get the list of all the factories on the
> >>>>> classpath and call some method, like `registerMetrics` during system
> >>>>> initialization. That could be also implemented in static initializer
> in
> >>>> the
> >>>>> factory but it would make it less obvious for the implementors where
> such
> >>>>> initialization should be done.
> >>>>>
> >>>>> I hope this explains a bit
> >>>>>
> >>>>> Thanks,
> >>>>> Jacek

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by David Capwell <dc...@apple.com.INVALID>.
> We already have many interfaces similar to these for Compaction Strategy, Indexing, Query Handler.

Today I learned that QueryHandler is not allowed to be touched in a minor release… good to know…

> not trunk -> try not to change these interfaces

Outside of MBeans, I honestly do not know what interfaces fall into this group; and for MBeans we have tests which block breaking changes.  The point I am making is that not everyone is aware of the rules, so having something in place to help enforce such rules should be thought about; if we want to add pluggable hooks with the intent that external parties can leverage such hooks, we should also add to the scope the maintenance of these interfaces (we should not assume “tribal knowledge” will work).

I am not asking for something large or something requiring a ton of work; I am just asking that this gets thought about during the project so it doesn’t get neglected.  This could be as simple as an annotation like @ExposedTo3rdParties (Hadoop does this to show an interface is exposed and must be maintained), or it could be something like split directories (src/java = private, src/java-exposed = public); I am trying not to dictate an implementation, only to make sure we are set up to support the CEP after the work is done.
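
To make the annotation idea concrete, here is a minimal sketch, not a proposal for the actual implementation. The annotation name comes from the paragraph above; ExampleFormatFactory and signaturesOf are hypothetical names invented for illustration and do not exist in Cassandra. The idea is that a test could record the signatures of every marked interface and fail the build when they change on a release branch.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class ExposedApiSketch {
    /** Marker for interfaces that external plugins implement (hypothetical, not in Cassandra). */
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface ExposedTo3rdParties {}

    /** Hypothetical pluggable interface, present only so the sketch has something to inspect. */
    @ExposedTo3rdParties
    public interface ExampleFormatFactory {
        String name();
        long estimateSize(long rows);
    }

    /** Returns the sorted method signatures of a marked type, or an empty list if it is not marked. */
    public static List<String> signaturesOf(Class<?> type) {
        if (!type.isAnnotationPresent(ExposedTo3rdParties.class))
            return Collections.emptyList();
        List<String> signatures = new ArrayList<>();
        for (Method m : type.getMethods()) {
            String params = Arrays.stream(m.getParameterTypes())
                                  .map(Class::getName)
                                  .collect(Collectors.joining(","));
            signatures.add(m.getReturnType().getName() + " " + m.getName() + "(" + params + ")");
        }
        Collections.sort(signatures);
        return signatures;
    }

    public static void main(String[] args) {
        // A compatibility test would compare this list against a baseline stored in the repository.
        signaturesOf(ExampleFormatFactory.class).forEach(System.out::println);
    }
}
```

This only catches signature changes, not behavioural ones, so it would complement rather than replace review.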


> On Nov 9, 2021, at 9:52 AM, Jeremiah D Jordan <je...@gmail.com> wrote:
> 
> We already have many interfaces similar to these for Compaction Strategy, Indexing, Query Handler.  I would hope that commiters are already following a policy along the lines of trunk -> anything goes, not trunk -> try not to change these interfaces.  I would expect that to be the same policy for any new internal interfaces that are added.  But given we already have many such interfaces, I see no reason to block adding more of them while change policies are discussed.
> 
> -Jeremiah
> 
>> On Nov 9, 2021, at 10:44 AM, David Capwell <dc...@apple.com.INVALID> wrote:
>> 
>> I still have one outstanding comment, but this is a comment for several of the CEPs being worked on
>> 
>>> And last comment, which I have also done in the other modularity thread… backwards compatibility and maintenance. It is not clear right now what java interfaces may not break and how we can maintain and extend such interfaces in the future.  If the goal is to allow 3rd parties to plugin and offer new SSTable formats, are we as a project ok with having a minor release do a binary or source non-compatible change?  If not how do we detect this?  Until this problem is solved, I do not think we should add any such interfaces.
>> 
>> I would love some clarity on this.  Specifically, if we assume a patch author/reviewers are not familiar with the impact of changes these interfaces, what happens?  Do we have tools to block this? Do we require 3rd party authors to create massive shims to deal with every patch level version out there?  I would love more clarity on how we maintain these new pluggable interfaces.
>> 
>>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov <bl...@apache.org> wrote:
>>> 
>>> Does anyone have any further comments or questions on the proposal, or are
>>> we ready to  move forward to a vote?
>>> 
>>> Regards,
>>> Branimir
>>> 
>>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell <dc...@apple.com.invalid>
>>> wrote:
>>> 
>>>>> I apologize I did not mention those things explicitly. All the places
>>>> where
>>>>> sstable files are accessed directly would have to be refactored.
>>>> 
>>>> Works for me
>>>> 
>>>>> Speaking about the implementation, one idea I was thinking about was that
>>>>> the factories for formats are registered using Java's native service
>>>>> loader.
>>>> 
>>>> I am a fan of ServiceLoader as a means of plugging in.
>>>> 
>>>>> I hope this explains a bit
>>>> 
>>>> Yep; thanks!
>>>> 
>>>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
>>>> lewandowski.jacek@gmail.com> wrote:
>>>>> 
>>>>> David,
>>>>> 
>>>>> I apologize I did not mention those things explicitly. All the places
>>>> where
>>>>> sstable files are accessed directly would have to be refactored.
>>>>> 
>>>>> Regarding TableMetrics - currently it includes many metrics, some of them
>>>>> are unrelated to sstables at all, but there are metrics which are
>>>> specific
>>>>> to the current sstable format, like metrics related to index summaries or
>>>>> bloom filters. The created gauges query certain methods on sstable
>>>> reader -
>>>>> I think the only common metrics for sstables we can leave in TableMetrics
>>>>> are those for which there are query methods in generic sstable interface.
>>>>> Other metrics, specific to the certain sstable format should be
>>>> registered
>>>>> by the implementation itself.
>>>>> 
>>>>> Speaking about the implementation, one idea I was thinking about was that
>>>>> the factories for formats are registered using Java's native service
>>>>> loader. This way we could get the list of all the factories on the
>>>>> classpath and call some method, like `registerMetrics` during system
>>>>> initialization. That could be also implemented in static initializer in
>>>> the
>>>>> factory but it would make it less obvious for the implementors where such
>>>>> initialization should be done.
>>>>> 
>>>>> I hope this explains a bit
>>>>> 
>>>>> Thanks,
>>>>> Jacek

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by Jeremiah D Jordan <je...@gmail.com>.
We already have many interfaces similar to these for Compaction Strategy, Indexing, Query Handler.  I would hope that committers are already following a policy along the lines of trunk -> anything goes, not trunk -> try not to change these interfaces.  I would expect that to be the same policy for any new internal interfaces that are added.  But given we already have many such interfaces, I see no reason to block adding more of them while change policies are discussed.

-Jeremiah

> On Nov 9, 2021, at 10:44 AM, David Capwell <dc...@apple.com.INVALID> wrote:
> 
> I still have one outstanding comment, but this is a comment for several of the CEPs being worked on
> 
>> And last comment, which I have also done in the other modularity thread… backwards compatibility and maintenance. It is not clear right now what java interfaces may not break and how we can maintain and extend such interfaces in the future.  If the goal is to allow 3rd parties to plugin and offer new SSTable formats, are we as a project ok with having a minor release do a binary or source non-compatible change?  If not how do we detect this?  Until this problem is solved, I do not think we should add any such interfaces.
> 
> I would love some clarity on this.  Specifically, if we assume a patch author/reviewers are not familiar with the impact of changes these interfaces, what happens?  Do we have tools to block this? Do we require 3rd party authors to create massive shims to deal with every patch level version out there?  I would love more clarity on how we maintain these new pluggable interfaces.
> 
>> On Nov 9, 2021, at 4:45 AM, Branimir Lambov <bl...@apache.org> wrote:
>> 
>> Does anyone have any further comments or questions on the proposal, or are
>> we ready to  move forward to a vote?
>> 
>> Regards,
>> Branimir
>> 
>> On Tue, Nov 2, 2021 at 7:15 PM David Capwell <dc...@apple.com.invalid>
>> wrote:
>> 
>>>> I apologize I did not mention those things explicitly. All the places
>>> where
>>>> sstable files are accessed directly would have to be refactored.
>>> 
>>> Works for me
>>> 
>>>> Speaking about the implementation, one idea I was thinking about was that
>>>> the factories for formats are registered using Java's native service
>>>> loader.
>>> 
>>> I am a fan of ServiceLoader as a means of plugging in.
>>> 
>>>> I hope this explains a bit
>>> 
>>> Yep; thanks!
>>> 
>>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
>>> lewandowski.jacek@gmail.com> wrote:
>>>> 
>>>> David,
>>>> 
>>>> I apologize I did not mention those things explicitly. All the places
>>> where
>>>> sstable files are accessed directly would have to be refactored.
>>>> 
>>>> Regarding TableMetrics - currently it includes many metrics, some of them
>>>> are unrelated to sstables at all, but there are metrics which are
>>> specific
>>>> to the current sstable format, like metrics related to index summaries or
>>>> bloom filters. The created gauges query certain methods on sstable
>>> reader -
>>>> I think the only common metrics for sstables we can leave in TableMetrics
>>>> are those for which there are query methods in generic sstable interface.
>>>> Other metrics, specific to the certain sstable format should be
>>> registered
>>>> by the implementation itself.
>>>> 
>>>> Speaking about the implementation, one idea I was thinking about was that
>>>> the factories for formats are registered using Java's native service
>>>> loader. This way we could get the list of all the factories on the
>>>> classpath and call some method, like `registerMetrics` during system
>>>> initialization. That could be also implemented in static initializer in
>>> the
>>>> factory but it would make it less obvious for the implementors where such
>>>> initialization should be done.
>>>> 
>>>> I hope this explains a bit
>>>> 
>>>> Thanks,
>>>> Jacek

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by David Capwell <dc...@apple.com.INVALID>.
I still have one outstanding comment, but this is a comment for several of the CEPs being worked on

> And last comment, which I have also done in the other modularity thread… backwards compatibility and maintenance. It is not clear right now what java interfaces may not break and how we can maintain and extend such interfaces in the future.  If the goal is to allow 3rd parties to plugin and offer new SSTable formats, are we as a project ok with having a minor release do a binary or source non-compatible change?  If not how do we detect this?  Until this problem is solved, I do not think we should add any such interfaces.

I would love some clarity on this.  Specifically, if we assume a patch author/reviewers are not familiar with the impact of changes to these interfaces, what happens?  Do we have tools to block this? Do we require 3rd party authors to create massive shims to deal with every patch level version out there?  I would love more clarity on how we maintain these new pluggable interfaces.

> On Nov 9, 2021, at 4:45 AM, Branimir Lambov <bl...@apache.org> wrote:
> 
> Does anyone have any further comments or questions on the proposal, or are
> we ready to  move forward to a vote?
> 
> Regards,
> Branimir
> 
> On Tue, Nov 2, 2021 at 7:15 PM David Capwell <dc...@apple.com.invalid>
> wrote:
> 
>>> I apologize I did not mention those things explicitly. All the places
>> where
>>> sstable files are accessed directly would have to be refactored.
>> 
>> Works for me
>> 
>>> Speaking about the implementation, one idea I was thinking about was that
>>> the factories for formats are registered using Java's native service
>>> loader.
>> 
>> I am a fan of ServiceLoader as a means of plugging in.
>> 
>>> I hope this explains a bit
>> 
>> Yep; thanks!
>> 
>>> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
>> lewandowski.jacek@gmail.com> wrote:
>>> 
>>> David,
>>> 
>>> I apologize I did not mention those things explicitly. All the places
>> where
>>> sstable files are accessed directly would have to be refactored.
>>> 
>>> Regarding TableMetrics - currently it includes many metrics, some of them
>>> are unrelated to sstables at all, but there are metrics which are
>> specific
>>> to the current sstable format, like metrics related to index summaries or
>>> bloom filters. The created gauges query certain methods on sstable
>> reader -
>>> I think the only common metrics for sstables we can leave in TableMetrics
>>> are those for which there are query methods in generic sstable interface.
>>> Other metrics, specific to the certain sstable format should be
>> registered
>>> by the implementation itself.
>>> 
>>> Speaking about the implementation, one idea I was thinking about was that
>>> the factories for formats are registered using Java's native service
>>> loader. This way we could get the list of all the factories on the
>>> classpath and call some method, like `registerMetrics` during system
>>> initialization. That could be also implemented in static initializer in
>> the
>>> factory but it would make it less obvious for the implementors where such
>>> initialization should be done.
>>> 
>>> I hope this explains a bit
>>> 
>>> Thanks,
>>> Jacek

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by Branimir Lambov <bl...@apache.org>.
Does anyone have any further comments or questions on the proposal, or are
we ready to move forward to a vote?

Regards,
Branimir

On Tue, Nov 2, 2021 at 7:15 PM David Capwell <dc...@apple.com.invalid>
wrote:

> > I apologize I did not mention those things explicitly. All the places
> where
> > sstable files are accessed directly would have to be refactored.
>
> Works for me
>
> > Speaking about the implementation, one idea I was thinking about was that
> > the factories for formats are registered using Java's native service
> > loader.
>
> I am a fan of ServiceLoader as a means of plugging in.
>
> > I hope this explains a bit
>
> Yep; thanks!
>
> > On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <
> lewandowski.jacek@gmail.com> wrote:
> >
> > David,
> >
> > I apologize I did not mention those things explicitly. All the places
> where
> > sstable files are accessed directly would have to be refactored.
> >
> > Regarding TableMetrics - currently it includes many metrics, some of them
> > are unrelated to sstables at all, but there are metrics which are
> specific
> > to the current sstable format, like metrics related to index summaries or
> > bloom filters. The created gauges query certain methods on sstable
> reader -
> > I think the only common metrics for sstables we can leave in TableMetrics
> > are those for which there are query methods in generic sstable interface.
> > Other metrics, specific to the certain sstable format should be
> registered
> > by the implementation itself.
> >
> > Speaking about the implementation, one idea I was thinking about was that
> > the factories for formats are registered using Java's native service
> > loader. This way we could get the list of all the factories on the
> > classpath and call some method, like `registerMetrics` during system
> > initialization. That could be also implemented in static initializer in
> the
> > factory but it would make it less obvious for the implementors where such
> > initialization should be done.
> >
> > I hope this explains a bit
> >
> > Thanks,
> > Jacek

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by David Capwell <dc...@apple.com.INVALID>.
> I apologize I did not mention those things explicitly. All the places where
> sstable files are accessed directly would have to be refactored.

Works for me

> Speaking about the implementation, one idea I was thinking about was that
> the factories for formats are registered using Java's native service
> loader.

I am a fan of ServiceLoader as a means of plugging in.

> I hope this explains a bit

Yep; thanks!

> On Nov 2, 2021, at 1:46 AM, Jacek Lewandowski <le...@gmail.com> wrote:
> 
> David,
> 
> I apologize I did not mention those things explicitly. All the places where
> sstable files are accessed directly would have to be refactored.
> 
> Regarding TableMetrics - currently it includes many metrics, some of them
> are unrelated to sstables at all, but there are metrics which are specific
> to the current sstable format, like metrics related to index summaries or
> bloom filters. The created gauges query certain methods on sstable reader -
> I think the only common metrics for sstables we can leave in TableMetrics
> are those for which there are query methods in generic sstable interface.
> Other metrics, specific to the certain sstable format should be registered
> by the implementation itself.
> 
> Speaking about the implementation, one idea I was thinking about was that
> the factories for formats are registered using Java's native service
> loader. This way we could get the list of all the factories on the
> classpath and call some method, like `registerMetrics` during system
> initialization. That could be also implemented in static initializer in the
> factory but it would make it less obvious for the implementors where such
> initialization should be done.
> 
> I hope this explains a bit
> 
> Thanks,
> Jacek



Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by Jacek Lewandowski <le...@gmail.com>.
David,

I apologize I did not mention those things explicitly. All the places where
sstable files are accessed directly would have to be refactored.

Regarding TableMetrics - currently it includes many metrics. Some of them
are unrelated to sstables at all, but others are specific to the current
sstable format, like the metrics related to index summaries or bloom
filters. The created gauges query certain methods on the sstable reader.
I think the only common sstable metrics we can leave in TableMetrics are
those for which there are query methods in the generic sstable interface.
Other metrics, specific to a particular sstable format, should be registered
by the implementation itself.
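
A rough sketch of that split, under stated assumptions: GenericSSTable, Registry and BloomFilteredSSTable are hypothetical stand-ins invented for illustration, and the gauge names are only examples. Table-level code sees nothing but the generic interface, while the format registers the gauges that only it can compute.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongSupplier;

public class FormatMetricsSketch {
    /** Hypothetical generic view of an sstable: format-independent queries only. */
    public interface GenericSSTable {
        long onDiskLength();
    }

    /** Minimal stand-in for a metrics registry (not the real Cassandra one). */
    public static final class Registry {
        public final Map<String, LongSupplier> gauges = new TreeMap<>();
        public void gauge(String name, LongSupplier supplier) { gauges.put(name, supplier); }
    }

    /** Hypothetical sstable of a format that happens to have a bloom filter. */
    public static final class BloomFilteredSSTable implements GenericSSTable {
        final long length;
        final long bloomFilterBytes;
        public BloomFilteredSSTable(long length, long bloomFilterBytes) {
            this.length = length;
            this.bloomFilterBytes = bloomFilterBytes;
        }
        @Override public long onDiskLength() { return length; }
    }

    /** Table-level metrics: may only use what the generic interface offers. */
    public static void registerTableMetrics(Registry registry, List<? extends GenericSSTable> live) {
        registry.gauge("LiveDiskSpaceUsed",
                       () -> live.stream().mapToLong(GenericSSTable::onDiskLength).sum());
    }

    /** Format-owned metrics: registered by the implementation, using its own types. */
    public static void registerFormatMetrics(Registry registry, List<BloomFilteredSSTable> live) {
        registry.gauge("BloomFilterDiskSpaceUsed",
                       () -> live.stream().mapToLong(s -> s.bloomFilterBytes).sum());
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        List<BloomFilteredSSTable> live =
            Arrays.asList(new BloomFilteredSSTable(100, 8), new BloomFilteredSSTable(50, 4));
        registerTableMetrics(registry, live);
        registerFormatMetrics(registry, live);
        registry.gauges.forEach((name, g) -> System.out.println(name + " = " + g.getAsLong()));
    }
}
```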

Speaking about the implementation, one idea I was thinking about was that
the factories for formats are registered using Java's native service
loader. This way we could get the list of all the factories on the
classpath and call some method, like `registerMetrics`, during system
initialization. That could also be implemented in a static initializer in
the factory, but it would make it less obvious to implementors where such
initialization should be done.
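
For illustration only, the discovery step could look roughly like the sketch below. SSTableFormatFactory and its two methods are assumptions made up for this example, not the API from the CEP; java.util.ServiceLoader is the standard JDK class. Providers would be listed in a META-INF/services file named after the interface's binary name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.ServiceLoader;
import java.util.Set;

public class FormatDiscoverySketch {
    /** Hypothetical service interface that each format implementation would provide. */
    public interface SSTableFormatFactory {
        String name();
        /** Explicit hook, so implementors know where format-specific metrics get registered. */
        void registerMetrics(Set<String> registry);
    }

    /** Finds every factory on the classpath; ServiceLoader is itself an Iterable. */
    public static Iterable<SSTableFormatFactory> discover() {
        return ServiceLoader.load(SSTableFormatFactory.class);
    }

    /** Runs the initialization hook of each factory and returns the format names seen. */
    public static List<String> initialize(Iterable<? extends SSTableFormatFactory> factories,
                                          Set<String> registry) {
        List<String> names = new ArrayList<>();
        for (SSTableFormatFactory factory : factories) {
            factory.registerMetrics(registry);
            names.add(factory.name());
        }
        return names;
    }

    public static void main(String[] args) {
        Set<String> registry = new LinkedHashSet<>();
        // Without a provider file on the classpath this simply finds no factories.
        System.out.println(initialize(discover(), registry));
    }
}
```

Compared with a static initializer, the explicit call keeps the initialization order under the control of the startup code rather than of class loading.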

I hope this explains a bit

Thanks,
Jacek

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by David Capwell <dc...@apple.com.INVALID>.
Reading the CEP, I don’t see any mention of the systems which access SSTables, such as streaming (small callout to zero-copy streaming with ZeroCopyBigTableWriter) and repair.  If you are abstracting out BigTableReader then you are not dealing with the implementation assumptions that users of SSTables have (such as direct mutation of auxiliary files outside of -Data.db).

> Audience
> 	• Cassandra developers who wish to see SSTableReader and SSTableWriter more modular than they are today,

This statement relates to the above comment: many parts of the code do not use Reader/Writer but instead use direct format knowledge to apply changes to the file format (normally outside of -Data.db). To me, the interfaces have to be at the SSTable level, which then exposes readers/writers but also has to expose the other things we do outside of those paths.

> 	• move the metrics related to sstable format out from TableMetrics class and make them tied to certain sstable implementation

I am curious about this comment; will this information no longer be exposed?

> 	• have a single factory for creating both readers and writers for particular implementation of sstable and use it consistently - no direct creation of any reader / writer

I am -1 here, for the reasons listed above; the problem (in my eyes) is not the reader/writer but higher up, at the level of the actual SSTable.  If we make read/write pluggable but still allow direct file access, then these abstractions fail to meet the goals of the CEP.
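
As a sketch of what "at the SSTable level" could mean, assuming nothing about the real design: every name below (Format, Reader, Writer, mutateLevel, InMemoryFormat) is hypothetical and exists only for this example. The point is that the format hands out readers and writers and also owns the auxiliary operations, so callers never edit files on their own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FormatLevelApiSketch {
    public interface Reader {
        List<String> keys();
        int level();
    }

    public interface Writer {
        void append(String key);
        Reader finish();
    }

    /** Hypothetical format-level entry point: readers, writers and auxiliary operations. */
    public interface Format {
        Writer newWriter();
        /** Stands in for an operation that callers might otherwise do by rewriting a metadata file. */
        Reader mutateLevel(Reader current, int newLevel);
    }

    /** Toy format that keeps everything in memory, just to make the sketch runnable. */
    public static final class InMemoryFormat implements Format {
        private static Reader reader(List<String> keys, int level) {
            List<String> copy = Collections.unmodifiableList(new ArrayList<>(keys));
            return new Reader() {
                @Override public List<String> keys() { return copy; }
                @Override public int level() { return level; }
            };
        }

        @Override public Writer newWriter() {
            List<String> buffer = new ArrayList<>();
            return new Writer() {
                @Override public void append(String key) { buffer.add(key); }
                @Override public Reader finish() { return reader(buffer, 0); }
            };
        }

        @Override public Reader mutateLevel(Reader current, int newLevel) {
            // The format decides how the change is stored; the caller only gets a new reader back.
            return reader(current.keys(), newLevel);
        }
    }

    public static void main(String[] args) {
        Format format = new InMemoryFormat();
        Writer writer = format.newWriter();
        writer.append("k1");
        writer.append("k2");
        Reader reloaded = format.mutateLevel(writer.finish(), 3);
        System.out.println(reloaded.keys() + " at level " + reloaded.level());
    }
}
```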

I am +1 to the intent of the CEP.

And a last comment, which I have also made in the other modularity thread: backwards compatibility and maintenance. It is not clear right now which Java interfaces may not break and how we can maintain and extend such interfaces in the future.  If the goal is to allow 3rd parties to plug in and offer new SSTable formats, are we as a project ok with having a minor release make a binary- or source-incompatible change?  If not, how do we detect this?  Until this problem is solved, I do not think we should add any such interfaces.

> On Oct 22, 2021, at 7:23 AM, Jeremiah Jordan <je...@gmail.com> wrote:
> 
> Hi Stefan,
> That idea is not related to this CEP which is about the file formats of the
> sstables, not file system access.  But you should take a look at the work
> recently committed in https://issues.apache.org/jira/browse/CASSANDRA-16926
> to switch to using java.nio.file.Path for file access.  This should allow
> the use of a file system provider to access files which could be the basis
> for work to load the files from S3.
> 
> -Jeremiah
> 
> On Fri, Oct 22, 2021 at 4:07 AM Stefan Miklosovic <
> stefan.miklosovic@instaclustr.com> wrote:
> 
>> One point I would like to add to this; I was already looking into how
>> to extend this but what I saw in SSTableReader was that it is very
>> much "file system oriented". There was not any possibility to actually
>> hook something like that there. I think what importing does is that it
>> will use SSTableReader / Writer stuff so I think that the modification
>> of these classes to accommodate this idea would be necessary.
>> 
>> On Fri, 22 Oct 2021 at 11:02, Stefan Miklosovic
>> <st...@instaclustr.com> wrote:
>>> 
>>> Hi Jacek,
>>> 
>>> Thanks for taking the lead on this.
>>> 
>>> There was importing of SSTables introduced in 4.0 via
>>> StorageService#importNewSSTables. The "problem" with this is that
>>> SSTables need to be physically located at disk so Cassandra can read
>>> them. If a backup is taken and SSTables are uploaded to, for example,
>>> S3 bucket, then upon restore, all these SSTables need to be downloaded
>>> first and then imported. What about downloading them / importing them
>>> directly from S3? Or any custom source for that matter? Importing of
>>> SSTables is a very nice feature in 4.0, we do not need to copy / hard
>>> link / refresh, it is all handled internally.
>>> 
>>> I am not sure if your work is related to this idea but I would
>>> appreciate it if this is pluggable as well for the sake of simplicity
>>> and effectiveness as we would not have to download all sstables before
>>> importing them.
>>> 
>>> If it is not related, feel free to skip that completely and I guess I
>>> would have to try to push that forward myself.
>>> 
>>> Regards
>>> 
>>> 
>>> On Fri, 22 Oct 2021 at 10:24, Jacek Lewandowski
>>> <le...@gmail.com> wrote:
>>>> 
>>>> I'd like to start a discussion about SSTable format API proposal
>> (CEP-17)
>>>> 
>>>> Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056
>>>> CEP:
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
>>>> 
>>>> Thanks,
>>>> Jacek

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by Jeremiah Jordan <je...@gmail.com>.
Hi Stefan,
That idea is not related to this CEP which is about the file formats of the
sstables, not file system access.  But you should take a look at the work
recently committed in https://issues.apache.org/jira/browse/CASSANDRA-16926
to switch to using java.nio.file.Path for file access.  This should allow
the use of a file system provider to access files which could be the basis
for work to load the files from S3.
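
To illustrate why moving to java.nio.file.Path matters here, a small sketch follows. It uses the zip file system provider that ships with the JDK as a stand-in for a remote store; an S3 provider would be a third-party or custom FileSystemProvider and is not shown. The file name and the helper names are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

public class PathProviderSketch {
    /** Code written against Path does not care which provider backs the path. */
    public static String readAll(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a file into a zip-backed file system and reads it back through the same helper. */
    public static String roundTripThroughZip() {
        try {
            Path dir = Files.createTempDirectory("path-sketch");
            Path zip = dir.resolve("archive.zip");
            // The "jar:" scheme selects the JDK's zip provider; "create" makes a new archive.
            URI uri = URI.create("jar:" + zip.toUri());
            try (FileSystem fs = FileSystems.newFileSystem(uri, Collections.singletonMap("create", "true"))) {
                Path inside = fs.getPath("/example-Data.db");
                Files.write(inside, "payload".getBytes(StandardCharsets.UTF_8));
                return readAll(inside);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripThroughZip());
    }
}
```

Code that still converts paths to java.io.File would not work with such a provider, which is presumably part of why the switch was needed first.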

-Jeremiah

On Fri, Oct 22, 2021 at 4:07 AM Stefan Miklosovic <
stefan.miklosovic@instaclustr.com> wrote:

> One point I would like to add to this; I was already looking into how
> to extend this but what I saw in SSTableReader was that it is very
> much "file system oriented". There was not any possibility to actually
> hook something like that there. I think what importing does is that it
> will use SSTableReader / Writer stuff so I think that the modification
> of these classes to accommodate this idea would be necessary.
>
> On Fri, 22 Oct 2021 at 11:02, Stefan Miklosovic
> <st...@instaclustr.com> wrote:
> >
> > Hi Jacek,
> >
> > Thanks for taking the lead on this.
> >
> > There was importing of SSTables introduced in 4.0 via
> > StorageService#importNewSSTables. The "problem" with this is that
> > SSTables need to be physically located at disk so Cassandra can read
> > them. If a backup is taken and SSTables are uploaded to, for example,
> > S3 bucket, then upon restore, all these SSTables need to be downloaded
> > first and then imported. What about downloading them / importing them
> > directly from S3? Or any custom source for that matter? Importing of
> > SSTables is a very nice feature in 4.0, we do not need to copy / hard
> > link / refresh, it is all handled internally.
> >
> > I am not sure if your work is related to this idea but I would
> > appreciate it if this is pluggable as well for the sake of simplicity
> > and effectiveness as we would not have to download all sstables before
> > importing them.
> >
> > If it is not related, feel free to skip that completely and I guess I
> > would have to try to push that forward myself.
> >
> > Regards
> >
> >
> > On Fri, 22 Oct 2021 at 10:24, Jacek Lewandowski
> > <le...@gmail.com> wrote:
> > >
> > > I'd like to start a discussion about SSTable format API proposal
> (CEP-17)
> > >
> > > Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056
> > > CEP:
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
> > >
> > > Thanks,
> > > Jacek

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by Stefan Miklosovic <st...@instaclustr.com>.
One point I would like to add to this; I was already looking into how
to extend this but what I saw in SSTableReader was that it is very
much "file system oriented". There was not any possibility to actually
hook something like that there. I think what importing does is that it
will use SSTableReader / Writer stuff so I think that the modification
of these classes to accommodate this idea would be necessary.

On Fri, 22 Oct 2021 at 11:02, Stefan Miklosovic
<st...@instaclustr.com> wrote:
>
> Hi Jacek,
>
> Thanks for taking the lead on this.
>
> Importing of SSTables was introduced in 4.0 via
> StorageService#importNewSSTables. The "problem" with it is that
> SSTables need to be physically located on disk so Cassandra can read
> them. If a backup is taken and SSTables are uploaded to, for example,
> an S3 bucket, then upon restore all these SSTables need to be downloaded
> first and then imported. What about downloading / importing them
> directly from S3? Or from any custom source, for that matter? Importing
> SSTables is a very nice feature in 4.0: we do not need to copy / hard
> link / refresh, it is all handled internally.
>
> I am not sure if your work is related to this idea, but I would
> appreciate it if this were pluggable as well, for the sake of simplicity
> and efficiency, as we would not have to download all SSTables before
> importing them.
>
> If it is not related, feel free to skip that completely and I guess I
> would have to try to push that forward myself.
>
> Regards
>
>
> On Fri, 22 Oct 2021 at 10:24, Jacek Lewandowski
> <le...@gmail.com> wrote:
> >
> > I'd like to start a discussion about SSTable format API proposal (CEP-17)
> >
> > Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056
> > CEP: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
> >
> > Thanks,
> > Jacek
> >


Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by Stefan Miklosovic <st...@instaclustr.com>.
Hi Jacek,

Thanks for taking the lead on this.

Importing of SSTables was introduced in 4.0 via
StorageService#importNewSSTables. The "problem" with it is that
SSTables need to be physically located on disk so Cassandra can read
them. If a backup is taken and SSTables are uploaded to, for example,
an S3 bucket, then upon restore all these SSTables need to be downloaded
first and then imported. What about downloading / importing them
directly from S3? Or from any custom source, for that matter? Importing
SSTables is a very nice feature in 4.0: we do not need to copy / hard
link / refresh, it is all handled internally.

I am not sure if your work is related to this idea, but I would
appreciate it if this were pluggable as well, for the sake of simplicity
and efficiency, as we would not have to download all SSTables before
importing them.

If it is not related, feel free to skip that completely and I guess I
would have to try to push that forward myself.

Regards


On Fri, 22 Oct 2021 at 10:24, Jacek Lewandowski
<le...@gmail.com> wrote:
>
> I'd like to start a discussion about SSTable format API proposal (CEP-17)
>
> Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056
> CEP: https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
>
> Thanks,
> Jacek
>
