You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cassandra.apache.org by Branimir Lambov <bl...@apache.org> on 2021/11/01 16:23:15 UTC

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

As Jacek is not a committer, this proposal needs a shepherd. I would be
happy to take this role.

> to me the interfaces has to be at the SSTable level, which then expose
readers/writers, but also has to expose the other things we do outside of
those paths

Could you give some detail on what these things are? Are they something
different from what the standalone Cassandra tools (scrub/verify/upgrade)
are currently doing? Obviously, any pluggability proposal will have to
provide a solution to these, and it would be helpful to know what needs to
be done beyond making sure the bundled tools work correctly (which includes
iterating indexes; format-specific operations (e.g. index summary
redistribution) are excluded as they are to be handled by the individual
format).

There is another problem in the current code alluded to in the question, in
the fact that "SSTableReader" (tied to the sstable format and ready for
querying data (i.e. with open data files and bloom filters loaded in
memory)) is the only concept that the code uses to work with sstables. As I
understand it, this proposal does not aim to solve that problem, only to
make sure that we can properly read and write sstables of a given format,
including in streaming and standalone tools. In other words, to provide the
machinery to convert sstable descriptors into sstable readers and writers.

I see this as an expansion of CASSANDRA-7443 and cleanup of any changes
that came after it and broke the intended capability.

Regards,
Branimir

On Thu, Oct 28, 2021 at 7:43 PM David Capwell <dc...@apple.com.invalid>
wrote:

> Sorry about that; used -1/+1 to show preference, not binding action
>
> > On Oct 28, 2021, at 5:50 AM, benedict@apache.org wrote:
> >
> >> I am -1 here, for the reasons listed above; the problem (in my eye) is
> not reader/writer but higher level at the actual SSTable.  If we plug out
> read/write but still allow direct file access, then these abstractions fail
> to provide the goals of the CEP.
> >
> > Be careful dropping -1s, as your -1s here are binding. I realise this
> isn’t a vote thread, but the effect is the same. IMO we should try to
> express our preferences and defer to the collective opinion where possible.
> True -1s should very rarely appear.
> >
> >
> > From: David Capwell <dc...@apple.com.INVALID>
> > Date: Wednesday, 27 October 2021 at 15:33
> > To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> > Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
> > Reading the CEP I don’t see any mention to the systems which access
> SSTables; such as streaming (small callout to zero-copy-streaming with
> ZeroCopyBigTableWriter) and repair.  If you are abstracting out
> BigTableReader then you are not dealing with the implementation assumptions
> that users of SSTables have (such as direct mutation to auxiliary files
> outside of -Data.db).
> >
> >> Audience
> >>       • Cassandra developers who wish to see SSTableReader and
> SSTableWriter more modular than they are today,
> >
> > This statement relates to the above comment, many parts of the code do
> not use Reader/Writer but instead use direct format knowledge to apply
> changes to the file format (normally outside of -Data.db); to me the
> interfaces has to be at the SSTable level, which then expose
> readers/writers, but also has to expose the other things we do outside of
> those paths.
> >
> >>       • move the metrics related to sstable format out from
> TableMetrics class and make them tied to certain sstable implementation
> >
> > I am curious about this comment, are you removing exposing this
> information?
> >
> >>       • have a single factory for creating both readers and writers for
> particular implementation of sstable and use it consistently - no direct
> creation of any reader / writer
> >
> > I am -1 here, for the reasons listed above; the problem (in my eye) is
> not reader/writer but higher level at the actual SSTable.  If we plug out
> read/write but still allow direct file access, then these abstractions fail
> to provide the goals of the CEP.
> >
> > I am +1 to the intent of the CEP.
> >
> > And last comment, which I have also done in the other modularity thread…
> backwards compatibility and maintenance. It is not clear right now what
> java interfaces may not break and how we can maintain and extend such
> interfaces in the future.  If the goal is to allow 3rd parties to plugin
> and offer new SSTable formats, are we as a project ok with having a minor
> release do a binary or source non-compatible change?  If not how do we
> detect this?  Until this problem is solved, I do not think we should add
> any such interfaces.
> >
> >> On Oct 22, 2021, at 7:23 AM, Jeremiah Jordan <je...@gmail.com>
> wrote:
> >>
> >> Hi Stefan,
> >> That idea is not related to this CEP which is about the file formats of
> the
> >> sstables, not file system access.  But you should take a look at the
> work
> >> recently committed in
> https://issues.apache.org/jira/browse/CASSANDRA-16926
> >> to switch to using java.nio.file.Path for file access.  This should
> allow
> >> the use of a file system provider to access files which could be the
> basis
> >> for work to load the files from S3.
> >>
> >> -Jeremiah
> >>
> >> On Fri, Oct 22, 2021 at 4:07 AM Stefan Miklosovic <
> >> stefan.miklosovic@instaclustr.com> wrote:
> >>
> >>> One point I would like to add to this; I was already looking into how
> >>> to extend this but what I saw in SSTableReader was that it is very
> >>> much "file system oriented". There was not any possibility to actually
> >>> hook something like that there. I think what importing does is that it
> >>> will use SSTableReader / Writer stuff so I think that the modification
> >>> of these classes to accommodate this idea would be necessary.
> >>>
> >>> On Fri, 22 Oct 2021 at 11:02, Stefan Miklosovic
> >>> <st...@instaclustr.com> wrote:
> >>>>
> >>>> Hi Jacek,
> >>>>
> >>>> Thanks for taking the lead on this.
> >>>>
> >>>> There was importing of SSTables introduced in 4.0 via
> >>>> StorageService#importNewSSTables. The "problem" with this is that
> >>>> SSTables need to be physically located at disk so Cassandra can read
> >>>> them. If a backup is taken and SSTables are uploaded to, for example,
> >>>> S3 bucket, then upon restore, all these SSTables need to be downloaded
> >>>> first and then imported. What about downloading them / importing them
> >>>> directly from S3? Or any custom source for that matter? Importing of
> >>>> SSTables is a very nice feature in 4.0, we do not need to copy / hard
> >>>> link / refresh, it is all handled internally.
> >>>>
> >>>> I am not sure if your work is related to this idea but I would
> >>>> appreciate it if this is pluggable as well for the sake of simplicity
> >>>> and effectiveness as we would not have to download all sstables before
> >>>> importing them.
> >>>>
> >>>> If it is not related, feel free to skip that completely and I guess I
> >>>> would have to try to push that forward myself.
> >>>>
> >>>> Regards
> >>>>
> >>>>
> >>>> On Fri, 22 Oct 2021 at 10:24, Jacek Lewandowski
> >>>> <le...@gmail.com> wrote:
> >>>>>
> >>>>> I'd like to start a discussion about SSTable format API proposal
> >>> (CEP-17)
> >>>>>
> >>>>> Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056
> >>>>> CEP:
> >>>
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
> >>>>>
> >>>>> Thanks,
> >>>>> Jacek
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> >>> For additional commands, e-mail: dev-help@cassandra.apache.org
> >>>
> >>>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)

Posted by David Capwell <dc...@apple.com.INVALID>.
Inline

> On Nov 1, 2021, at 9:23 AM, Branimir Lambov <bl...@apache.org> wrote:
> 
> As Jacek is not a committer, this proposal needs a shepherd. I would be
> happy to take this role.
> 
>> to me the interfaces has to be at the SSTable level, which then expose
> readers/writers, but also has to expose the other things we do outside of
> those paths
> 
> Could you give some detail on what these things are? Are they something
> different from what the standalone Cassandra tools (scrub/verify/upgrade)
> are currently doing? Obviously, any pluggability proposal will have to
> provide a solution to these, and it would be helpful to know what needs to
> be done beyond making sure the bundled tools work correctly (which includes
> iterating indexes; format-specific operations (e.g. index summary
> redistribution) are excluded as they are to be handled by the individual
> format).

Looking closer at compaction and repair I had forgotten that they were changed in CASSANDRA-15861 to go through the reader interface rather than directly mutate the files (concurrency bug).  I was thinking the logic which is now org.apache.cassandra.io.sstable.format.SSTableReader#mutateLevelAndReload and org.apache.cassandra.io.sstable.format.SSTableReader#mutateRepairedAndReload; so I believe compaction/repair may be ok with reader/writer; ignore those examples.

Checking usage of descriptor you find examples like

org.apache.cassandra.db.streaming.CassandraEntireSSTableStreamReader#read - which calls: writer.descriptor.getMetadataSerializer().mutate(writer.descriptor, description, transform);
org.apache.cassandra.tools.Util#metadataFromSSTable - which is used by sstablemetadata tool
org.apache.cassandra.io.sstable.KeyIterator#KeyIterator - directly loads primary index from descriptor: new In(new File(desc.filenameFor(Component.PRIMARY_INDEX)));

Non of the examples I see couldn’t be rewritten to use read/writer; so relying on reader/writer as the main interfaces would work.

> 
> There is another problem in the current code alluded to in the question, in
> the fact that "SSTableReader" (tied to the sstable format and ready for
> querying data (i.e. with open data files and bloom filters loaded in
> memory)) is the only concept that the code uses to work with sstables. As I
> understand it, this proposal does not aim to solve that problem, only to
> make sure that we can properly read and write sstables of a given format,
> including in streaming and standalone tools. In other words, to provide the
> machinery to convert sstable descriptors into sstable readers and writers.
> 
> I see this as an expansion of CASSANDRA-7443 and cleanup of any changes
> that came after it and broke the intended capability.
> 
> Regards,
> Branimir
> 
> On Thu, Oct 28, 2021 at 7:43 PM David Capwell <dc...@apple.com.invalid>
> wrote:
> 
>> Sorry about that; used -1/+1 to show preference, not binding action
>> 
>>> On Oct 28, 2021, at 5:50 AM, benedict@apache.org wrote:
>>> 
>>>> I am -1 here, for the reasons listed above; the problem (in my eye) is
>> not reader/writer but higher level at the actual SSTable.  If we plug out
>> read/write but still allow direct file access, then these abstractions fail
>> to provide the goals of the CEP.
>>> 
>>> Be careful dropping -1s, as your -1s here are binding. I realise this
>> isn’t a vote thread, but the effect is the same. IMO we should try to
>> express our preferences and defer to the collective opinion where possible.
>> True -1s should very rarely appear.
>>> 
>>> 
>>> From: David Capwell <dc...@apple.com.INVALID>
>>> Date: Wednesday, 27 October 2021 at 15:33
>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] CEP-17: SSTable format API (CASSANDRA-17056)
>>> Reading the CEP I don’t see any mention to the systems which access
>> SSTables; such as streaming (small callout to zero-copy-streaming with
>> ZeroCopyBigTableWriter) and repair.  If you are abstracting out
>> BigTableReader then you are not dealing with the implementation assumptions
>> that users of SSTables have (such as direct mutation to auxiliary files
>> outside of -Data.db).
>>> 
>>>> Audience
>>>>      • Cassandra developers who wish to see SSTableReader and
>> SSTableWriter more modular than they are today,
>>> 
>>> This statement relates to the above comment, many parts of the code do
>> not use Reader/Writer but instead use direct format knowledge to apply
>> changes to the file format (normally outside of -Data.db); to me the
>> interfaces has to be at the SSTable level, which then expose
>> readers/writers, but also has to expose the other things we do outside of
>> those paths.
>>> 
>>>>      • move the metrics related to sstable format out from
>> TableMetrics class and make them tied to certain sstable implementation
>>> 
>>> I am curious about this comment, are you removing exposing this
>> information?
>>> 
>>>>      • have a single factory for creating both readers and writers for
>> particular implementation of sstable and use it consistently - no direct
>> creation of any reader / writer
>>> 
>>> I am -1 here, for the reasons listed above; the problem (in my eye) is
>> not reader/writer but higher level at the actual SSTable.  If we plug out
>> read/write but still allow direct file access, then these abstractions fail
>> to provide the goals of the CEP.
>>> 
>>> I am +1 to the intent of the CEP.
>>> 
>>> And last comment, which I have also done in the other modularity thread…
>> backwards compatibility and maintenance. It is not clear right now what
>> java interfaces may not break and how we can maintain and extend such
>> interfaces in the future.  If the goal is to allow 3rd parties to plugin
>> and offer new SSTable formats, are we as a project ok with having a minor
>> release do a binary or source non-compatible change?  If not how do we
>> detect this?  Until this problem is solved, I do not think we should add
>> any such interfaces.
>>> 
>>>> On Oct 22, 2021, at 7:23 AM, Jeremiah Jordan <je...@gmail.com>
>> wrote:
>>>> 
>>>> Hi Stefan,
>>>> That idea is not related to this CEP which is about the file formats of
>> the
>>>> sstables, not file system access.  But you should take a look at the
>> work
>>>> recently committed in
>> https://issues.apache.org/jira/browse/CASSANDRA-16926
>>>> to switch to using java.nio.file.Path for file access.  This should
>> allow
>>>> the use of a file system provider to access files which could be the
>> basis
>>>> for work to load the files from S3.
>>>> 
>>>> -Jeremiah
>>>> 
>>>> On Fri, Oct 22, 2021 at 4:07 AM Stefan Miklosovic <
>>>> stefan.miklosovic@instaclustr.com> wrote:
>>>> 
>>>>> One point I would like to add to this; I was already looking into how
>>>>> to extend this but what I saw in SSTableReader was that it is very
>>>>> much "file system oriented". There was not any possibility to actually
>>>>> hook something like that there. I think what importing does is that it
>>>>> will use SSTableReader / Writer stuff so I think that the modification
>>>>> of these classes to accommodate this idea would be necessary.
>>>>> 
>>>>> On Fri, 22 Oct 2021 at 11:02, Stefan Miklosovic
>>>>> <st...@instaclustr.com> wrote:
>>>>>> 
>>>>>> Hi Jacek,
>>>>>> 
>>>>>> Thanks for taking the lead on this.
>>>>>> 
>>>>>> There was importing of SSTables introduced in 4.0 via
>>>>>> StorageService#importNewSSTables. The "problem" with this is that
>>>>>> SSTables need to be physically located at disk so Cassandra can read
>>>>>> them. If a backup is taken and SSTables are uploaded to, for example,
>>>>>> S3 bucket, then upon restore, all these SSTables need to be downloaded
>>>>>> first and then imported. What about downloading them / importing them
>>>>>> directly from S3? Or any custom source for that matter? Importing of
>>>>>> SSTables is a very nice feature in 4.0, we do not need to copy / hard
>>>>>> link / refresh, it is all handled internally.
>>>>>> 
>>>>>> I am not sure if your work is related to this idea but I would
>>>>>> appreciate it if this is pluggable as well for the sake of simplicity
>>>>>> and effectiveness as we would not have to download all sstables before
>>>>>> importing them.
>>>>>> 
>>>>>> If it is not related, feel free to skip that completely and I guess I
>>>>>> would have to try to push that forward myself.
>>>>>> 
>>>>>> Regards
>>>>>> 
>>>>>> 
>>>>>> On Fri, 22 Oct 2021 at 10:24, Jacek Lewandowski
>>>>>> <le...@gmail.com> wrote:
>>>>>>> 
>>>>>>> I'd like to start a discussion about SSTable format API proposal
>>>>> (CEP-17)
>>>>>>> 
>>>>>>> Jira: https://issues.apache.org/jira/browse/CASSANDRA-17056
>>>>>>> CEP:
>>>>> 
>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-17%3A+SSTable+format+API
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Jacek
>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>>>> 
>>>>> 
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>> 
>>>>> 
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>> 
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org