Posted to dev@cassandra.apache.org by David Capwell <dc...@gmail.com> on 2021/11/19 18:35:46 UTC

[DISCUSS] Nested YAML configs for new features

This has been brought up in a few tickets, so pushing to the dev list.

CASSANDRA-15234 - Standardise config and JVM parameters
CASSANDRA-16896 - hard/soft limits for queries
CASSANDRA-17147 - Guardrails prototype

In short, do we as a project wish to move "new features" into nested
YAML when the feature has "enough" to justify the nesting?  I would
really like to focus this discussion on new features rather than
retroactively grouping (leaving that to CASSANDRA-15234), as there is
already a place to talk about that.

To get things started, let's look at the track-warnings feature
(hard/soft limits for queries). Currently the configs look as follows
(assuming CASSANDRA-15234):

track_warnings:
    enabled: true
    coordinator_read_size:
        warn_threshold: 10kb
        abort_threshold: 1mb
    local_read_size:
        warn_threshold: 10kb
        abort_threshold: 1mb
    row_index_size:
        warn_threshold: 100mb
        abort_threshold: 1gb

Or should this be "flat"?

track_warnings_enabled: true
track_warnings_coordinator_read_size_warn_threshold: 10kb
track_warnings_coordinator_read_size_abort_threshold: 1mb
track_warnings_local_read_size_warn_threshold: 10kb
track_warnings_local_read_size_abort_threshold: 1mb
track_warnings_row_index_size_warn_threshold: 100mb
track_warnings_row_index_size_abort_threshold: 1gb

I prefer nested for a few reasons:
* it is easier to enforce consistency, as the configs can use shared types;
in the track warnings patch I had mismatches across configs (warn vs
warns, fail vs abort, etc.) before going nested, and now everything reuses
the same types
* even though it is longer, it is clearer how things are related
* the parsing layer can add support for mixed or purely flat configs,
depending on user preference (example:
track_warnings.row_index_size.abort_threshold, using the '.' notation
to represent nested structures)
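
To make the last bullet a bit more concrete, here is a minimal Python
sketch of a loader that accepts nested, flat, or mixed keys. It is not the
Cassandra parser (that lives in Java); it assumes the YAML has already been
parsed into plain dicts, and the names expand_dotted and merge are
hypothetical, chosen only for illustration.

```python
from typing import Any, Dict


def merge(dst: Dict[str, Any], src: Dict[str, Any], path: str = "") -> None:
    """Merge src into dst in place, refusing to define the same key twice."""
    for key, value in src.items():
        where = f"{path}.{key}" if path else key
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            merge(dst[key], value, where)
        elif key in dst:
            raise ValueError(f"conflicting definitions for {where!r}")
        else:
            dst[key] = value


def expand_dotted(config: Dict[str, Any]) -> Dict[str, Any]:
    """Turn keys such as 'a.b.c' into nested dicts; nested input is kept."""
    result: Dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict):
            value = expand_dotted(value)
        # Wrap the value in one dict per dotted segment, innermost first.
        for part in reversed(key.split(".")):
            value = {part: value}
        merge(result, value)
    return result


mixed = {
    "track_warnings": {
        "enabled": True,
        "row_index_size.warn_threshold": "100mb",
    },
    "track_warnings.row_index_size.abort_threshold": "1gb",
}
nested = expand_dotted(mixed)
```

With this shape, a purely flat file, a purely nested file, and a mix of the
two all end up as the same nested structure, and defining one option twice
(once flat, once nested) is reported as an error rather than silently
overwritten.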

Thoughts?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Nested YAML configs for new features

Posted by Bowen Song <bo...@bso.ng.INVALID>.

Structured configuration was adopted for some options, but it is not
used in the cluster I manage, with the only exception of seeds, which is
already causing me some headaches. If it keeps going down this road,
managing the cluster is going to be a lot harder without proper toolkits
(such as yq mentioned by Jacek). Until the virtual table is capable of
querying those options, can we at least have the option of using flat
config if the operator chooses to do so?

Re: [DISCUSS] Nested YAML configs for new features

Posted by Jacek Lewandowski <le...@gmail.com>.

We still have yq, mentioned a couple of posts earlier, which does even more
than grep, so I suppose it could satisfy both camps :)

- - -- --- ----- -------- -------------
Jacek Lewandowski


Re: [DISCUSS] Nested YAML configs for new features

Posted by Joseph Lynch <jo...@gmail.com>.

On Wed, Nov 24, 2021 at 9:00 AM Bowen Song <bo...@bso.ng.invalid> wrote:
> Structured / nested config is easier for human eyes to read but very
> hard for simple scripts to handle. Flat config is harder for human eyes
> but easy for simple scripts. I can see user may prefer one over another
> depending on their own use case. If the structured / nested config must
> be introduced, I would like to see both syntaxes supported to allow the
> user to make their own choice.

To be clear, structured configuration was already adopted by Cassandra
a long time ago and is already used successfully in the status quo
(for example server/client encryption options, all of the pluggable
class configurations). I believe the question was "when we are adding
a number of related options should we structure them?". I think the
answer is clearly yes because it makes the configuration code in the
database a lot cleaner and allows us to leverage strongly typed
configuration. Related configuration should continue to be grouped as
if you were using a prefix of a dot encoded property (so {"a": {"b":
4}} is equivalent to "a.b: 4").
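
As a rough illustration of that equivalence, here is a minimal Python
sketch of the dot encoding. It assumes the configuration is already parsed
into plain dicts; the function name flatten is hypothetical and this is not
how Cassandra itself implements it.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Encode a nested mapping as dot-separated key -> value pairs."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend, carrying the path so far as the prefix.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


encoded = flatten({"a": {"b": 4}})
```

One limitation of this sketch: an empty nested mapping produces no output
keys at all, so a real implementation would need to decide how to represent
it.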

There is the separate question of "how can an operator tell what
configuration a node is running with". For obvious reasons grepping
cassandra.yaml is not a good public interface; we can do better via
either virtual tables (JSON over CQL) or the sidecar (JSON over REST),
which preserve the structured configuration rather than trying to
flatten it.

-Joey

Re: [DISCUSS] Nested YAML configs for new features

Posted by Bowen Song <bo...@bso.ng.INVALID>.

It only works if the output is for humans to read. If you have a large
number of servers, very often you want to do "grep -q ... &&
other_command" (or || other_command), or chain the grep results from
parallel-ssh into another command (grep or sort). The -A/-B/-C switches
will not work in this case. If the nested configurations have multiple
keys with the same name (e.g.: a dictionary where the values are very
similar dictionaries), even chaining 3 grep commands in the form of
"grep -A ... | grep -B ... | grep -q ... " is unlikely to work.
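
For what a script actually needs here (an unambiguous yes/no answer for one
setting), addressing the value by its full path avoids the duplicate-key
problem. Below is a minimal Python sketch, assuming the config has already
been parsed into plain dicts; lookup and check are hypothetical helper
names, not part of any existing tool.

```python
from typing import Any, Dict


def lookup(config: Dict[str, Any], dotted_path: str) -> Any:
    """Return the value at a path such as 'a.b.c', or raise KeyError."""
    node: Any = config
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(dotted_path)
        node = node[part]
    return node


def check(config: Dict[str, Any], dotted_path: str, expected: Any) -> int:
    """Return 0 if the setting equals expected, else 1 (like 'grep -q')."""
    try:
        return 0 if lookup(config, dotted_path) == expected else 1
    except KeyError:
        return 1


config = {
    "track_warnings": {
        "enabled": True,
        "local_read_size": {"warn_threshold": "10kb", "abort_threshold": "1mb"},
        "row_index_size": {"warn_threshold": "100mb", "abort_threshold": "1gb"},
    }
}
status = check(config, "track_warnings.row_index_size.warn_threshold", "100mb")
```

Both warn_threshold keys exist in this example, but the full path picks
exactly one of them, and the integer result can be used as an exit status
in the same way as "grep -q ... && other_command".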

Structured / nested config is easier for human eyes to read but very
hard for simple scripts to handle. Flat config is harder for human eyes
but easy for simple scripts. I can see that users may prefer one over the
other depending on their own use case. If the structured / nested config
must be introduced, I would like to see both syntaxes supported to allow
users to make their own choice.


Re: [DISCUSS] Nested YAML configs for new features

Posted by Henrik Ingo <he...@datastax.com>.

Grepping is an important use case, and having worked with another database
that does nest its configs, I can offer some tips on how I survived:

With good old grep, it can help to use the before and after options:

grep -A 5 track_warnings conf/cassandra.yaml | grep -B 5 warn_threshold

Would find you this:

track_warnings:
    enabled: true
    coordinator_read_size:
        warn_threshold: 10kb

Guessing the right numbers for -A and -B takes some expert knowledge, but
in most cases you can just use a large number like 9999.

For more frequent use, you will want to just install `yq` (aka yaml query):
https://github.com/kislyuk/yq

henrik


On Fri, Nov 19, 2021 at 9:07 PM Stefan Miklosovic <
stefan.miklosovic@instaclustr.com> wrote:

> Hi David,
>
> while I do not oppose nested structure, it is really handy to grep
> cassandra.yaml for some config key and know the value instantly.
> This is not possible (easily and quickly) when it is nested, as it is on
> two lines. Or maybe my grepping is just not advanced enough to cover
> this case? If it is flat, I can just grep "track_warnings" and I have
> them all.
>
> Can you elaborate on your last bullet point? Parsing layer ... What do
> you mean specifically?
>
> Thanks


Re: [DISCUSS] Nested YAML configs for new features

Posted by "benedict@apache.org" <be...@apache.org>.

I don’t think it’s necessarily a requirement that we use the flattened version in vtables. At the very least we can make use of sets, lists, etc. But we can probably also use UDTs if this improves clarity.

From: Benjamin Lerer <bl...@apache.org>
Date: Monday, 29 November 2021 at 15:54
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Nested YAML configs for new features
I do not think that supporting both options is an issue. The settings
virtual table would have to use the flattened version.
If we support both formats, the question would be: what should be the one
used by default in the configuration file?

On Fri, 26 Nov 2021 at 15:40, benedict@apache.org <be...@apache.org> wrote:

> This is the approach I favour for config files also. We had a much less
> engaged discussion on this topic only a few months ago, so glad to see more
> people getting involved now.
>
> I would however personally prefer to see the configuration file slowly
> deprecated (if perhaps never retired), in favour of virtual tables, so that
> operators may easily set configurations for the entire cluster. Ideally it
> would be possible to specify configuration per cluster, per DC and per
> node, with the most specific configuration applying. I would like to see a
> similar hierarchy for Keyspace, Table and Per-Query options. Ideally only
> the barest minimum number of options would be necessary to supply in a
> config file, and only on first launch – seed nodes, for instance.
>
> So whatever design we employ here, we should IMO be aiming for it to be
> compatible with a CQL representation also.
>
>
> From: Bowen Song <bo...@bso.ng.INVALID>
> Date: Wednesday, 24 November 2021 at 18:15
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> Since you mentioned ElasticSearch, I'm actually pretty happy with their
> config file syntax. It allows the user to completely flatten out the
> entire config file. To give people who aren't familiar with ElasticSearch
> an idea, here is a config file we use:
>
>     cluster.name: foobar
>
>     node.remote_cluster_client: false
>     node.name: "foo.example.com"
>     node.master: true
>     node.data: true
>     node.ingest: true
>     node.ml: false
>
>     xpack.ml.enabled: false
>     xpack.security.enabled: false
>     xpack.security.audit.enabled: false
>     xpack.watcher.enabled: false
>
>     action.auto_create_index: "+.,-*"
>
>     network.host: _global_
>
>     discovery.zen.hosts_provider: file
>     discovery.zen.minimum_master_nodes: 2
>
>     http.publish_host: "foo.example.com"
>     http.publish_port: 443
>     http.bind_host: 127.0.0.1
>
>     transport.publish_host: "bar.example.com"
>     transport.bind_host: 0.0.0.0
>
>     indices.fielddata.cache.size: 1GB
>     indices.breaker.total.use_real_memory: false
>
>     path.logs: /var/log/elasticsearch
>     path.data: /var/lib/elasticsearch/data
>
> As you can see we can use the flat (grep-able) syntax for everything.
> This is also human readable because we can group options together by
> inserting empty lines between them.
>
> The equivalent of the above in a structured syntax will be:
>
>     cluster:
>          name: foobar
>
>     node:
>          remote_cluster_client: false
>          name: "foo.example.com"
>          master: true
>          data: true
>          ingest: true
>          ml: false
>
>     xpack:
>          ml:
>              enabled: false
>          security:
>              enabled: false
>              audit:
>                  enabled: false
>          watcher:
>              enabled: false
>
>     action:
>          auto_create_index: "+.,-*"
>
>     network:
>          host: _global_
>
>     discovery:
>          zen:
>              hosts_provider: file
>              minimum_master_nodes: 2
>
>     http:
>          publish_host: "foo.example.com"
>          publish_port: 443
>          bind_host: 127.0.0.1
>
>     transport:
>          publish_host: "bar.example.com"
>          bind_host: 0.0.0.0
>
>     indices:
>          fielddata:
>              cache:
>                  size: 1GB
>          breaker:
>              total:
>                  use_real_memory: false
>
>     path:
>          logs: /var/log/elasticsearch
>          data: /var/lib/elasticsearch/data
>
> This may be easier to read for some people, but it is a total nightmare
> for "grep" - so many keys have identical names, such as "enabled".
>
> Also, for the virtual tables, it would be a lot easier to represent
> individual values in a virtual table when the config is flat and keys
> are unique. The virtual tables would need to either support the encoding
> and decoding of the structured config into a flat structure, or use a
> JSON-encoded string value. The use of JSON would make querying individual
> values much harder.
>
> On 22/11/2021 16:16, Joseph Lynch wrote:
> > Isn't one of the primary reasons to have a YAML configuration instead
> > of a properties file to allow typed and structured (implies nested)
> > configuration? I think it makes a lot of sense to group related
> > configuration options (e.g. a feature) into a typed class when we're
> > talking about more than one or two related options.
> >
> > It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs to
> > period encoded key->value pairs when required (usually when providing
> > a property or override layer), Spring and Elasticsearch yamls both
> > come to mind. It seems pretty reasonable to support dot encoding and
> > decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
> >
> > Regarding quickly telling what configuration a node is running I think
> > we should lean on virtual tables for "what is the current
> > configuration" now that we have them, as others have said the written
> > cassandra.yaml is not necessarily the current configuration ... and
> > also grep -C or -A exist for this reason.
> >
> > -Joey
> >
> > On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer <bl...@apache.org> wrote:
> >> I do not have a strong opinion for one or the other but wanted to raise
> >> the issue I see with the "Settings" virtual table.
> >>
> >> Currently the "Settings" virtual table converts nested options into flat
> >> options using a "_" separator. For those options it allows a user to
> >> query the whole set of options through some hack.
> >> If we decide to move to more nesting (more than one level), it seems to
> >> me that we need to change the way this table behaves and how we can
> >> query its data.
> >>
> >> We would need to start using "." as a nesting separator to ensure that
> >> things are consistent between the configuration and the table, and add
> >> support for LIKE restrictions in filtering queries so that operators
> >> can select the precise set of settings they are looking for.
> >>
> >> Doing so is not really complicated in itself but might impact some
> >> users.
> >>
> >> On Fri, 19 Nov 2021 at 22:39, David Capwell <dc...@apple.com.invalid> wrote:
> >>
> >>>> it is really handy to grep
> >>>> cassandra.yaml on some config key and you know the value instantly.
> >>> You can still do that
> >>>
> >>> $ grep -A2 coordinator_read_size conf/cassandra.yaml
> >>> #     coordinator_read_size:
> >>> #         warn_threshold_kb: 0
> >>> #         abort_threshold_kb: 0
> >>>
> >>> I was also arguing we should support nested and flat, so if your infra
> >>> works better with flat then you could use
> >>>
> >>> track_warnings.coordinator_read_size.warn_threshold_kb: 0
> >>> track_warnings.coordinator_read_size.abort_threshold_kb: 0
> >>>
> >>>> On Nov 19, 2021, at 1:34 PM, David Capwell <dc...@apple.com> wrote:
> >>>>
> >>>>> With the flat structure it turns into properties file - would it be
> >>>>> possible to support both formats - nested yaml and flat properties?
> >>>>
> >>>> For the majority of our configs yes, but there is a subset where flat
> >>>> properties are annoying:
> >>>> hinted_handoff_disabled_datacenters - set type, so you could do
> >>>> hinted_handoff_disabled_datacenters="a,b,c,d" but we would need to
> >>>> deal with separators as the format doesn't support
> >>>> seed_provider.parameters - this is a map type... so would need to do
> >>>> something like seed_provider.parameters="{\"a\": \"a\"}" ... Maybe we
> >>>> special case maps as dynamic fields?  Then
> >>>> seed_provider.parameters.a=a?  We have ParameterizedClass all over
> >>>> the code
> >>>> So, as long as we define how to deal with java collections, we could
> >>>> in theory support properties files (not arguing for that in this
> >>>> thread) as well as system properties.
> >>>>
> >>>>> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <lewandowski.jacek@gmail.com> wrote:
> >>>>> With the flat structure it turns into properties file - would it be
> >>>>> possible to support both formats - nested yaml and flat properties?
> >>>>>
> >>>>>
> >>>>> - - -- --- ----- -------- -------------
> >>>>> Jacek Lewandowski
> >>>>>
> >>>>>
> >>>>> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <calebrackliffe@gmail.com> wrote:
> >>>>>
> >>>>>> If it's nested, "track_warnings" would still work if you're grepping
> >>>>>> around vim or less.
> >>>>>>
> >>>>>> I'd have to concede the point about grep output, although there are
> >>>>>> tools like https://github.com/kislyuk/yq that could probably be bent
> >>>>>> to do what you want.

Re: [DISCUSS] Nested YAML configs for new features

Posted by Ekaterina Dimitrova <e....@gmail.com>.

Please find my comments inline below

* Good with nested configs - the question is how they will be introduced
and maintained. I wouldn't advocate for maintaining more than one yaml file;
instead, as you mentioned some time ago (if I remember correctly), we could
have one format as the default and just document the support for the other
one(s). Which one is the default is a different topic.
* Where/How we group is an open question, maybe we move this to a JIRA as
follow up work to CASSANDRA-15234? - This is not part of CASSANDRA-15234 as
per all the discussions; that ticket is already in review (thank you for
your first quick round btw, appreciate it!)

Spoke with Ekaterina about this, and not solved in 15234; let's move this
to a follow up JIRA for 15234? - For the broader audience: what I currently
solve around naming in CASSANDRA-15234 is removing the unit suffix from the
config parameter names and moving them to the noun_verb format. After all
the discussions, and realizing the great interest and variety of opinions, I
tried to split more tickets out of CASSANDRA-15234 and to keep primarily the
new custom types and the new framework with backward compatibility as the
main body of work, which is also easier on the reviewers. Last year I came
up with the idea of reorganizing the config file a bit, which led to
discussions. Given that variety of opinions and the more incremental
approach, I suggested when submitting for review that we open a new ticket
for the new organization of our config file. Probably we can add the
abort/fail question and any other similar concerns there, post
CASSANDRA-15234?
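
To illustrate the unit-suffix point: once the unit moves out of the
parameter name and into the value (abort_threshold: 1mb rather than
abort_threshold_kb: 1024, as in the examples quoted in this thread),
something has to parse values such as 10kb. Below is a minimal Python
sketch of such a parser. It is not the CASSANDRA-15234 implementation; the
name parse_data_size, the accepted unit spellings (taken from this thread's
examples), and the use of 1024-based multiples are all assumptions made for
illustration.

```python
import re

# Hypothetical unit table: spellings come from the thread's examples
# (kb, mb, gb); treating them as 1024-based multiples is an assumption.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}
_PATTERN = re.compile(r"(\d+)\s*([a-z]+)")


def parse_data_size(text: str) -> int:
    """Parse a value such as '10kb' into a number of bytes."""
    match = _PATTERN.fullmatch(text.strip().lower())
    if match is None:
        # A bare number is rejected: the unit must be part of the value.
        raise ValueError(f"expected <number><unit>, got {text!r}")
    quantity, unit = match.groups()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit {unit!r} in {text!r}")
    return int(quantity) * _UNITS[unit]


if __name__ == "__main__":
    for raw in ("10kb", "1mb", "1gb"):
        print(raw, "->", parse_data_size(raw), "bytes")
```

Because every size-typed parameter would go through one parser, a typo in a
unit fails the same way everywhere instead of depending on the parameter
name.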


On Fri, 3 Dec 2021 at 13:34, David Capwell <dc...@apple.com.invalid>
wrote:

> Thanks everyone for the feedback!  If I am reading this properly I am
> seeing the following
>
> * Good with nested configs
> * Good with YAML layer supporting flat structure (possible foo.bar.baz for
> the path foo: {bar: {baz: 42}}), how this relates with Settings table
> should be resolved, but there is an open ticket for this (enhance our YAML
> CASSANDRA-17166, and support updates to Settings vtable CASSANDRA-15254)
> * Where/How we group is an open question, maybe we move this to a JIRA as
> follow up work to CASSANDRA-15234?
>
> > We’re also mixing terminology already, with limits/thresholds and
> > fail/abort.
>
> Spoke with Ekaterina about this, and not solved in 15234; let's move this
> to a follow up JIRA for 15234?
>
> > On Nov 30, 2021, at 6:08 AM, Ekaterina Dimitrova <e....@gmail.com>
> wrote:
> >
> > Thank you for confirming as I misread your email at first :-)
> > I had a chat with David last week and I don’t think his plan is reworking
> > of 15234 but incremental improvements on top of it.
> > Regarding config, after spending time cleaning around and looking more
> into
> > detail my only appeal is:
> > - Centralized management and not 5 places to change things when you add
> new
> > config so we are less error-prone
> > - Documenting things for people who add new config or for our users (I
> > promised and I will do it for 15234 but it will be good to continue doing
> > it with any further changes down the road)
> > - be careful with breaking changes
> >
> > Thank you
> > Ekaterina
> >
> > On Tue, 30 Nov 2021 at 8:59, benedict@apache.org <be...@apache.org>
> > wrote:
> >
> >> I mean that it has been waiting for months, is ready to go, and I don’t
> >> want to hold you up any longer.
> >>
> >> From: Ekaterina Dimitrova <e....@gmail.com>
> >> Date: Tuesday, 30 November 2021 at 13:44
> >> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> >> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >> “
> >> IMO 15234 has sailed – it’s been held up for a long time, and was
> brought
> >> to this list for discussion with no engagement. Ekaterina is long
> overdue
> >> being able to commit her work. “
> >>
> >>
> >> Sailed? I submitted the patch a week ago for review. Not sure how to
> >> understand this statement. Can you elaborate, please?
> >>
> >> On Tue, 30 Nov 2021 at 8:09, benedict@apache.org <be...@apache.org>
> >> wrote:
> >>
> >>> The problem with scoping this to “features” is that we end up with at
> >> best
> >>> local coherence. The config file as a whole will end up just as
> >> incoherent
> >>> through its design evolution as it has historically.
> >>>
> >>> If you take a look at my proposed layout for the overall config, there
> is
> >>> a “limits” section that specifies thresholds for reporting warnings and
> >>> errors for various scenarios. In this case, we probably don’t also want
> >>> per-feature limits? We’re also mixing terminology already, with
> >>> limits/thresholds and fail/abort.
> >>>
> >>> It’s a lot of work to come up with a coherent and intuitive config
> >> layout.
> >>> We probably want to at least create some documentation in-tree
> >> stipulating
> >>> terminology with respect to plurals, verbs/nouns, and specific terms
> >>> (period, abort, limit, datacenter vs dc, etc), but ideally we would
> have
> >> a
> >>> common end goal for the config file.
> >>>
> >>>> leave non-features to CASSANDRA-15234
> >>>
> >>> IMO 15234 has sailed – it’s been held up for a long time, and was
> brought
> >>> to this list for discussion with no engagement. Ekaterina is long
> overdue
> >>> being able to commit her work.
> >>>
> >>>
> >>> From: David Capwell <dc...@apple.com.INVALID>
> >>> Date: Monday, 29 November 2021 at 23:44
> >>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> >>> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >>>> but I would hate to repeat the mistakes of our past by evolving the
> >>> config in a new direction without any coherent overarching design.
> >>>
> >>> At the start I asked to keep the thread local to new features, but to
> >> more
> >>> flesh out an “overarching design” maybe we should increase the
> “desired”
> >>> scope to be “feature” (and leave non-features to CASSANDRA-15234 -
> >>> Standardise config and JVM parameters)?  Aka, do we think the following
> >> is
> >>> more ideal (configs scoped to a feature)
> >>>
> >>> hinted_handoff:
> >>>  enabled: true
> >>>  disabled_datacenters:
> >>>    - DC1
> >>>    - DC2
> >>>  max_window: 3h
> >>>  flush_period: 10s
> >>>  max_file_size: 128mb
> >>>  compression:
> >>>    class_name: LZ4Compressor
> >>>    parameters:
> >>>      a: b
> >>>
> >>> track_warnings:
> >>>  enabled: true
> >>>  local_read_size:
> >>>    warn_threshold: 1mb
> >>>    abort_threshold: 10mb
> >>>  coordinator_read_size:
> >>>    warn_threshold: 5mb
> >>>    abort_threshold: 20mb
> >>>
> >>>
> >>> OR
> >>>
> >>> # I had to rename hint configs as there was 0 consistent naming
> >>> hinted_handoff_enabled: true
> >>> hinted_handoff_disabled_datacenters:
> >>>  - 'DC1'
> >>>  - 'DC2'
> >>> hinted_handoff_max_window: 3h
> >>> hinted_handoff_max_file_size: 128mb
> >>> hinted_handoff_flush_period: 10s
> >>> hinted_handoff_compression:
> >>>  class_name: LZ4Compressor
> >>>  parameters:
> >>>    a: b
> >>>
> >>> track_warnings_enabled: true
> >>> track_warnings_local_read_size_warn_threshold: 1mb
> >>> track_warnings_local_read_size_abort_threshold: 10mb
> >>> track_warnings_coordinator_read_size_warn_threshold: 5mb
> >>> track_warnings_coordinator_read_size_abort_threshold: 20mb
> >>>
> >>>
> >>> The main issue I have with flat structure is that we have no way to
> >>> enforce standard naming; if you look at the hint example there were at
> >>> least 3 naming conventions (CASSANDRA-15234 is to clean this up, but
> can
> >> we
> >>> actually maintain that?).  And one of the core reasons track_warnings
> >> went
> >>> nested was that warn/abort some times became warn/fail and threshold
> some
> >>> times was thresholds…. By embracing nested structure we can actually
> >>> enforce consistency, with flat we have no way to maintain consistency.
> >>>
> >>> Additionally by embracing the nested structure we can accept a flat one
> >> as
> >>> well (PR in CASSANDRA-17166 shows this working) if users desire it; so
> we
> >>> get the consistency of nested, and the “grep” benefits of flat.
> >>>
> >>>
> >>>> On Nov 29, 2021, at 2:17 PM, benedict@apache.org wrote:
> >>>>
> >>>> If we’re thinking of moving towards nested configuration, then before
> >>> employing the approach further we would ideally consider what a fully
> >>> nested config looks like for the project. Ekaterina has done a lot to
> >> clean
> >>> up inconsistent naming, but I would hate to repeat the mistakes of our
> >> past
> >>> by evolving the config in a new direction without any coherent
> >> overarching
> >>> design.
> >>>>
> >>>> In case anyone missed it in the earlier discussion, this was my
> attempt
> >>> to prototype a nested config:
> >>>
> >>
> https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
> >>>>
> >>>> I don’t have any specific attachment to it, but settling on some
> >>> approximate scheme would be helpful IMO.
> >>>>
> >>>> From: David Capwell <dc...@apple.com.INVALID>
> >>>> Date: Monday, 29 November 2021 at 20:38
> >>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> >>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >>>>> What should our default example cassandra.yaml file use (flat or
> >>> nested)?  Currently default shows nested
> >>>>
> >>>> Was told this statement was confusing, so trying to clarify.  At the
> >>> moment we do not allow a nested config to be expressed in any way
> outside
> >>> of nesting it (excluding YAML’s ability to inline objects), so if we
> did
> >>> allow flat config representation of nested configs, then this would be
> a
> >>> brand new feature; we currently show the nested structure in
> >> cassandra.yaml
> >>>>
> >>>>> On Nov 29, 2021, at 11:58 AM, David Capwell
> >> <dc...@apple.com.INVALID>
> >>> wrote:
> >>>>>
> >>>>> Thanks everyone for the comments, I hope below is a good summary of
> >> all
> >>> the talking points?
> >>>>>
> >>>>> We already use nested configs (networking, seed provider, commit
> >>> log/hint compression, back pressure, etc.)
> >>>>> Flat configs are easier for grep, but can be solved with grep -A/-B
> >>> and/or yq
> >>>>> It would be possible to support flat versions of our configs in
> >>> cassandra.yaml (in addition to the nested versions)
> >>>>> "Settings" vtable currently uses the "_" separator (example of
> >>> encryption/audit log).  Switching to "." Would be a change in behavior
> >>> which may impact some users
> >>>>> "." Separator for nested configs are common in other systems (yq,
> >>> elastic search, etc.)
> >>>>> "Structured / nested config is easier for human eyes to read"...
> "Flat
> >>> config is harder for human eyes but easy for simple scripts"
> >>>>> For learning what configs are enabled, cassandra.yaml isn't the best
> >>> interface as it may not reflect the actual configs; we can better
> expose
> >>> this in CQL and/or Sidecar
> >>>>> What should our default example cassandra.yaml file use (flat or
> >>> nested)?  Currently default shows nested
> >>>>> When projecting the Config into CQL, we may want to consider UDTs to
> >>> represent the complex types
> >>>>> Current limitations in CQL make nested structures hard to work with, it
> >>>>> may be worthwhile to expand CQL support for nested structures.
> >>>>>
> >>>>> I also took a quick stab at enhancing our cassandra.yaml logic to: 1)
> >>>>> be reusable outside of yaml parsing, 2) support setters (we currently do,
> >>>>> but setters must be snake case… I fixed that)…, 3) support both nested and
> >>>>> structured, 4) support ignoring fields in a consistent way (Settings vtable
> >>>>> will include things SnakeYAML won’t and vice versa).
> >>>>>
> >>>>> https://github.com/apache/cassandra/pull/1335
> >>>>> This PR is NOT a final ready to merge thing, but instead a POC to show
> >>>>> how we can solve a lot of the core problems in a consistent and reusable
> >>>>> manner.
> >>>>>
> >>>>> The following cassandra.yaml was used to show both worlds would work
> >>>>> fine in the config (and complement each other)
> >>>>>
> >>>>> track_warnings:
> >>>>>   enabled: true
> >>>>>   # nested relative to the local level (TrackWarnings)
> >>>>>   coordinator_read_size.warn_threshold_kb: 1024
> >>>>>   local_read_size.abort_threshold_kb: 1024
> >>>>>   row_index_size:
> >>>>>     warn_threshold_kb: 1024
> >>>>>     abort_threshold_kb: 1024
> >>>>> # nested relative to the top level
> >>>>> track_warnings.coordinator_read_size.abort_threshold_kb: 42
> >>>>>
> >>>>> For the “Settings” vtable, a new Loader interface was added to get
> all
> >>> the properties, and Properties.flatten would turn every property into a
> >>> “flatten” version (isScalar (isPrimitive or not hasSubProperties) or
> >>> isCollection).  This doesn’t solve 100% of the issues that vtable has
> >>> (types such as Duration would need additional translation as they are
> >>> Scalar but need a translation from String -> Duration), and doesn’t
> solve
> >>> the fact the table currently uses “_”.
> >>>>>
> >>>>>> On Nov 29, 2021, at 10:11 AM, benedict@apache.org wrote:
> >>>>>>
> >>>>>> I meant to imply we should improve our UDT usability to support this
> >>> kind of querying, essentially – but that if we support a simple
> >>> text->property setup we might want to offer LIKE support so we can
> search
> >>> them (via simple filtering, not any index) – which is actually pretty
> >> easy
> >>> to provide.
> >>>>>>
> >>>>>> I think we should aim to provide users all the facilities they need
> >> to
> >>> interact with config via vtables. If the user requires external
> tooling,
> >> it
> >>> suggests a weakness in CQL that we should address, and maybe help the
> >> user
> >>> in other scenario too…
> >>>>>>
> >>>>>> From: Joseph Lynch <jo...@gmail.com>
> >>>>>> Date: Monday, 29 November 2021 at 17:32
> >>>>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> >>>>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >>>>>> On Mon, Nov 29, 2021 at 11:51 AM benedict@apache.org
> >>>>>> <be...@apache.org> wrote:
> >>>>>>>
> >>>>>>> Maybe we can make our query language more expressive 😊
> >>>>>>>
> >>>>>>> We might anyway want to introduce e.g. a LIKE filtering option to
> >>> find/discover flattened config parameters?
> >>>>>>
> >>>>>> This sounds more complicated than just having the settings virtual
> >>>>>> table return text (dot encoded) -> text (json) and probably not even
> >>>>>> that much more useful. A full table scan on the settings table could
> >>>>>> return all top level keys (strings before the first dot) and if we
> >>>>>> just return a valid json string then users can bring their own
> >>>>>> querying capabilities via jq [1], or one line of code in almost any
> >>>>>> programming language (especially python, perl, etc ...).
> >>>>>>
> >>>>>> Alternatively if we want to modify the grammar it seems supporting
> >>>>>> structured data querying on text fields would maybe be more
> >> preferable
> >>>>>> to LIKE since you could get what you want without a grammar change
> >> and
> >>>>>> if we could generalize to any text column it would be amazingly
> >> useful
> >>>>>> elsewhere to users. For example, we could emulate jq's query syntax
> >> in
> >>>>>> the select which is, imo, best-in-class for quickly querying into
> >>>>>> nearest structures. Assuming a key (text) -> value (json) schema:
> >>>>>>
> >>>>>> 'a' -> "{'b': [{'c': {'d': 4}}]}",
> >>>>>>
> >>>>>> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
> >>>>>>
> >>>>>> To have exactly jq syntax (but harder to parse) it would be:
> >>>>>>
> >>>>>> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
> >>>>>>
> >>>>>> Since we're not indexing the structured data in any way, filtering
> >>>>>> before selection probably doesn't give us much performance
> >> improvement
> >>>>>> as we'd still have to parse the whole text field in most cases.
> >>>>>>
> >>>>>> -Joey
> >>>>>>
> >>>>>> [1] https://stedolan.github.io/jq/

Re: [DISCUSS] Nested YAML configs for new features

Posted by David Capwell <dc...@apple.com.INVALID>.

Thanks everyone for the feedback!  If I am reading this properly I am seeing the following

* Good with nested configs
* Good with YAML layer supporting flat structure (possible foo.bar.baz for the path foo: {bar: {baz: 42}}), how this relates with Settings table should be resolved, but there is an open ticket for this (enhance our YAML CASSANDRA-17166, and support updates to Settings vtable CASSANDRA-15254)
* Where/How we group is an open question, maybe we move this to a JIRA as follow up work to CASSANDRA-15234?
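
The foo.bar.baz equivalence in the second point can be sketched as follows,
in Python rather than the Java of the actual patch. flatten turns
foo: {bar: {baz: 42}} into foo.bar.baz, and unflatten goes the other way,
also accepting a mix of nested and dotted keys. The function names and the
rule that a later key overrides an earlier one are assumptions for
illustration; this is not the CASSANDRA-17166 code.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn {"foo": {"bar": {"baz": 42}}} into {"foo.bar.baz": 42}."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def _merge(target: Dict[str, Any], source: Dict[str, Any]) -> None:
    """Recursively copy source into target; source wins on conflicts."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge(target[key], value)
        else:
            target[key] = value


def unflatten(config: Dict[str, Any]) -> Dict[str, Any]:
    """Expand dotted keys at any level into a purely nested dict."""
    nested: Dict[str, Any] = {}
    for path, value in config.items():
        if isinstance(value, dict):
            value = unflatten(value)  # dotted keys may appear below the top level
        node = nested
        *parents, leaf = path.split(".")
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                child = node[part] = {}  # assumed rule: later key overrides
            node = child
        if isinstance(value, dict) and isinstance(node.get(leaf), dict):
            _merge(node[leaf], value)
        else:
            node[leaf] = value
    return nested


if __name__ == "__main__":
    mixed = {
        "track_warnings": {
            "enabled": True,
            "coordinator_read_size.warn_threshold": "10kb",
            "row_index_size": {"abort_threshold": "1gb"},
        },
        "track_warnings.coordinator_read_size.abort_threshold": "1mb",
    }
    for key, value in sorted(flatten(unflatten(mixed)).items()):
        print(f"{key}: {value}")
```

Normalizing to the nested form first and flattening only for display would
keep a single source of truth while still producing grep-friendly keys.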

> We’re also mixing terminology already, with limits/thresholds and fail/abort.

Spoke with Ekaterina about this, and not solved in 15234; let's move this to a follow up JIRA for 15234?

> On Nov 30, 2021, at 6:08 AM, Ekaterina Dimitrova <e....@gmail.com> wrote:
> 
> Thank you for confirming as I misread your email at first :-)
> I had a chat with David last week and I don’t think his plan is reworking
> of 15234 but incremental improvements on top of it.
> Regarding config, after spending time cleaning around and looking more into
> detail my only appeal is:
> - Centralized management and not 5 places to change things when you add new
> config so we are less error-prone
> - Documenting things for people who add new config or for our users (I
> promised and I will do it for 15234 but it will be good to continue doing
> it with any further changes down the road)
> - be careful with breaking changes
> 
> Thank you
> Ekaterina
> 
> On Tue, 30 Nov 2021 at 8:59, benedict@apache.org <be...@apache.org>
> wrote:
> 
>> I mean that it has been waiting for months, is ready to go, and I don’t
>> want to hold you up any longer.
>> 
>> From: Ekaterina Dimitrova <e....@gmail.com>
>> Date: Tuesday, 30 November 2021 at 13:44
>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>> “
>> IMO 15234 has sailed – it’s been held up for a long time, and was brought
>> to this list for discussion with no engagement. Ekaterina is long overdue
>> being able to commit her work. “
>> 
>> 
>> Sailed? I submitted the patch a week ago for review. Not sure how to
>> understand this statement. Can you elaborate, please?
>> 
>> On Tue, 30 Nov 2021 at 8:09, benedict@apache.org <be...@apache.org>
>> wrote:
>> 
>>> The problem with scoping this to “features” is that we end up with at
>> best
>>> local coherence. The config file as a whole will end up just as
>> incoherent
>>> through its design evolution as it has historically.
>>> 
>>> If you take a look at my proposed layout for the overall config, there is
>>> a “limits” section that specifies thresholds for reporting warnings and
>>> errors for various scenarios. In this case, we probably don’t also want
>>> per-feature limits? We’re also mixing terminology already, with
>>> limits/thresholds and fail/abort.
>>> 
>>> It’s a lot of work to come up with a coherent and intuitive config
>> layout.
>>> We probably want to at least create some documentation in-tree
>> stipulating
>>> terminology with respect to plurals, verbs/nouns, and specific terms
>>> (period, abort, limit, datacenter vs dc, etc), but ideally we would have
>> a
>>> common end goal for the config file.
>>> 
>>>> leave non-features to CASSANDRA-15234
>>> 
>>> IMO 15234 has sailed – it’s been held up for a long time, and was brought
>>> to this list for discussion with no engagement. Ekaterina is long overdue
>>> being able to commit her work.
>>> 
>>> 
>>> From: David Capwell <dc...@apple.com.INVALID>
>>> Date: Monday, 29 November 2021 at 23:44
>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>>>> but I would hate to repeat the mistakes of our past by evolving the
>>> config in a new direction without any coherent overarching design.
>>> 
>>> At the start I asked to keep the thread local to new features, but to
>> more
>>> flesh out an “overarching design” maybe we should increase the “desired”
>>> scope to be “feature” (and leave non-features to CASSANDRA-15234 -
>>> Standardise config and JVM parameters)?  Aka, do we think the following
>> is
>>> more ideal (configs scoped to a feature)
>>> 
>>> hinted_handoff:
>>>  enabled: true
>>>  disabled_datacenters:
>>>    - DC1
>>>    - DC2
>>>  max_window: 3h
>>>  flush_period: 10s
>>>  max_file_size: 128mb
>>>  compression:
>>>    class_name: LZ4Compressor
>>>    parameters:
>>>      a: b
>>> 
>>> track_warnings:
>>>  enabled: true
>>>  local_read_size:
>>>    warn_threshold: 1mb
>>>    abort_threshold: 10mb
>>>  coordinator_read_size:
>>>    warn_threshold: 5mb
>>>    abort_threshold: 20mb
>>> 
>>> 
>>> OR
>>> 
>>> # I had to rename hint configs as there was 0 consistent naming
>>> hinted_handoff_enabled: true
>>> hinted_handoff_disabled_datacenters:
>>>  - 'DC1'
>>>  - 'DC2'
>>> hinted_handoff_max_window: 3h
>>> hinted_handoff_max_file_size: 128mb
>>> hinted_handoff_flush_period: 10s
>>> hinted_handoff_compression:
>>>  class_name: LZ4Compressor
>>>  parameters:
>>>    a: b
>>> 
>>> track_warnings_enabled: true
>>> track_warnings_local_read_size_warn_threshold: 1mb
>>> track_warnings_local_read_size_abort_threshold: 10mb
>>> track_warnings_coordinator_read_size_warn_threshold: 5mb
>>> track_warnings_coordinator_read_size_abort_threshold: 20mb
>>> 
>>> 
>>> The main issue I have with flat structure is that we have no way to
>>> enforce standard naming; if you look at the hint example there were at
>>> least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can
>> we
>>> actually maintain that?).  And one of the core reasons track_warnings
>> went
>>> nested was that warn/abort some times became warn/fail and threshold some
>>> times was thresholds…. By embracing nested structure we can actually
>>> enforce consistency, with flat we have no way to maintain consistency.
>>> 
>>> Additionally by embracing the nested structure we can accept a flat one
>> as
>>> well (PR in CASSANDRA-17166 shows this working) if users desire it; so we
>>> get the consistency of nested, and the “grep” benefits of flat.
>>> 
>>> 
>>>> On Nov 29, 2021, at 2:17 PM, benedict@apache.org wrote:
>>>> 
>>>> If we’re thinking of moving towards nested configuration, then before
>>> employing the approach further we would ideally consider what a fully
>>> nested config looks like for the project. Ekaterina has done a lot to
>> clean
>>> up inconsistent naming, but I would hate to repeat the mistakes of our
>> past
>>> by evolving the config in a new direction without any coherent
>> overarching
>>> design.
>>>> 
>>>> In case anyone missed it in the earlier discussion, this was my attempt
>>> to prototype a nested config:
>>> 
>> https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
>>>> 
>>>> I don’t have any specific attachment to it, but settling on some
>>> approximate scheme would be helpful IMO.
>>>> 
>>>> From: David Capwell <dc...@apple.com.INVALID>
>>>> Date: Monday, 29 November 2021 at 20:38
>>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>>>>> What should our default example cassandra.yaml file use (flat or
>>> nested)?  Currently default shows nested
>>>> 
>>>> Was told this statement was confusing, so trying to clarify.  At the
>>> moment we do not allow a nested config to be expressed in any way outside
>>> of nesting it (excluding YAML’s ability to inline objects), so if we did
>>> allow flat config representation of nested configs, then this would be a
>>> brand new feature; we currently show the nested structure in
>> cassandra.yaml
>>>> 
>>>>> On Nov 29, 2021, at 11:58 AM, David Capwell
>> <dc...@apple.com.INVALID>
>>> wrote:
>>>>> 
>>>>> Thanks everyone for the comments, I hope below is a good summary of
>> all
>>> the talking points?
>>>>> 
>>>>> We already use nested configs (networking, seed provider, commit
>>> log/hint compression, back pressure, etc.)
>>>>> Flat configs are easier for grep, but can be solved with grep -A/-B
>>> and/or yq
>>>>> It would be possible to support flat versions of our configs in
>>> cassandra.yaml (in addition to the nested versions)
>>>>> "Settings" vtable currently uses the "_" separator (example of
>>> encryption/audit log).  Switching to "." Would be a change in behavior
>>> which may impact some users
>>>>> "." Separator for nested configs are common in other systems (yq,
>>> elastic search, etc.)
>>>>> "Structured / nested config is easier for human eyes to read"... "Flat
>>> config is harder for human eyes but easy for simple scripts"
>>>>> For learning what configs are enabled, cassandra.yaml isn't the best
>>> interface as it may not reflect the actual configs; we can better expose
>>> this in CQL and/or Sidecar
>>>>> What should our default example cassandra.yaml file use (flat or
>>> nested)?  Currently default shows nested
>>>>> When projecting the Config into CQL, we may want to consider UDTs to
>>> represent the complex types
>>>>> Current limitations in CQL make nested structures hard to work with, it
>>>>> may be worthwhile to expand CQL support for nested structures.
>>>>> 
>>>>> I also took a quick stab at enhancing our cassandra.yaml logic to: 1)
>>>>> be reusable outside of yaml parsing, 2) support setters (we currently do,
>>>>> but setters must be snake case… I fixed that)…, 3) support both nested and
>>>>> structured, 4) support ignoring fields in a consistent way (Settings vtable
>>>>> will include things SnakeYAML won’t and vice versa).
>>>>>
>>>>> https://github.com/apache/cassandra/pull/1335
>>>>> This PR is NOT a final ready to merge thing, but instead a POC to show
>>>>> how we can solve a lot of the core problems in a consistent and reusable
>>>>> manner.
>>>>>
>>>>> The following cassandra.yaml was used to show both worlds would work
>>>>> fine in the config (and complement each other)
>>>>> 
>>>>> track_warnings:
>>>>>   enabled: true
>>>>>   # nested relative to the local level (TrackWarnings)
>>>>>   coordinator_read_size.warn_threshold_kb: 1024
>>>>>   local_read_size.abort_threshold_kb: 1024
>>>>>   row_index_size:
>>>>>     warn_threshold_kb: 1024
>>>>>     abort_threshold_kb: 1024
>>>>> # nested relative to the top level
>>>>> track_warnings.coordinator_read_size.abort_threshold_kb: 42
>>>>> 
>>>>> For the “Settings” vtable, a new Loader interface was added to get all
>>> the properties, and Properties.flatten would turn every property into a
>>> “flatten” version (isScalar (isPrimitive or not hasSubProperties) or
>>> isCollection).  This doesn’t solve 100% of the issues that vtable has
>>> (types such as Duration would need additional translation as they are
>>> Scalar but need a translation from String -> Duration), and doesn’t solve
>>> the fact the table currently uses “_”.
>>>>> 
>>>>>> On Nov 29, 2021, at 10:11 AM, benedict@apache.org wrote:
>>>>>> 
>>>>>> I meant to imply we should improve our UDT usability to support this
>>> kind of querying, essentially – but that if we support a simple
>>> text->property setup we might want to offer LIKE support so we can search
>>> them (via simple filtering, not any index) – which is actually pretty
>> easy
>>> to provide.
>>>>>> 
>>>>>> I think we should aim to provide users all the facilities they need
>> to
>>> interact with config via vtables. If the user requires external tooling,
>> it
>>> suggests a weakness in CQL that we should address, and maybe help the
>> user
>>> in other scenario too…
>>>>>> 
>>>>>> From: Joseph Lynch <jo...@gmail.com>
>>>>>> Date: Monday, 29 November 2021 at 17:32
>>>>>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>>>>>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>>>>>> On Mon, Nov 29, 2021 at 11:51 AM benedict@apache.org
>>>>>> <be...@apache.org> wrote:
>>>>>>> 
>>>>>>> Maybe we can make our query language more expressive 😊
>>>>>>> 
>>>>>>> We might anyway want to introduce e.g. a LIKE filtering option to
>>> find/discover flattened config parameters?
>>>>>> 
>>>>>> This sounds more complicated than just having the settings virtual
>>>>>> table return text (dot encoded) -> text (json) and probably not even
>>>>>> that much more useful. A full table scan on the settings table could
>>>>>> return all top level keys (strings before the first dot) and if we
>>>>>> just return a valid json string then users can bring their own
>>>>>> querying capabilities via jq [1], or one line of code in almost any
>>>>>> programming language (especially python, perl, etc ...).
>>>>>> 
>>>>>> Alternatively if we want to modify the grammar it seems supporting
>>>>>> structured data querying on text fields would maybe be more
>> preferable
>>>>>> to LIKE since you could get what you want without a grammar change
>> and
>>>>>> if we could generalize to any text column it would be amazingly
>> useful
>>>>>> elsewhere to users. For example, we could emulate jq's query syntax
>> in
>>>>>> the select which is, imo, best-in-class for quickly querying into
>>>>>> nearest structures. Assuming a key (text) -> value (json) schema:
>>>>>> 
>>>>>> 'a' -> "{'b': [{'c': {'d': 4}}]}",
>>>>>> 
>>>>>> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
>>>>>> 
>>>>>> To have exactly jq syntax (but harder to parse) it would be:
>>>>>> 
>>>>>> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
>>>>>> 
>>>>>> Since we're not indexing the structured data in any way, filtering
>>>>>> before selection probably doesn't give us much performance
>> improvement
>>>>>> as we'd still have to parse the whole text field in most cases.
>>>>>> 
>>>>>> -Joey
>>>>>> 
>>>>>> [1] https://stedolan.github.io/jq/
>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>>>> 
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>>> 
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org
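
A minimal sketch of the dot-path lookup described in the quoted message above
(json(value).b.0.c.d), written in Python rather than CQL. The helper name
lookup is hypothetical and is not a Cassandra or jq API; the sample value uses
double quotes because the single-quoted form shown in the thread is not valid
JSON.

```python
import json


def lookup(document: str, path: str):
    """Walk a dot-separated path through parsed JSON.

    Numeric segments index into lists; every other segment is a dict key.
    """
    node = json.loads(document)
    for segment in path.split("."):
        if isinstance(node, list):
            node = node[int(segment)]
        else:
            node = node[segment]
    return node


# Same shape as the 'a' -> {...} example from the thread.
value = '{"b": [{"c": {"d": 4}}]}'
print(lookup(value, "b.0.c.d"))  # prints 4
```

The whole text value is parsed before the path is applied, which matches the
point made in the thread that filtering before selection would not save much
work without an index.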

Re: [DISCUSS] Nested YAML configs for new features

Posted by Ekaterina Dimitrova <e....@gmail.com>.

Thank you for confirming, as I misread your email at first :-)
I had a chat with David last week and I don’t think his plan is to rework
15234, but to make incremental improvements on top of it.
Regarding config, after spending time cleaning up and looking into the
details, my only requests are:
- Centralized management, rather than 5 places to change when you add a new
config, so we are less error-prone
- Documenting things for people who add new config or for our users (I
promised and I will do it for 15234, but it would be good to keep doing
it with any further changes down the road)
- Being careful with breaking changes

Thank you
Ekaterina

On Tue, 30 Nov 2021 at 8:59, benedict@apache.org <be...@apache.org>
wrote:

> I mean that it has been waiting for months, is ready to go, and I don’t
> want to hold you up any longer.
>
> From: Ekaterina Dimitrova <e....@gmail.com>
> Date: Tuesday, 30 November 2021 at 13:44
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> “IMO 15234 has sailed – it’s been held up for a long time, and was brought
> to this list for discussion with no engagement. Ekaterina is long overdue
> being able to commit her work.”
>
>
>  Sailed? I submitted the patch a week ago for review. Not sure how to
> understand this statement. Can you elaborate, please?
>
> On Tue, 30 Nov 2021 at 8:09, benedict@apache.org <be...@apache.org>
> wrote:
>
> > The problem with scoping this to “features” is that we end up with at best
> > local coherence. The config file as a whole will end up just as incoherent
> > through its design evolution as it has historically.
> >
> > If you take a look at my proposed layout for the overall config, there is
> > a “limits” section that specifies thresholds for reporting warnings and
> > errors for various scenarios. In this case, we probably don’t also want
> > per-feature limits? We’re also mixing terminology already, with
> > limits/thresholds and fail/abort.
> >
> > It’s a lot of work to come up with a coherent and intuitive config layout.
> > We probably want to at least create some documentation in-tree stipulating
> > terminology with respect to plurals, verbs/nouns, and specific terms
> > (period, abort, limit, datacenter vs dc, etc), but ideally we would have a
> > common end goal for the config file.
> >
> > > leave non-features to CASSANDRA-15234
> >
> > IMO 15234 has sailed – it’s been held up for a long time, and was brought
> > to this list for discussion with no engagement. Ekaterina is long overdue
> > being able to commit her work.
> >
> >
> > From: David Capwell <dc...@apple.com.INVALID>
> > Date: Monday, 29 November 2021 at 23:44
> > To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> > Subject: Re: [DISCUSS] Nested YAML configs for new features
> > >  but I would hate to repeat the mistakes of our past by evolving the
> > > config in a new direction without any coherent overarching design.
> >
> > At the start I asked to keep the thread local to new features, but to more
> > flesh out an “overarching design” maybe we should increase the “desired”
> > scope to be “feature” (and leave non-features to CASSANDRA-15234 -
> > Standardise config and JVM parameters)?  Aka, do we think the following is
> > more ideal (configs scoped to a feature)
> >
> > hinted_handoff:
> >   enabled: true
> >   disabled_datacenters:
> >     - DC1
> >     - DC2
> >   max_window: 3h
> >   flush_period: 10s
> >   max_file_size: 128mb
> >   compression:
> >     class_name: LZ4Compressor
> >     parameters:
> >       a: b
> >
> > track_warnings:
> >   enabled: true
> >   local_read_size:
> >     warn_threshold: 1mb
> >     abort_threshold: 10mb
> >   coordinator_read_size:
> >     warn_threshold: 5mb
> >     abort_threshold: 20mb
> >
> >
> > OR
> >
> > # I had to rename hint configs as there was 0 consistent naming
> > hinted_handoff_enabled: true
> > hinted_handoff_disabled_datacenters:
> >   - 'DC1'
> >   - 'DC2'
> > hinted_handoff_max_window: 3h
> > hinted_handoff_max_file_size: 128mb
> > hinted_handoff_flush_period: 10s
> > hinted_handoff_compression:
> >   class_name: LZ4Compressor
> >   parameters:
> >     a: b
> >
> > track_warnings_enabled: true
> > track_warnings_local_read_size_warn_threshold: 1mb
> > track_warnings_local_read_size_abort_threshold: 10mb
> > track_warnings_coordinator_read_size_warn_threshold: 5mb
> > track_warnings_coordinator_read_size_abort_threshold: 20mb
> >
> >
> > The main issue I have with flat structure is that we have no way to
> > enforce standard naming; if you look at the hint example there were at
> > least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can we
> > actually maintain that?).  And one of the core reasons track_warnings went
> > nested was that warn/abort sometimes became warn/fail and threshold
> > sometimes was thresholds…. By embracing nested structure we can actually
> > enforce consistency; with flat we have no way to maintain consistency.
> >
> > Additionally by embracing the nested structure we can accept a flat one as
> > well (PR in CASSANDRA-17166 shows this working) if users desire it; so we
> > get the consistency of nested, and the “grep” benefits of flat.
> >
> >
> > > On Nov 29, 2021, at 2:17 PM, benedict@apache.org wrote:
> > >
> > > If we’re thinking of moving towards nested configuration, then before
> > > employing the approach further we would ideally consider what a fully
> > > nested config looks like for the project. Ekaterina has done a lot to clean
> > > up inconsistent naming, but I would hate to repeat the mistakes of our past
> > > by evolving the config in a new direction without any coherent overarching
> > > design.
> > >
> > > In case anyone missed it in the earlier discussion, this was my attempt
> > > to prototype a nested config:
> > > https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
> > >
> > > I don’t have any specific attachment to it, but settling on some
> > > approximate scheme would be helpful IMO.
> > >
> > > From: David Capwell <dc...@apple.com.INVALID>
> > > Date: Monday, 29 November 2021 at 20:38
> > > To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> > > Subject: Re: [DISCUSS] Nested YAML configs for new features
> > >> What should our default example cassandra.yaml file use (flat or
> > >> nested)?  Currently default shows nested
> > >
> > > Was told this statement was confusing, so trying to clarify.  At the
> > > moment we do not allow a nested config to be expressed in any way outside
> > > of nesting it (excluding YAML’s ability to inline objects), so if we did
> > > allow flat config representation of nested configs, then this would be a
> > > brand new feature; we currently show the nested structure in cassandra.yaml
> > >
> > >> On Nov 29, 2021, at 11:58 AM, David Capwell <dc...@apple.com.INVALID> wrote:
> > >>
> > >> Thanks everyone for the comments, I hope below is a good summary of all
> > >> the talking points?
> > >>
> > >> * We already use nested configs (networking, seed provider, commit
> > >>   log/hint compression, back pressure, etc.)
> > >> * Flat configs are easier for grep, but this can be solved with grep -A/-B
> > >>   and/or yq
> > >> * It would be possible to support flat versions of our configs in
> > >>   cassandra.yaml (in addition to the nested versions)
> > >> * "Settings" vtable currently uses the "_" separator (example of
> > >>   encryption/audit log).  Switching to "." would be a change in behavior
> > >>   which may impact some users
> > >> * "." separator for nested configs is common in other systems (yq,
> > >>   elastic search, etc.)
> > >> * "Structured / nested config is easier for human eyes to read"... "Flat
> > >>   config is harder for human eyes but easy for simple scripts"
> > >> * For learning what configs are enabled, cassandra.yaml isn't the best
> > >>   interface as it may not reflect the actual configs; we can better expose
> > >>   this in CQL and/or Sidecar
> > >> * What should our default example cassandra.yaml file use (flat or
> > >>   nested)?  Currently default shows nested
> > >> * When projecting the Config into CQL, we may want to consider UDTs to
> > >>   represent the complex types
> > >> * Current limitations in CQL make nested structures hard to work with; it
> > >>   may be worthwhile to expand CQL support for nested structures.
> > >>
> > >> I also took a quick stab at enhancing our cassandra.yaml logic to: 1)
> > >> be reusable outside of yaml parsing, 2) support setters (we currently do,
> > >> but setters must be snake case… I fixed that)…, 3) support both nested and
> > >> structured, 4) support ignoring fields in a consistent way (Settings vtable
> > >> will include things SnakeYAML won’t and vice versa).
> > >>
> > >> <https://github.com/apache/cassandra/pull/1335>.  This PR is NOT a final
> > >> ready to merge thing, but instead a POC to show how we can solve a lot of
> > >> the core problems in a consistent and reusable manner.
> > >>
> > >> The following cassandra.yaml was used to show both worlds would work
> > >> fine in the config (and complement each other)
> > >>
> > >> track_warnings:
> > >>   enabled: true
> > >>   # nested relative to the local level (TrackWarnings)
> > >>   coordinator_read_size.warn_threshold_kb: 1024
> > >>   local_read_size.abort_threshold_kb: 1024
> > >>   row_index_size:
> > >>     warn_threshold_kb: 1024
> > >>     abort_threshold_kb: 1024
> > >> # nested relative to the top level
> > >> track_warnings.coordinator_read_size.abort_threshold_kb: 42
> > >>
> > >> For the “Settings” vtable, a new Loader interface was added to get all
> > >> the properties, and Properties.flatten would turn every property into a
> > >> “flatten” version (isScalar (isPrimitive or not hasSubProperties) or
> > >> isCollection).  This doesn’t solve 100% of the issues that vtable has
> > >> (types such as Duration would need additional translation as they are
> > >> Scalar but need a translation from String -> Duration), and doesn’t solve
> > >> the fact the table currently uses “_”.
> > >>
> > >>> On Nov 29, 2021, at 10:11 AM, benedict@apache.org wrote:
> > >>>
> > >>> I meant to imply we should improve our UDT usability to support this
> > >>> kind of querying, essentially – but that if we support a simple
> > >>> text->property setup we might want to offer LIKE support so we can search
> > >>> them (via simple filtering, not any index) – which is actually pretty easy
> > >>> to provide.
> > >>>
> > >>> I think we should aim to provide users all the facilities they need to
> > >>> interact with config via vtables. If the user requires external tooling, it
> > >>> suggests a weakness in CQL that we should address, and maybe help the user
> > >>> in other scenarios too…
> > >>>
> > >>> From: Joseph Lynch <jo...@gmail.com>
> > >>> Date: Monday, 29 November 2021 at 17:32
> > >>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> > >>> Subject: Re: [DISCUSS] Nested YAML configs for new features
> > >>> On Mon, Nov 29, 2021 at 11:51 AM benedict@apache.org
> > >>> <be...@apache.org> wrote:
> > >>>>
> > >>>> Maybe we can make our query language more expressive 😊
> > >>>>
> > >>>> We might anyway want to introduce e.g. a LIKE filtering option to
> > >>>> find/discover flattened config parameters?
> > >>>
> > >>> This sounds more complicated than just having the settings virtual
> > >>> table return text (dot encoded) -> text (json) and probably not even
> > >>> that much more useful. A full table scan on the settings table could
> > >>> return all top level keys (strings before the first dot) and if we
> > >>> just return a valid json string then users can bring their own
> > >>> querying capabilities via jq [1], or one line of code in almost any
> > >>> programming language (especially python, perl, etc ...).
> > >>>
> > >>> Alternatively if we want to modify the grammar it seems supporting
> > >>> structured data querying on text fields would maybe be more preferable
> > >>> to LIKE since you could get what you want without a grammar change and
> > >>> if we could generalize to any text column it would be amazingly useful
> > >>> elsewhere to users. For example, we could emulate jq's query syntax in
> > >>> the select which is, imo, best-in-class for quickly querying into
> > >>> nearest structures. Assuming a key (text) -> value (json) schema:
> > >>>
> > >>> 'a' -> "{'b': [{'c': {'d': 4}}]}",
> > >>>
> > >>> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
> > >>>
> > >>> To have exactly jq syntax (but harder to parse) it would be:
> > >>>
> > >>> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
> > >>>
> > >>> Since we're not indexing the structured data in any way, filtering
> > >>> before selection probably doesn't give us much performance improvement
> > >>> as we'd still have to parse the whole text field in most cases.
> > >>>
> > >>> -Joey
> > >>>
> > >>> [1] https://stedolan.github.io/jq/
> > >>>
>
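
A rough sketch of how the mixed nested/flat cassandra.yaml quoted above could
be normalised, assuming the YAML has already been parsed into plain dicts. The
helpers expand, merge and flatten are hypothetical names for illustration; they
are not the code in the PR and not the Properties.flatten mentioned in the
thread.

```python
def merge(target: dict, source: dict) -> dict:
    """Recursively merge source into target; later scalar values win."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = value
    return target


def expand(config: dict) -> dict:
    """Turn keys containing '.' into nested dicts, merging with real nesting."""
    result: dict = {}
    for key, value in config.items():
        if isinstance(value, dict):
            value = expand(value)
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        if isinstance(value, dict) and isinstance(node.get(leaf), dict):
            merge(node[leaf], value)
        else:
            node[leaf] = value
    return result


def flatten(config: dict, prefix: str = "", sep: str = ".") -> dict:
    """Collapse nested dicts into a single level of separator-joined keys."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# The mixed example from the thread, as a YAML parser would hand it over.
mixed = {
    "track_warnings": {
        "enabled": True,
        "coordinator_read_size.warn_threshold_kb": 1024,
        "local_read_size.abort_threshold_kb": 1024,
        "row_index_size": {"warn_threshold_kb": 1024, "abort_threshold_kb": 1024},
    },
    "track_warnings.coordinator_read_size.abort_threshold_kb": 42,
}

nested = expand(mixed)
for name, value in sorted(flatten(nested).items()):
    print(name, "=", value)
```

Calling flatten(nested, sep="_") gives the "_"-joined names instead. With "_"
a key such as track_warnings_enabled can no longer be split back into its
levels unambiguously, since "_" also appears inside individual names.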

Re: [DISCUSS] Nested YAML configs for new features

Posted by "benedict@apache.org" <be...@apache.org>.

I mean that it has been waiting for months, is ready to go, and I don’t want to hold you up any longer.

From: Ekaterina Dimitrova <e....@gmail.com>
Date: Tuesday, 30 November 2021 at 13:44
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Nested YAML configs for new features
“IMO 15234 has sailed – it’s been held up for a long time, and was brought
to this list for discussion with no engagement. Ekaterina is long overdue
being able to commit her work.”


 Sailed? I submitted the patch a week ago for review. Not sure how to
understand this statement. Can you elaborate, please?

On Tue, 30 Nov 2021 at 8:09, benedict@apache.org <be...@apache.org>
wrote:

> The problem with scoping this to “features” is that we end up with at best
> local coherence. The config file as a whole will end up just as incoherent
> through its design evolution as it has historically.
>
> If you take a look at my proposed layout for the overall config, there is
> a “limits” section that specifies thresholds for reporting warnings and
> errors for various scenarios. In this case, we probably don’t also want
> per-feature limits? We’re also mixing terminology already, with
> limits/thresholds and fail/abort.
>
> It’s a lot of work to come up with a coherent and intuitive config layout.
> We probably want to at least create some documentation in-tree stipulating
> terminology with respect to plurals, verbs/nouns, and specific terms
> (period, abort, limit, datacenter vs dc, etc), but ideally we would have a
> common end goal for the config file.
>
> > leave non-features to CASSANDRA-15234
>
> IMO 15234 has sailed – it’s been held up for a long time, and was brought
> to this list for discussion with no engagement. Ekaterina is long overdue
> being able to commit her work.
>
>
> From: David Capwell <dc...@apple.com.INVALID>
> Date: Monday, 29 November 2021 at 23:44
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >  but I would hate to repeat the mistakes of our past by evolving the
> config in a new direction without any coherent overarching design.
>
> At the start I asked to keep the thread local to new features, but to more
> flesh out an “overarching design” maybe we should increase the “desired”
> scope to be “feature” (and leave non-features to CASSANDRA-15234 -
> Standardise config and JVM parameters)?  Aka, do we think the following is
> more ideal (configs scoped to a feature)
>
> hinted_handoff:
>   enabled: true
>   disabled_datacenters:
>     - DC1
>     - DC2
>   max_window: 3h
>   flush_period: 10s
>   max_file_size: 128mb
>   compression:
>     class_name: LZ4Compressor
>     parameters:
>       a: b
>
> track_warnings:
>   enabled: true
>   local_read_size:
>     warn_threshold: 1mb
>     abort_threshold: 10mb
>   coordinator_read_size:
>     warn_threshold: 5mb
>     abort_threshold: 20mb
>
>
> OR
>
> # I had to rename hint configs as there was 0 consistent naming
> hinted_handoff_enabled: true
> hinted_handoff_disabled_datacenters:
>   - 'DC1'
>   - 'DC2'
> hinted_handoff_max_window: 3h
> hinted_handoff_max_file_size: 128mb
> hinted_handoff_flush_period: 10s
> hinted_handoff_compression:
>   class_name: LZ4Compressor
>   parameters:
>     a: b
>
> track_warnings_enabled: true
> track_warnings_local_read_size_warn_threshold: 1mb
> track_warnings_local_read_size_abort_threshold: 10mb
> track_warnings_coordinator_read_size_warn_threshold: 5mb
> track_warnings_coordinator_read_size_abort_threshold: 20mb
>
>
> The main issue I have with flat structure is that we have no way to
> enforce standard naming; if you look at the hint example there were at
> least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can we
> actually maintain that?).  And one of the core reasons track_warnings went
> nested was that warn/abort sometimes became warn/fail and threshold
> sometimes was thresholds…. By embracing nested structure we can actually
> enforce consistency; with flat we have no way to maintain consistency.
>
> Additionally by embracing the nested structure we can accept a flat one as
> well (PR in CASSANDRA-17166 shows this working) if users desire it; so we
> get the consistency of nested, and the “grep” benefits of flat.
>
>
> > On Nov 29, 2021, at 2:17 PM, benedict@apache.org wrote:
> >
> > If we’re thinking of moving towards nested configuration, then before
> employing the approach further we would ideally consider what a fully
> nested config looks like for the project. Ekaterina has done a lot to clean
> up inconsistent naming, but I would hate to repeat the mistakes of our past
> by evolving the config in a new direction without any coherent overarching
> design.
> >
> > In case anyone missed it in the earlier discussion, this was my attempt
> to prototype a nested config:
> https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml
> >
> > I don’t have any specific attachment to it, but settling on some
> approximate scheme would be helpful IMO.
> >
> > From: David Capwell <dc...@apple.com.INVALID>
> > Date: Monday, 29 November 2021 at 20:38
> > To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> > Subject: Re: [DISCUSS] Nested YAML configs for new features
> >> What should our default example cassandra.yaml file use (flat or
> nested)?  Currently default shows nested
> >
> > Was told this statement was confusing, so trying to clarify.  At the
> moment we do not allow a nested config to be expressed in any way outside
> of nesting it (excluding YAML’s ability to inline objects), so if we did
> allow flat config representation of nested configs, then this would be a
> brand new feature; we currently show the nested structure in cassandra.yaml
> >
> >> On Nov 29, 2021, at 11:58 AM, David Capwell <dc...@apple.com.INVALID>
> wrote:
> >>
> >> Thanks everyone for the comments, I hope below is a good summary of all
> the talking points?
> >>
> >> * We already use nested configs (networking, seed provider, commit
> >>   log/hint compression, back pressure, etc.)
> >> * Flat configs are easier for grep, but this can be solved with grep -A/-B
> >>   and/or yq
> >> * It would be possible to support flat versions of our configs in
> >>   cassandra.yaml (in addition to the nested versions)
> >> * "Settings" vtable currently uses the "_" separator (example of
> >>   encryption/audit log).  Switching to "." would be a change in behavior
> >>   which may impact some users
> >> * "." separator for nested configs is common in other systems (yq,
> >>   elastic search, etc.)
> >> * "Structured / nested config is easier for human eyes to read"... "Flat
> >>   config is harder for human eyes but easy for simple scripts"
> >> * For learning what configs are enabled, cassandra.yaml isn't the best
> >>   interface as it may not reflect the actual configs; we can better expose
> >>   this in CQL and/or Sidecar
> >> * What should our default example cassandra.yaml file use (flat or
> >>   nested)?  Currently default shows nested
> >> * When projecting the Config into CQL, we may want to consider UDTs to
> >>   represent the complex types
> >> * Current limitations in CQL make nested structures hard to work with; it
> >>   may be worthwhile to expand CQL support for nested structures.
> >>
> >> I also took a quick stab at enhancing our cassandra.yaml logic to: 1)
> >> be reusable outside of yaml parsing, 2) support setters (we currently do,
> >> but setters must be snake case… I fixed that)…, 3) support both nested and
> >> structured, 4) support ignoring fields in a consistent way (Settings vtable
> >> will include things SnakeYAML won’t and vice versa).
> >>
> >> <https://github.com/apache/cassandra/pull/1335>.  This PR is NOT a final
> >> ready to merge thing, but instead a POC to show how we can solve a lot of
> >> the core problems in a consistent and reusable manner.
> >>
> >> The following cassandra.yaml was used to show both worlds would work
> >> fine in the config (and complement each other)
> >>
> >> track_warnings:
> >>   enabled: true
> >>   # nested relative to the local level (TrackWarnings)
> >>   coordinator_read_size.warn_threshold_kb: 1024
> >>   local_read_size.abort_threshold_kb: 1024
> >>   row_index_size:
> >>     warn_threshold_kb: 1024
> >>     abort_threshold_kb: 1024
> >> # nested relative to the top level
> >> track_warnings.coordinator_read_size.abort_threshold_kb: 42
> >>
> >> For the “Settings” vtable, a new Loader interface was added to get all
> the properties, and Properties.flatten would turn every property into a
> “flatten” version (isScalar (isPrimitive or not hasSubProperties) or
> isCollection).  This doesn’t solve 100% of the issues that vtable has
> (types such as Duration would need additional translation as they are
> Scalar but need a translation from String -> Duration), and doesn’t solve
> the fact the table currently uses “_”.
> >>
> >>> On Nov 29, 2021, at 10:11 AM, benedict@apache.org wrote:
> >>>
> >>> I meant to imply we should improve our UDT usability to support this
> kind of querying, essentially – but that if we support a simple
> text->property setup we might want to offer LIKE support so we can search
> them (via simple filtering, not any index) – which is actually pretty easy
> to provide.
> >>>
> >>> I think we should aim to provide users all the facilities they need to
> interact with config via vtables. If the user requires external tooling, it
> suggests a weakness in CQL that we should address, and maybe help the user
> in other scenarios too…
> >>>
> >>> From: Joseph Lynch <jo...@gmail.com>
> >>> Date: Monday, 29 November 2021 at 17:32
> >>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> >>> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >>> On Mon, Nov 29, 2021 at 11:51 AM benedict@apache.org
> >>> <be...@apache.org> wrote:
> >>>>
> >>>> Maybe we can make our query language more expressive 😊
> >>>>
> >>>> We might anyway want to introduce e.g. a LIKE filtering option to
> find/discover flattened config parameters?
> >>>
> >>> This sounds more complicated than just having the settings virtual
> >>> table return text (dot encoded) -> text (json) and probably not even
> >>> that much more useful. A full table scan on the settings table could
> >>> return all top level keys (strings before the first dot) and if we
> >>> just return a valid json string then users can bring their own
> >>> querying capabilities via jq [1], or one line of code in almost any
> >>> programming language (especially python, perl, etc ...).
> >>>
> >>> Alternatively if we want to modify the grammar it seems supporting
> >>> structured data querying on text fields would maybe be more preferable
> >>> to LIKE since you could get what you want without a grammar change and
> >>> if we could generalize to any text column it would be amazingly useful
> >>> elsewhere to users. For example, we could emulate jq's query syntax in
> >>> the select which is, imo, best-in-class for quickly querying into
> >>> nearest structures. Assuming a key (text) -> value (json) schema:
> >>>
> >>> 'a' -> "{'b': [{'c': {'d': 4}}]}",
> >>>
> >>> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
> >>>
> >>> To have exactly jq syntax (but harder to parse) it would be:
> >>>
> >>> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
> >>>
> >>> Since we're not indexing the structured data in any way, filtering
> >>> before selection probably doesn't give us much performance improvement
> >>> as we'd still have to parse the whole text field in most cases.
> >>>
> >>> -Joey
> >>>
> >>> [1] https://stedolan.github.io/jq/
> >>>
>
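
To illustrate the argument quoted above that nesting lets configs reuse shared
types (so warn/fail and threshold/thresholds cannot drift), here is a small
Python sketch. The class and function names are hypothetical and only mirror
the track_warnings example from the thread; Cassandra's actual config classes
are Java and are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """One shared type: every limit gets exactly these two field names."""
    warn_threshold: str
    abort_threshold: str


@dataclass(frozen=True)
class TrackWarnings:
    enabled: bool
    local_read_size: Thresholds
    coordinator_read_size: Thresholds


def load_track_warnings(raw: dict) -> TrackWarnings:
    """Build the typed config from an already-parsed nested mapping."""
    return TrackWarnings(
        enabled=raw["enabled"],
        local_read_size=Thresholds(**raw["local_read_size"]),
        coordinator_read_size=Thresholds(**raw["coordinator_read_size"]),
    )


config = load_track_warnings({
    "enabled": True,
    "local_read_size": {"warn_threshold": "1mb", "abort_threshold": "10mb"},
    "coordinator_read_size": {"warn_threshold": "5mb", "abort_threshold": "20mb"},
})
print(config.coordinator_read_size.abort_threshold)  # prints 20mb

# A drifted name such as fail_threshold is rejected by the shared type.
try:
    Thresholds(warn_threshold="1mb", fail_threshold="10mb")
except TypeError as error:
    print("rejected:", error)
```

With flat keys each name is an independent string, so nothing comparable stops
one option from being spelled ..._fail_threshold while its sibling uses
..._abort_threshold.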

Re: [DISCUSS] Nested YAML configs for new features

Posted by Ekaterina Dimitrova <e....@gmail.com>.

“IMO 15234 has sailed – it’s been held up for a long time, and was brought
to this list for discussion with no engagement. Ekaterina is long overdue
being able to commit her work.”


 Sailed? I submitted the patch a week ago for review. Not sure how to
understand this statement. Can you elaborate, please?

On Tue, 30 Nov 2021 at 8:09, benedict@apache.org <be...@apache.org>
wrote:

> The problem with scoping this to “features” is that we end up with at best
> local coherence. The config file as a whole will end up just as incoherent
> through its design evolution as it has historically.
>
> If you take a look at my proposed layout for the overall config, there is
> a “limits” section that specifies thresholds for reporting warnings and
> errors for various scenarios. In this case, we probably don’t also want
> per-feature limits? We’re also mixing terminology already, with
> limits/thresholds and fail/abort.
>
> It’s a lot of work to come up with a coherent and intuitive config layout.
> We probably want to at least create some documentation in-tree stipulating
> terminology with respect to plurals, verbs/nouns, and specific terms
> (period, abort, limit, datacenter vs dc, etc), but ideally we would have a
> common end goal for the config file.
>
> > leave non-features to CASSANDRA-15234
>
> IMO 15234 has sailed – it’s been held up for a long time, and was brought
> to this list for discussion with no engagement. Ekaterina is long overdue
> being able to commit her work.
>
>
> From: David Capwell <dc...@apple.com.INVALID>
> Date: Monday, 29 November 2021 at 23:44
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >  but I would hate to repeat the mistakes of our past by evolving the
> config in a new direction without any coherent overarching design.
>
> At the start I asked to keep the thread local to new features, but to more
> flesh out an “overarching design” maybe we should increase the “desired”
> scope to be “feature” (and leave non-features to CASSANDRA-15234 -
> Standardise config and JVM parameters)?  Aka, do we think the following is
> more ideal (configs scoped to a feature)
>
> hinted_handoff:
>   enabled: true
>   disabled_datacenters:
>     - DC1
>     - DC2
>   max_window: 3h
>   flush_period: 10s
>   max_file_size: 128mb
>   compression:
>     class_name: LZ4Compressor
>     parameters:
>       a: b
>
> track_warnings:
>   enabled: true
>   local_read_size:
>     warn_threshold: 1mb
>     abort_threshold: 10mb
>   coordinator_read_size:
>     warn_threshold: 5mb
>     abort_threshold: 20mb
>
>
> OR
>
> # I had to rename hint configs as there was 0 consistent naming
> hinted_handoff_enabled: true
> hinted_handoff_disabled_datacenters:
>   - 'DC1'
>   - 'DC2'
> hinted_handoff_max_window: 3h
> hinted_handoff_max_file_size: 128mb
> hinted_handoff_flush_period: 10s
> hinted_handoff_compression:
>   class_name: LZ4Compressor
>   parameters:
>     a: b
>
> track_warnings_enabled: true
> track_warnings_local_read_size_warn_threshold: 1mb
> track_warnings_local_read_size_abort_threshold: 10mb
> track_warnings_coordinator_read_size_warn_threshold: 5mb
> track_warnings_coordinator_read_size_abort_threshold: 20mb
>
>
> The main issue I have with flat structure is that we have no way to
> enforce standard naming; if you look at the hint example there were at
> least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can we
> actually maintain that?).  And one of the core reasons track_warnings went
> nested was that warn/abort sometimes became warn/fail and threshold
> sometimes was thresholds…. By embracing nested structure we can actually
> enforce consistency; with flat we have no way to maintain consistency.
>
> Additionally by embracing the nested structure we can accept a flat one as
> well (PR in CASSANDRA-17166 shows this working) if users desire it; so we
> get the consistency of nested, and the “grep” benefits of flat.
>
>

Re: [DISCUSS] Nested YAML configs for new features

Posted by "benedict@apache.org" <be...@apache.org>.

The problem with scoping this to “features” is that we end up with at best local coherence. The config file as a whole will end up just as incoherent through its design evolution as it has historically.

If you take a look at my proposed layout for the overall config, there is a “limits” section that specifies thresholds for reporting warnings and errors for various scenarios. In this case, we probably don’t also want per-feature limits? We’re also mixing terminology already, with limits/thresholds and fail/abort.

It’s a lot of work to come up with a coherent and intuitive config layout. We probably want to at least create some documentation in-tree stipulating terminology with respect to plurals, verbs/nouns, and specific terms (period, abort, limit, datacenter vs dc, etc), but ideally we would have a common end goal for the config file.

> leave non-features to CASSANDRA-15234

IMO 15234 has sailed – it’s been held up for a long time, and was brought to this list for discussion with no engagement. Ekaterina is long overdue being able to commit her work.


From: David Capwell <dc...@apple.com.INVALID>
Date: Monday, 29 November 2021 at 23:44
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Nested YAML configs for new features
>  but I would hate to repeat the mistakes of our past by evolving the config in a new direction without any coherent overarching design.

At the start I asked to keep the thread local to new features, but to flesh out an “overarching design” more fully maybe we should increase the “desired” scope to be “feature” (and leave non-features to CASSANDRA-15234 - Standardise config and JVM parameters)?  Aka, do we think the following is more ideal (configs scoped to a feature)

hinted_handoff:
  enabled: true
  disabled_datacenters:
    - DC1
    - DC2
  max_window: 3h
  flush_period: 10s
  max_file_size: 128mb
  compression:
    class_name: LZ4Compressor
    parameters:
      a: b

track_warnings:
  enabled: true
  local_read_size:
    warn_threshold: 1mb
    abort_threshold: 10mb
  coordinator_read_size:
    warn_threshold: 5mb
    abort_threshold: 20mb


OR

# I had to rename hint configs as there was 0 consistent naming
hinted_handoff_enabled: true
hinted_handoff_disabled_datacenters:
  - 'DC1'
  - 'DC2'
hinted_handoff_max_window: 3h
hinted_handoff_max_file_size: 128mb
hinted_handoff_flush_period: 10s
hinted_handoff_compression:
  class_name: LZ4Compressor
  parameters:
    a: b

track_warnings_enabled: true
track_warnings_local_read_size_warn_threshold: 1mb
track_warnings_local_read_size_abort_threshold: 10mb
track_warnings_coordinator_read_size_warn_threshold: 5mb
track_warnings_coordinator_read_size_abort_threshold: 20mb


The main issue I have with flat structure is that we have no way to enforce standard naming; if you look at the hint example there were at least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can we actually maintain that?).  And one of the core reasons track_warnings went nested was that warn/abort sometimes became warn/fail and threshold sometimes was thresholds.  By embracing nested structure we can actually enforce consistency; with flat we have no way to maintain consistency.

Additionally by embracing the nested structure we can accept a flat one as well (PR in CASSANDRA-17166 shows this working) if users desire it; so we get the consistency of nested, and the “grep” benefits of flat.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Nested YAML configs for new features

Posted by David Capwell <dc...@apple.com.INVALID>.

>  but I would hate to repeat the mistakes of our past by evolving the config in a new direction without any coherent overarching design.

At the start I asked to keep the thread local to new features, but to flesh out an “overarching design” more fully maybe we should increase the “desired” scope to be “feature” (and leave non-features to CASSANDRA-15234 - Standardise config and JVM parameters)?  Aka, do we think the following is more ideal (configs scoped to a feature)

hinted_handoff:
  enabled: true
  disabled_datacenters:
    - DC1
    - DC2
  max_window: 3h
  flush_period: 10s
  max_file_size: 128mb
  compression:
    class_name: LZ4Compressor
    parameters:
      a: b

track_warnings:
  enabled: true
  local_read_size:
    warn_threshold: 1mb
    abort_threshold: 10mb
  coordinator_read_size:
    warn_threshold: 5mb
    abort_threshold: 20mb


OR

# I had to rename hint configs as there was 0 consistent naming
hinted_handoff_enabled: true
hinted_handoff_disabled_datacenters:
  - 'DC1'
  - 'DC2'
hinted_handoff_max_window: 3h
hinted_handoff_max_file_size: 128mb
hinted_handoff_flush_period: 10s
hinted_handoff_compression:
  class_name: LZ4Compressor
  parameters:
    a: b

track_warnings_enabled: true
track_warnings_local_read_size_warn_threshold: 1mb
track_warnings_local_read_size_abort_threshold: 10mb
track_warnings_coordinator_read_size_warn_threshold: 5mb
track_warnings_coordinator_read_size_abort_threshold: 20mb


The main issue I have with flat structure is that we have no way to enforce standard naming; if you look at the hint example there were at least 3 naming conventions (CASSANDRA-15234 is to clean this up, but can we actually maintain that?).  And one of the core reasons track_warnings went nested was that warn/abort sometimes became warn/fail and threshold sometimes was thresholds.  By embracing nested structure we can actually enforce consistency; with flat we have no way to maintain consistency.
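
To make the consistency point concrete, here is a minimal Python sketch (not Cassandra's actual Java code; the class and function names are hypothetical and only mirror the YAML keys above). Every nested section reuses one threshold type, so a key such as fail_threshold or warn_thresholds is rejected at load time instead of quietly becoming a new naming convention:

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical shared type: every limit exposes exactly these two keys."""
    warn_threshold: Optional[str] = None
    abort_threshold: Optional[str] = None

    @classmethod
    def from_mapping(cls, raw: dict) -> "Thresholds":
        allowed = {f.name for f in fields(cls)}
        unknown = set(raw) - allowed
        if unknown:
            # A misspelt or inconsistent key fails loudly when the config loads.
            raise ValueError(f"unknown threshold keys: {sorted(unknown)}")
        return cls(**raw)


@dataclass(frozen=True)
class TrackWarnings:
    """Hypothetical feature section built only from the shared type."""
    enabled: bool
    local_read_size: Thresholds
    coordinator_read_size: Thresholds


def load_track_warnings(raw: dict) -> TrackWarnings:
    # Missing sub-sections fall back to an empty Thresholds (both values unset).
    return TrackWarnings(
        enabled=bool(raw.get("enabled", False)),
        local_read_size=Thresholds.from_mapping(raw.get("local_read_size", {})),
        coordinator_read_size=Thresholds.from_mapping(raw.get("coordinator_read_size", {})),
    )
```

With flat keys there is no shared type to hang this check on; each name is an independent string, which is how the drift crept in.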

Additionally by embracing the nested structure we can accept a flat one as well (PR in CASSANDRA-17166 shows this working) if users desire it; so we get the consistency of nested, and the “grep” benefits of flat.
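
As a rough illustration of accepting both forms, the Python sketch below expands "." separated keys into the nested structure before any further parsing. It is not the code in the PR: the function names are hypothetical, and it assumes "." as the separator, since a "_" separator cannot be split unambiguously when the key names themselves contain underscores.

```python
def merge(dst: dict, src: dict) -> None:
    """Recursively merge src into dst, refusing to overwrite an existing value."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            merge(dst[key], value)
        elif key in dst:
            raise ValueError(f"{key!r} is set more than once")
        else:
            dst[key] = value


def expand_dotted(raw: dict, sep: str = ".") -> dict:
    """Expand keys such as 'a.b.c' into nested dicts, merging with nested input."""
    out: dict = {}
    for key, value in raw.items():
        if isinstance(value, dict):
            # Dotted keys may also appear relative to a nested level.
            value = expand_dotted(value, sep)
        *parents, leaf = key.split(sep)
        node = out
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{key!r} nests under a scalar at {part!r}")
        if isinstance(value, dict) and isinstance(node.get(leaf), dict):
            merge(node[leaf], value)
        elif leaf in node:
            raise ValueError(f"{key!r} is set more than once")
        else:
            node[leaf] = value
    return out
```

For example, a top-level key track_warnings.coordinator_read_size.abort_threshold_kb would land in the same track_warnings mapping as keys written in nested form, and setting the same leaf twice (once flat, once nested) raises an error rather than letting one silently win.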



Re: [DISCUSS] Nested YAML configs for new features

Posted by "benedict@apache.org" <be...@apache.org>.

If we’re thinking of moving towards nested configuration, then before employing the approach further we would ideally consider what a fully nested config looks like for the project. Ekaterina has done a lot to clean up inconsistent naming, but I would hate to repeat the mistakes of our past by evolving the config in a new direction without any coherent overarching design.

In case anyone missed it in the earlier discussion, this was my attempt to prototype a nested config: https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml

I don’t have any specific attachment to it, but settling on some approximate scheme would be helpful IMO.

From: David Capwell <dc...@apple.com.INVALID>
Date: Monday, 29 November 2021 at 20:38
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Nested YAML configs for new features
> What should our default example cassandra.yaml file use (flat or nested)?  Currently default shows nested

Was told this statement was confusing, so trying to clarify.  At the moment we do not allow a nested config to be expressed in any way outside of nesting it (excluding YAML’s ability to inline objects), so if we did allow flat config representation of nested configs, then this would be a brand new feature; we currently show the nested structure in cassandra.yaml

> On Nov 29, 2021, at 11:58 AM, David Capwell <dc...@apple.com.INVALID> wrote:
>
> Thanks everyone for the comments, I hope below is a good summary of all the talking points?
>
> We already use nested configs (networking, seed provider, commit log/hint compression, back pressure, etc.)
> Flat configs are easier for grep, but can be solved with grep -A/-B and/or yq
> It would be possible to support flat versions of our configs in cassandra.yaml (in addition to the nested versions)
> "Settings" vtable currently uses the "_" separator (example of encryption/audit log).  Switching to "." would be a change in behavior which may impact some users
> "." separators for nested configs are common in other systems (yq, Elasticsearch, etc.)
> "Structured / nested config is easier for human eyes to read"... "Flat config is harder for human eyes but easy for simple scripts"
> For learning what configs are enabled, cassandra.yaml isn't the best interface as it may not reflect the actual configs; we can better expose this in CQL and/or Sidecar
> What should our default example cassandra.yaml file use (flat or nested)?  Currently default shows nested
> When projecting the Config into CQL, we may want to consider UDTs to represent the complex types
> Current limitations in CQL make nested structures hard to work with; it may be worthwhile to expand CQL support for nested structures.
>
> I also took a quick stab at enhancing our cassandra.yaml logic to: 1) be reusable outside of yaml parsing, 2) support setters (we currently do, but setters must be snake case… I fixed that), 3) support both nested and flat, 4) support ignoring fields in a consistent way (Settings vtable will include things SnakeYAML won’t and vice versa).
>
> https://github.com/apache/cassandra/pull/1335 <https://github.com/apache/cassandra/pull/1335>.  This PR is NOT a final ready to merge thing, but instead a POC to show how we can solve a lot of the core problems in a consistent and reusable manner.
>
> The following cassandra.yaml was used to show both worlds would work fine in the config (and complement each other)
>
> track_warnings:
>  enabled: true
>  # nested relative to the local level (TrackWarnings)
>  coordinator_read_size.warn_threshold_kb: 1024
>  local_read_size.abort_threshold_kb: 1024
>  row_index_size:
>    warn_threshold_kb: 1024
>    abort_threshold_kb: 1024
> # nested relative to the top level
> track_warnings.coordinator_read_size.abort_threshold_kb: 42
>
> For the “Settings” vtable, a new Loader interface was added to get all the properties, and Properties.flatten would turn every property into a “flattened” version (isScalar (isPrimitive or not hasSubProperties) or isCollection).  This doesn’t solve 100% of the issues that vtable has (types such as Duration would need additional translation as they are Scalar but need a translation from String -> Duration), and doesn’t solve the fact the table currently uses “_”.
>
>> On Nov 29, 2021, at 10:11 AM, benedict@apache.org wrote:
>>
>> I meant to imply we should improve our UDT usability to support this kind of querying, essentially – but that if we support a simple text->property setup we might want to offer LIKE support so we can search them (via simple filtering, not any index) – which is actually pretty easy to provide.
>>
>> I think we should aim to provide users all the facilities they need to interact with config via vtables. If the user requires external tooling, it suggests a weakness in CQL that we should address, and maybe help the user in other scenario too…
>>
>> From: Joseph Lynch <jo...@gmail.com>
>> Date: Monday, 29 November 2021 at 17:32
>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>> On Mon, Nov 29, 2021 at 11:51 AM benedict@apache.org
>> <be...@apache.org> wrote:
>>>
>>> Maybe we can make our query language more expressive 😊
>>>
>>> We might anyway want to introduce e.g. a LIKE filtering option to find/discover flattened config parameters?
>>
>> This sounds more complicated than just having the settings virtual
>> table return text (dot encoded) -> text (json) and probably not even
>> that much more useful. A full table scan on the settings table could
>> return all top level keys (strings before the first dot) and if we
>> just return a valid json string then users can bring their own
>> querying capabilities via jq [1], or one line of code in almost any
>> programming language (especially python, perl, etc ...).
>>
>> Alternatively if we want to modify the grammar it seems supporting
>> structured data querying on text fields would maybe be more preferable
>> to LIKE since you could get what you want without a grammar change and
>> if we could generalize to any text column it would be amazingly useful
>> elsewhere to users. For example, we could emulate jq's query syntax in
>> the select which is, imo, best-in-class for quickly querying into
>> nearest structures. Assuming a key (text) -> value (json) schema:
>>
>> 'a' -> "{'b': [{'c': {'d': 4}}]}",
>>
>> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
>>
>> To have exactly jq syntax (but harder to parse) it would be:
>>
>> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
>>
>> Since we're not indexing the structured data in any way, filtering
>> before selection probably doesn't give us much performance improvement
>> as we'd still have to parse the whole text field in most cases.
>>
>> -Joey
>>
>> [1] https://stedolan.github.io/jq/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
>> For additional commands, e-mail: dev-help@cassandra.apache.org
>



Re: [DISCUSS] Nested YAML configs for new features

Posted by David Capwell <dc...@apple.com.INVALID>.

> What should our default example cassandra.yaml file use (flat or nested)?  Currently default shows nested

I was told this statement was confusing, so let me try to clarify.  At the moment we do not allow a nested config to be expressed in any way other than nesting it (excluding YAML’s ability to inline objects), so if we did allow a flat representation of nested configs, it would be a brand new feature; we currently show the nested structure in cassandra.yaml.

> On Nov 29, 2021, at 11:58 AM, David Capwell <dc...@apple.com.INVALID> wrote:
> 
> Thanks everyone for the comments, I hope below is a good summary of all the talking points?
> 
> We already use nested configs (networking, seed provider, commit log/hint compression, back pressure, etc.)
> Flat configs are easier for grep, but can be solved with grep -A/-B and/or yq
> It would be possible to support flat versions of our configs in cassandra.yaml (in addition to the nested versions)
> "Settings" vtable currently uses the "_" separator (example of encryption/audit log).  Switching to "." Would be a change in behavior which may impact some users
> "." Separator for nested configs are common in other systems (yq, elastic search, etc.)
> "Structured / nested config is easier for human eyes to read"... "Flat config is harder for human eyes but easy for simple scripts"
> For learning what configs are enabled, cassandra.yaml isn't the best interface as it may not reflect the actual configs; we can better expose this in CQL and/or Sidecar
> What should our default example cassandra.yaml file use (flat or nested)?  Currently default shows nested
> When projecting the Config into CQL, we may want to consider UDTs to represent the complex types
> Current limitations in CQL make nested structures hard to work with; it may be worthwhile to expand CQL support for nested structures.
> 
> I also took a quick stab at enhancing our cassandra.yaml logic to: 1) be reusable outside of yaml parsing, 2) support setters (we currently do, but setters must be snake case… I fixed that)…, 3) support both nested and structured, 4) support ignoring fields in a consistent way (Settings vtable will include things SnakeYAML won’t and vice versa).
> 
> https://github.com/apache/cassandra/pull/1335 <https://github.com/apache/cassandra/pull/1335>.  This PR is NOT a final ready to merge thing, but instead a POC to show how we can solve a lot of the core problems in a consistent and reusable manner.
> 
> The following cassandra.yaml was used to show both worlds would work fine in the config (and complement each other)
> 
> track_warnings:
>  enabled: true
>  # nested relative to the local level (TrackWarnings)
>  coordinator_read_size.warn_threshold_kb: 1024
>  local_read_size.abort_threshold_kb: 1024
>  row_index_size:
>    warn_threshold_kb: 1024
>    abort_threshold_kb: 1024
> # nested relative to the top level
> track_warnings.coordinator_read_size.abort_threshold_kb: 42
> 
> For the “Settings” vtable, a new Loader interface was added to get all the properties, and Properties.flatten would turn every property into a “flatten” version (isScalar (isPrimitive or not hasSubProperties) or isCollection).  This doesn’t solve 100% of the issues that vtable has (types such as Duration would need additional translation as they are Scalar but need a translation from String -> Duration), and doesn’t solve the fact the table currently uses “_”.
> 
>> On Nov 29, 2021, at 10:11 AM, benedict@apache.org wrote:
>> 
>> I meant to imply we should improve our UDT usability to support this kind of querying, essentially – but that if we support a simple text->property setup we might want to offer LIKE support so we can search them (via simple filtering, not any index) – which is actually pretty easy to provide.
>> 
>> I think we should aim to provide users all the facilities they need to interact with config via vtables. If the user requires external tooling, it suggests a weakness in CQL that we should address, and maybe help the user in other scenario too…
>> 
>> From: Joseph Lynch <jo...@gmail.com>
>> Date: Monday, 29 November 2021 at 17:32
>> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>> On Mon, Nov 29, 2021 at 11:51 AM benedict@apache.org
>> <be...@apache.org> wrote:
>>> 
>>> Maybe we can make our query language more expressive 😊
>>> 
>>> We might anyway want to introduce e.g. a LIKE filtering option to find/discover flattened config parameters?
>> 
>> This sounds more complicated than just having the settings virtual
>> table return text (dot encoded) -> text (json) and probably not even
>> that much more useful. A full table scan on the settings table could
>> return all top level keys (strings before the first dot) and if we
>> just return a valid json string then users can bring their own
>> querying capabilities via jq [1], or one line of code in almost any
>> programming language (especially python, perl, etc ...).
>> 
>> Alternatively if we want to modify the grammar it seems supporting
>> structured data querying on text fields would maybe be more preferable
>> to LIKE since you could get what you want without a grammar change and
>> if we could generalize to any text column it would be amazingly useful
>> elsewhere to users. For example, we could emulate jq's query syntax in
>> the select which is, imo, best-in-class for quickly querying into
>> nearest structures. Assuming a key (text) -> value (json) schema:
>> 
>> 'a' -> "{'b': [{'c': {'d': 4}}]}",
>> 
>> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
>> 
>> To have exactly jq syntax (but harder to parse) it would be:
>> 
>> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
>> 
>> Since we're not indexing the structured data in any way, filtering
>> before selection probably doesn't give us much performance improvement
>> as we'd still have to parse the whole text field in most cases.
>> 
>> -Joey
>> 
>> [1] https://stedolan.github.io/jq/
>> 
> 



Re: [DISCUSS] Nested YAML configs for new features

Posted by David Capwell <dc...@apple.com.INVALID>.

Thanks everyone for the comments. I hope the list below is a good summary of all the talking points:

* We already use nested configs (networking, seed provider, commit log/hint compression, back pressure, etc.)
* Flat configs are easier to grep, but nested configs can be handled with grep -A/-B and/or yq
* It would be possible to support flat versions of our configs in cassandra.yaml (in addition to the nested versions)
* The "Settings" vtable currently uses the "_" separator (for example, encryption/audit log).  Switching to "." would be a change in behavior which may impact some users
* A "." separator for nested configs is common in other systems (yq, Elasticsearch, etc.)
* "Structured / nested config is easier for human eyes to read"... "Flat config is harder for human eyes but easy for simple scripts"
* For learning what configs are enabled, cassandra.yaml isn't the best interface as it may not reflect the actual configs; we can better expose this in CQL and/or Sidecar
* What should our default example cassandra.yaml file use (flat or nested)?  Currently the default shows nested
* When projecting the Config into CQL, we may want to consider UDTs to represent the complex types
* Current limitations in CQL make nested structures hard to work with; it may be worthwhile to expand CQL support for nested structures.

I also took a quick stab at enhancing our cassandra.yaml logic to: 1) be reusable outside of YAML parsing, 2) support setters (we currently do, but setters must be snake case; I fixed that), 3) support both nested and flat, 4) support ignoring fields in a consistent way (the Settings vtable will include things SnakeYAML won’t, and vice versa).

https://github.com/apache/cassandra/pull/1335  This PR is NOT final or ready to merge; it is a POC to show how we can solve a lot of the core problems in a consistent and reusable manner.

The following cassandra.yaml was used to show that both worlds work fine in the config (and complement each other):

track_warnings:
  enabled: true
  # nested relative to the local level (TrackWarnings)
  coordinator_read_size.warn_threshold_kb: 1024
  local_read_size.abort_threshold_kb: 1024
  row_index_size:
    warn_threshold_kb: 1024
    abort_threshold_kb: 1024
# nested relative to the top level
track_warnings.coordinator_read_size.abort_threshold_kb: 42
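
To make the mixed form above concrete, here is a minimal Python sketch of how dotted keys could be expanded and merged with nested ones. It is not the logic in the PR; the function names (merge_into, expand_dotted_keys) are made up for illustration, and it assumes the YAML has already been parsed into plain dicts.

```python
def merge_into(target, path, value):
    """Insert value at the dotted path inside target, merging nested maps."""
    head, _, rest = path.partition(".")
    if rest:
        # "a.b.c": descend into (or create) the map for "a", then handle "b.c".
        child = target.setdefault(head, {})
        if not isinstance(child, dict):
            raise ValueError(f"{head!r} is set both as a value and as a parent key")
        merge_into(child, rest, value)
    elif isinstance(value, dict):
        # A nested map: its own keys may be dotted too, so merge them one by one.
        child = target.setdefault(head, {})
        if not isinstance(child, dict):
            raise ValueError(f"{head!r} is set both as a value and as a parent key")
        for sub_key, sub_value in value.items():
            merge_into(child, sub_key, sub_value)
    elif head in target:
        raise ValueError(f"{head!r} is set more than once")
    else:
        target[head] = value


def expand_dotted_keys(config):
    """Return a purely nested copy of a config that mixes nested and dotted keys."""
    result = {}
    for key, value in config.items():
        merge_into(result, key, value)
    return result


# The cassandra.yaml from this mail, as it would look after YAML parsing.
mixed = {
    "track_warnings": {
        "enabled": True,
        "coordinator_read_size.warn_threshold_kb": 1024,
        "local_read_size.abort_threshold_kb": 1024,
        "row_index_size": {"warn_threshold_kb": 1024, "abort_threshold_kb": 1024},
    },
    "track_warnings.coordinator_read_size.abort_threshold_kb": 42,
}
nested = expand_dotted_keys(mixed)
```

In this sketch a setting that is given twice is rejected rather than silently overridden; whether the real parser should reject or pick a winner is a separate decision.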

For the “Settings” vtable, a new Loader interface was added to get all the properties, and Properties.flatten turns every property into a “flattened” version, whose leaves are the properties that are isScalar (isPrimitive or not hasSubProperties) or isCollection.  This doesn’t solve 100% of the issues the vtable has (types such as Duration are Scalar but still need a translation from String -> Duration), and doesn’t solve the fact that the table currently uses “_”.
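
As a rough illustration of the flattening idea (not the Properties.flatten code from the PR), the sketch below walks a nested dict and stops at scalars and collections. The function name and the dict-based input are assumptions made for the example.

```python
def flatten(config, separator=".", prefix=""):
    """Flatten nested maps into {"a.b.c": value}; scalars and lists stay leaves."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Has sub-properties: recurse instead of emitting a row for it.
            flat.update(flatten(value, separator, name))
        else:
            # Scalar, collection, or empty map: emit as a single row.
            flat[name] = value
    return flat


config = {
    "track_warnings": {
        "enabled": True,
        "row_index_size": {"warn_threshold_kb": 1024, "abort_threshold_kb": 1024},
    },
    "hinted_handoff_disabled_datacenters": ["dc1", "dc2"],
}
dotted = flatten(config)
underscored = flatten(config, separator="_")
```

With separator="_" the output matches the style the table uses today, but a name like track_warnings_enabled no longer shows where the nesting boundaries are, which is the ambiguity a "." separator avoids.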

> On Nov 29, 2021, at 10:11 AM, benedict@apache.org wrote:
> 
> I meant to imply we should improve our UDT usability to support this kind of querying, essentially – but that if we support a simple text->property setup we might want to offer LIKE support so we can search them (via simple filtering, not any index) – which is actually pretty easy to provide.
> 
> I think we should aim to provide users all the facilities they need to interact with config via vtables. If the user requires external tooling, it suggests a weakness in CQL that we should address, and maybe help the user in other scenario too…
> 
> From: Joseph Lynch <jo...@gmail.com>
> Date: Monday, 29 November 2021 at 17:32
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> On Mon, Nov 29, 2021 at 11:51 AM benedict@apache.org
> <be...@apache.org> wrote:
>> 
>> Maybe we can make our query language more expressive 😊
>> 
>> We might anyway want to introduce e.g. a LIKE filtering option to find/discover flattened config parameters?
> 
> This sounds more complicated than just having the settings virtual
> table return text (dot encoded) -> text (json) and probably not even
> that much more useful. A full table scan on the settings table could
> return all top level keys (strings before the first dot) and if we
> just return a valid json string then users can bring their own
> querying capabilities via jq [1], or one line of code in almost any
> programming language (especially python, perl, etc ...).
> 
> Alternatively if we want to modify the grammar it seems supporting
> structured data querying on text fields would maybe be more preferable
> to LIKE since you could get what you want without a grammar change and
> if we could generalize to any text column it would be amazingly useful
> elsewhere to users. For example, we could emulate jq's query syntax in
> the select which is, imo, best-in-class for quickly querying into
> nearest structures. Assuming a key (text) -> value (json) schema:
> 
> 'a' -> "{'b': [{'c': {'d': 4}}]}",
> 
> SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';
> 
> To have exactly jq syntax (but harder to parse) it would be:
> 
> SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';
> 
> Since we're not indexing the structured data in any way, filtering
> before selection probably doesn't give us much performance improvement
> as we'd still have to parse the whole text field in most cases.
> 
> -Joey
> 
> [1] https://stedolan.github.io/jq/
> 

Re: [DISCUSS] Nested YAML configs for new features

Posted by "benedict@apache.org" <be...@apache.org>.

I meant to imply we should improve our UDT usability to support this kind of querying, essentially – but that if we support a simple text->property setup we might want to offer LIKE support so we can search them (via simple filtering, not any index) – which is actually pretty easy to provide.

I think we should aim to provide users all the facilities they need to interact with config via vtables. If the user requires external tooling, it suggests a weakness in CQL that we should address, and addressing it may help the user in other scenarios too…
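
To show how little is needed for LIKE-style filtering without an index, here is a small Python sketch. It assumes standard SQL LIKE semantics ("%" matches any run of characters, "_" matches one character); what CQL would actually accept on the settings vtable is not decided in this thread, and the function names are made up.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL-style LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def filter_like(settings, pattern):
    """Scan every row and keep those whose name matches the pattern."""
    regex = like_to_regex(pattern)
    return {name: value for name, value in settings.items() if regex.fullmatch(name)}


settings = {
    "track_warnings.enabled": "true",
    "track_warnings.row_index_size.warn_threshold_kb": "1024",
    "audit_logging_options.enabled": "false",
}
by_prefix = filter_like(settings, "track_warnings.%")
by_suffix = filter_like(settings, "%.enabled")
```

One thing the sketch makes visible: under these semantics "_" is itself a wildcard, so snake_case names would match more loosely than a user might expect unless an escape syntax is defined.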

From: Joseph Lynch <jo...@gmail.com>
Date: Monday, 29 November 2021 at 17:32
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Nested YAML configs for new features
On Mon, Nov 29, 2021 at 11:51 AM benedict@apache.org
<be...@apache.org> wrote:
>
> Maybe we can make our query language more expressive 😊
>
> We might anyway want to introduce e.g. a LIKE filtering option to find/discover flattened config parameters?

This sounds more complicated than just having the settings virtual
table return text (dot encoded) -> text (json) and probably not even
that much more useful. A full table scan on the settings table could
return all top level keys (strings before the first dot) and if we
just return a valid json string then users can bring their own
querying capabilities via jq [1], or one line of code in almost any
programming language (especially python, perl, etc ...).

Alternatively if we want to modify the grammar it seems supporting
structured data querying on text fields would maybe be more preferable
to LIKE since you could get what you want without a grammar change and
if we could generalize to any text column it would be amazingly useful
elsewhere to users. For example, we could emulate jq's query syntax in
the select which is, imo, best-in-class for quickly querying into
nearest structures. Assuming a key (text) -> value (json) schema:

'a' -> "{'b': [{'c': {'d': 4}}]}",

SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';

To have exactly jq syntax (but harder to parse) it would be:

SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';

Since we're not indexing the structured data in any way, filtering
before selection probably doesn't give us much performance improvement
as we'd still have to parse the whole text field in most cases.

-Joey

[1] https://stedolan.github.io/jq/


Re: [DISCUSS] Nested YAML configs for new features

Posted by Joseph Lynch <jo...@gmail.com>.

On Mon, Nov 29, 2021 at 11:51 AM benedict@apache.org
<be...@apache.org> wrote:
>
> Maybe we can make our query language more expressive 😊
>
> We might anyway want to introduce e.g. a LIKE filtering option to find/discover flattened config parameters?

This sounds more complicated than just having the settings virtual
table return text (dot encoded) -> text (json) and probably not even
that much more useful. A full table scan on the settings table could
return all top level keys (strings before the first dot) and if we
just return a valid json string then users can bring their own
querying capabilities via jq [1], or one line of code in almost any
programming language (especially python, perl, etc ...).
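
A minimal Python sketch of the client side of this idea, assuming the settings table returned rows of dot-encoded name -> JSON text. The row contents and function names below are invented for illustration; no such schema exists today.

```python
import json


def top_level_keys(rows):
    """Names before the first dot, as a full scan of the table would reveal them."""
    return sorted({name.split(".", 1)[0] for name in rows})


def decode(rows, name):
    """Parse the JSON text stored for one setting into Python objects."""
    return json.loads(rows[name])


# Hypothetical result of a full scan of the settings table.
rows = {
    "track_warnings.coordinator_read_size": '{"warn_threshold_kb": 1024, "abort_threshold_kb": 42}',
    "track_warnings.enabled": "true",
    "seed_provider.parameters": '{"seeds": "127.0.0.1:7000"}',
}
groups = top_level_keys(rows)
read_size = decode(rows, "track_warnings.coordinator_read_size")
```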

Alternatively if we want to modify the grammar it seems supporting
structured data querying on text fields would maybe be preferable
to LIKE since you could get what you want without a grammar change and
if we could generalize to any text column it would be amazingly useful
elsewhere to users. For example, we could emulate jq's query syntax in
the select which is, imo, best-in-class for quickly querying into
nested structures. Assuming a key (text) -> value (json) schema:

'a' -> "{'b': [{'c': {'d': 4}}]}",

SELECT json(value).b.0.c.d FROM settings WHERE key = 'a';

To have exactly jq syntax (but harder to parse) it would be:

SELECT json(value).b[0].c.d FROM settings WHERE key = 'a';

Since we're not indexing the structured data in any way, filtering
before selection probably doesn't give us much performance improvement
as we'd still have to parse the whole text field in most cases.
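
For reference, the selection step itself is small. The Python sketch below evaluates both path spellings from the examples above against a JSON text value. It only illustrates the semantics, not a proposal for the CQL grammar; select_path is a made-up name, and the input is written as valid JSON (double quotes) rather than the shorthand used above.

```python
import json
import re

# One path step: ".name" or "name", ".0" (dot index), or "[0]" (bracket index).
_STEP = re.compile(r"\.?([A-Za-z_][A-Za-z0-9_]*)|\.(\d+)|\[(\d+)\]")


def select_path(text, path):
    """Parse the whole JSON text, then walk it one path step at a time."""
    value = json.loads(text)
    pos = 0
    while pos < len(path):
        match = _STEP.match(path, pos)
        if match is None:
            raise ValueError(f"cannot parse path at offset {pos}: {path!r}")
        name, dot_index, bracket_index = match.groups()
        if name is not None:
            value = value[name]
        else:
            value = value[int(dot_index if dot_index is not None else bracket_index)]
        pos = match.end()
    return value


stored = '{"b": [{"c": {"d": 4}}]}'
with_dots = select_path(stored, "b.0.c.d")
with_brackets = select_path(stored, "b[0].c.d")
```

As noted above, the full text is parsed before any step is applied, so this buys convenience rather than performance.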

-Joey

[1] https://stedolan.github.io/jq/


Re: [DISCUSS] Nested YAML configs for new features

Posted by Benjamin Lerer <b....@gmail.com>.

>
> We might anyway want to introduce e.g. a LIKE filtering option to
> find/discover flattened config parameters?


+100

On Mon, 29 Nov 2021 at 17:51, benedict@apache.org <be...@apache.org> wrote:

> Maybe we can make our query language more expressive 😊
>
> We might anyway want to introduce e.g. a LIKE filtering option to
> find/discover flattened config parameters?
>
> From: Benjamin Lerer <b....@gmail.com>
> Date: Monday, 29 November 2021 at 16:41
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >
> > I don’t think it’s necessarily a requirement that we use the flattened
> > version in vtables. At the very least we can make use of sets, lists,
> > etc. But we can probably also use UDTs if this improves clarity.
>
>
> In my opinion part of the issue is on the query side. How do we select a
> nested set or a specific set easily? UDTs are not great for this type of
> query. For collections we can use CONTAINS and element or range selection
> but insertion might be the problem.
>
> On Mon, 29 Nov 2021 at 17:23, Bowen Song <bo...@bso.ng.invalid> wrote:
>
> > In ElasticSearch, the default is a flattened format with almost all
> > lines commented out. See
> >
> >
> https://github.com/elastic/elasticsearch/blob/master/distribution/src/config/elasticsearch.yml
> >
> > I guess they chose to do that because users can uncomment individual
> > lines to make changes. In a structured config file, the user will have
> > to uncomment all lines containing the parent keys to get it to work. For
> > example, if someone wants to set the config keyABB to a non-default
> > value, they will have to correctly uncomment 3 lines: keyA, keyAB and
> > keyABB, which can be annoying and makes it easy to make a mistake. If
> > either of the first two keys is not uncommented, the YAML file will still
> > be valid, but the resulting config, such as keyX.keyAB.keyABB, might just
> > get silently ignored by the database.
> >
> >     keyX:
> >        keyY:
> >          keyZ: value
> >     # keyA:
> >     #   keyAA:
> >     #     keyAAA: value
> >     #   keyAB:
> >     #     keyABA: value
> >     #     keyABB: value
> >
> > On 29/11/2021 15:54, Benjamin Lerer wrote:
> > > I do not think that supporting both options is an issue. The settings
> > > virtual table would have to use the flattened version.
> > > > If we support both formats, the question would be: what should be the
> > > > one used by default in the configuration file?
> > >
> > > > On Fri, 26 Nov 2021 at 15:40, benedict@apache.org <benedict@apache.org> wrote:
> > >
> > >> This is the approach I favour for config files also. We had a much less
> > >> engaged discussion on this topic only a few months ago, so glad to see
> > >> more people getting involved now.
> > >>
> > >> I would however personally prefer to see the configuration file slowly
> > >> deprecated (if perhaps never retired), in favour of virtual tables, so
> > >> that operators may easily set configurations for the entire cluster.
> > >> Ideally it would be possible to specify configuration per cluster, per DC
> > >> and per node, with the most specific configuration applying. I would like
> > >> to see a similar hierarchy for Keyspace, Table and Per-Query options.
> > >> Ideally only the barest minimum number of options would be necessary to
> > >> supply in a config file, and only on first launch – seed nodes, for
> > >> instance.
> > >>
> > >> So whatever design we employ here, we should IMO be aiming for it to be
> > >> compatible with a CQL representation also.
> > >>
> > >>
> > >> From: Bowen Song<bo...@bso.ng.INVALID>
> > >> Date: Wednesday, 24 November 2021 at 18:15
> > >> To:dev@cassandra.apache.org  <de...@cassandra.apache.org>
> > >> Subject: Re: [DISCUSS] Nested YAML configs for new features
> > >> Since you mentioned ElasticSearch, I'm actually pretty happy with their
> > >> config file syntax. It allows the user to completely flatten out the
> > >> entire config file. To give people who aren't familiar with ElasticSearch
> > >> an idea, here is a config file we use:
> > >>
> > >>      cluster.name: foobar
> > >>
> > >>      node.remote_cluster_client: false
> > >>      node.name: "foo.example.com"
> > >>      node.master: true
> > >>      node.data: true
> > >>      node.ingest: true
> > >>      node.ml: false
> > >>
> > >>      xpack.ml.enabled: false
> > >>      xpack.security.enabled: false
> > >>      xpack.security.audit.enabled: false
> > >>      xpack.watcher.enabled: false
> > >>
> > >>      action.auto_create_index: "+.,-*"
> > >>
> > >>      network.host: _global_
> > >>
> > >>      discovery.zen.hosts_provider: file
> > >>      discovery.zen.minimum_master_nodes: 2
> > >>
> > >>      http.publish_host: "foo.example.com"
> > >>      http.publish_port: 443
> > >>      http.bind_host: 127.0.0.1
> > >>
> > >>      transport.publish_host: "bar.example.com"
> > >>      transport.bind_host: 0.0.0.0
> > >>
> > >>      indices.fielddata.cache.size: 1GB
> > >>      indices.breaker.total.use_real_memory: false
> > >>
> > >>      path.logs: /var/log/elasticsearch
> > >>      path.data: /var/lib/elasticsearch/data
> > >>
> > >> As you can see we can use the flat (grep-able) syntax for everything.
> > >> This is also human readable because we can group options together by
> > >> inserting empty lines between them.
> > >>
> > >> The equivalent of the above in a structured syntax will be:
> > >>
> > >>      cluster:
> > >>           name: foobar
> > >>
> > >>      node:
> > >>           remote_cluster_client: false
> > >>           name: "foo.example.com"
> > >>           master: true
> > >>           data: true
> > >>           ingest: true
> > >>           ml: false
> > >>
> > >>      xpack:
> > >>           ml:
> > >>               enabled: false
> > >>           security:
> > >>               enabled: false
> > >>               audit:
> > >>                   enabled: false
> > >>           watcher:
> > >>               enabled: false
> > >>
> > >>      action:
> > >>           auto_create_index: "+.,-*"
> > >>
> > >>      network:
> > >>           host: _global_
> > >>
> > >>      discovery:
> > >>           zen:
> > >>               hosts_provider: file
> > >>               minimum_master_nodes: 2
> > >>
> > >>      http:
> > >>           publish_host: "foo.example.com"
> > >>           publish_port: 443
> > >>           bind_host: 127.0.0.1
> > >>
> > >>      transport:
> > >>           publish_host: "bar.example.com"
> > >>           bind_host: 0.0.0.0
> > >>
> > >>      indices:
> > >>           fielddata:
> > >>               cache:
> > >>                   size: 1GB
> > >>           breaker:
> > >>               total:
> > >>                   use_real_memory: false
> > >>
> > >>      path:
> > >>           logs: /var/log/elasticsearch
> > >>           data: /var/lib/elasticsearch/data
> > >>
> > >> This may be easier to read for some people, but it is a total nightmare
> > >> for "grep" - so many keys have identical names, such as "enabled".
> > >>
> > >> Also, for the virtual tables, it would be a lot easier to represent
> > >> individual values in a virtual table when the config is flat and keys
> > >> are unique. The virtual tables would need to either support the encoding
> > >> and decoding of the structured config into a flat structure, or use
> > >> JSON-encoded string values. The use of JSON would make querying
> > >> individual values much harder.
> > >>
> > >> On 22/11/2021 16:16, Joseph Lynch wrote:
> > >>> Isn't one of the primary reasons to have a YAML configuration instead
> > >>> of a properties file to allow typed and structured (implies nested)
> > >>> configuration? I think it makes a lot of sense to group related
> > >>> configuration options (e.g. a feature) into a typed class when we're
> > >>> talking about more than one or two related options.
> > >>>
> > >>> It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs to
> > >>> period encoded key->value pairs when required (usually when providing
> > >>> a property or override layer), Spring and Elasticsearch yamls both
> > >>> come to mind. It seems pretty reasonable to support dot encoding and
> > >>> decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
> > >>>
> > >>> Regarding quickly telling what configuration a node is running I think
> > >>> we should lean on virtual tables for "what is the current
> > >>> configuration" now that we have them, as others have said the written
> > >>> cassandra.yaml is not necessarily the current configuration ... and
> > >>> also grep -C or -A exist for this reason.
> > >>>
> > >>> -Joey
> > >>>
> > >>> On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer <bl...@apache.org> wrote:
> > >>>> I do not have a strong opinion for one or the other but wanted to raise
> > >>>> the issue I see with the "Settings" virtual table.
> > >>>>
> > >>>> Currently the "Settings" virtual table converts nested options into flat
> > >>>> options using a "_" separator. For those options it allows a user to
> > >>>> query the whole set of options through some hack.
> > >>>> If we decide to move to more nesting (more than one level), it seems to
> > >>>> me that we need to change the way this table is behaving and how we can
> > >>>> query its data.
> > >>>>
> > >>>> We would need to start using "." as a nesting separator to ensure that
> > >>>> things are consistent between the configuration and the table and add
> > >>>> support for LIKE restrictions for filtering queries to allow operators
> > >>>> to select the precise set of settings that they are looking for.
> > >>>>
> > >>>> Doing so is not really complicated in itself but might impact some users.
> > >>>>
> > >>>> On Fri, 19 Nov 2021 at 22:39, David Capwell <dcapwell@apple.com.invalid> wrote:
> > >>>>
> > >>>>>> it is really handy to grep
> > >>>>>> cassandra.yaml on some config key and you know the value instantly.
> > >>>>> You can still do that
> > >>>>>
> > >>>>> $ grep -A2 coordinator_read_size conf/cassandra.yaml
> > >>>>> #     coordinator_read_size:
> > >>>>> #         warn_threshold_kb: 0
> > >>>>> #         abort_threshold_kb: 0
> > >>>>>
> > >>>>> I was also arguing we should support nested and flat, so if your infra
> > >>>>> works better with flat then you could use
> > >>>>>
> > >>>>> track_warnings.coordinator_read_size.warn_threshold_kb: 0
> > >>>>> track_warnings.coordinator_read_size.abort_threshold_kb: 0
> > >>>>>
> > >>>>>> On Nov 19, 2021, at 1:34 PM, David Capwell <dc...@apple.com> wrote:
> > >>>>>>> With the flat structure it turns into properties file - would it be
> > >>>>>>> possible to support both formats - nested yaml and flat properties?
> > >>>>>> For the majority of our configs yes, but there is a subset where flat
> > >>>>>> properties are annoying
> > >>>>>> hinted_handoff_disabled_datacenters - set type, so you could do
> > >>>>>> hinted_handoff_disabled_datacenters=“a,b,c,d” but we would need to deal
> > >>>>>> with separators as the format doesn’t support
> > >>>>>> seed_provider.parameters - this is a map type… so would need to do
> > >>>>>> something like seed_provider.parameters=“{\”a\”: \”a\”}” …. Maybe we
> > >>>>>> special case maps as dynamic fields?  Then seed_provider.parameters.a=a?
> > >>>>>> We have ParameterizedClass all over the code
> > >>>>>> So, as long as we define how to deal with java collections; we could in
> > >>>>>> theory support properties files (not arguing for that in this thread) as
> > >>>>>> well as system properties.
> > >>>>>>> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <lewandowski.jacek@gmail.com> wrote:
> > >>>>>>> With the flat structure it turns into properties file - would it be
> > >>>>>>> possible to support both formats - nested yaml and flat properties?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> - - -- --- ----- -------- -------------
> > >>>>>>> Jacek Lewandowski
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <
> > >>>>> calebrackliffe@gmail.com>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> If it's nested, "track_warnings" would still work if you're
> > grepping
> > >>>>> around
> > >>>>>>>> vim or less.
> > >>>>>>>>
> > >>>>>>>> I'd have to concede the point about grep output, although there
> > are
> > >>>>> tools
> > >>>>>>>> like https://github.com/kislyuk/yq that could probably be bent
> to
> > >> do
> > >>>>> what
> > >>>>>>>> you want.
> > >>>>>>>>
> > >>>>>>>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
> > >>>>>>>> stefan.miklosovic@instaclustr.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi David,
> > >>>>>>>>>
> > >>>>>>>>> while I do not oppose nested structure, it is really handy to
> > grep
> > >>>>>>>>> cassandra.yaml on some config key and you know the value
> > instantly.
> > >>>>>>>>> This is not possible when it is nested (easily and quickly) as it
> is
> > >> on
> > >>>>>>>>> two lines. Or maybe my grepping is just not advanced enough to
> > >> cover
> > >>>>>>>>> this case? If it is flat, I can just grep "track_warnings" and
> I
> > >> have
> > >>>>>>>>> them all.
> > >>>>>>>>>
> > >>>>>>>>> Can you elaborate on your last bullet point? Parsing layer ...
> > >> What do
> > >>>>>>>>> you mean specifically?
> > >>>>>>>>>
> > >>>>>>>>> Thanks
> > >>>>>>>>>
> > >>>>>>>>> On Fri, 19 Nov 2021 at 19:36, David Capwell<dcapwell@gmail.com
> >
> > >>>>> wrote:
> > >>>>>>>>>> This has been brought up in a few tickets, so pushing to the
> dev
> > >>>>> list.
> > >>>>>>>>>> CASSANDRA-15234 - Standardise config and JVM parameters
> > >>>>>>>>>> CASSANDRA-16896 - hard/soft limits for queries
> > >>>>>>>>>> CASSANDRA-17147 - Guardrails prototype
> > >>>>>>>>>>
> > >>>>>>>>>> In short, do we as a project wish to move "new features" into
> > >> nested
> > >>>>>>>>>> YAML when the feature has "enough" to justify the nesting?  I
> > >> would
> > >>>>>>>>>> really like to focus this discussion on new features rather
> than
> > >>>>>>>>>> retroactively grouping (leaving that to CASSANDRA-15234), as
> > >> there is
> > >>>>>>>>>> already a place to talk about that.
> > >>>>>>>>>>
> > >>>>>>>>>> To get things started, let's start with the track-warning
> > feature
> > >>>>>>>>>> (hard/soft limits for queries), currently the configs look as
> > >> follows
> > >>>>>>>>>> (assuming 15234)
> > >>>>>>>>>>
> > >>>>>>>>>> track_warnings:
> > >>>>>>>>>>     enabled: true
> > >>>>>>>>>>     coordinator_read_size:
> > >>>>>>>>>>         warn_threshold: 10kb
> > >>>>>>>>>>         abort_threshold: 1mb
> > >>>>>>>>>>     local_read_size:
> > >>>>>>>>>>         warn_threshold: 10kb
> > >>>>>>>>>>         abort_threshold: 1mb
> > >>>>>>>>>>     row_index_size:
> > >>>>>>>>>>         warn_threshold: 100mb
> > >>>>>>>>>>         abort_threshold: 1gb
> > >>>>>>>>>>
> > >>>>>>>>>> or should this be "flat"
> > >>>>>>>>>>
> > >>>>>>>>>> track_warnings_enabled: true
> > >>>>>>>>>> track_warnings_coordinator_read_size_warn_threshold: 10kb
> > >>>>>>>>>> track_warnings_coordinator_read_size_abort_threshold: 1mb
> > >>>>>>>>>> track_warnings_local_read_size_warn_threshold: 10kb
> > >>>>>>>>>> track_warnings_local_read_size_abort_threshold: 1mb
> > >>>>>>>>>> track_warnings_row_index_size_warn_threshold: 100mb
> > >>>>>>>>>> track_warnings_row_index_size_abort_threshold: 1gb
> > >>>>>>>>>>
> > >>>>>>>>>> For me I prefer nested for a few reasons
> > >>>>>>>>>> * easier to enforce consistency as the configs can use shared
> > >> types;
> > >>>>>>>>>> in the track warnings patch I had mismatches cross configs
> (warn
> > >> vs
> > >>>>>>>>>> warns, fail vs abort, etc.) before going nested, now
> everything
> > >>>>> reuses
> > >>>>>>>>>> the same types
> > >>>>>>>>>> * even though it is longer, things can be more clear how they
> > are
> > >>>>>>>> related
> > >>>>>>>>>> * parsing layer can add support for mixed or purely flat
> > >> depending on
> > >>>>>>>>>> user preference (example:
> > >>>>>>>>>> track_warnings.row_index_size.abort_threshold, using the '.'
> > >> notation
> > >>>>>>>>>> to represent nested structures)
> > >>>>>>>>>>
> > >>>>>>>>>> Thoughts?
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >> ---------------------------------------------------------------------
> > >>>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > >>>>>>>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
>

Re: [DISCUSS] Nested YAML configs for new features

Posted by "benedict@apache.org" <be...@apache.org>.

Maybe we can make our query language more expressive 😊

We might anyway want to introduce e.g. a LIKE filtering option to find/discover flattened config parameters?
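
A minimal sketch of what such a LIKE filter over flattened setting names could look like. This is plain Python for illustration, not Cassandra code: the `like_to_regex` and `select_settings` helpers and the sample settings are hypothetical, and only the `%` wildcard is handled (an assumption, so that underscores in setting names stay literal).

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE-style pattern into a compiled regex.

    Assumption: only '%' is a wildcard (any run of characters);
    every other character, including '.' and '_', matches itself.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def select_settings(settings, pattern):
    """Return the settings whose full name matches the LIKE pattern."""
    rx = like_to_regex(pattern)
    return {name: value for name, value in settings.items() if rx.fullmatch(name)}


# Hypothetical flattened settings, using '.' as the nesting separator.
SETTINGS = {
    "track_warnings.enabled": "true",
    "track_warnings.coordinator_read_size.warn_threshold": "10kb",
    "track_warnings.coordinator_read_size.abort_threshold": "1mb",
    "track_warnings.row_index_size.abort_threshold": "1gb",
    "hinted_handoff_enabled": "true",
}

if __name__ == "__main__":
    matches = select_settings(SETTINGS, "track_warnings.%.abort_threshold")
    for name in sorted(matches):
        print(name, "=", matches[name])
```

With a "." separator, a pattern such as `track_warnings.%` selects a whole feature and `%.abort_threshold` selects one kind of threshold across features, which is the discovery use case described above.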

From: Benjamin Lerer <b....@gmail.com>
Date: Monday, 29 November 2021 at 16:41
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Nested YAML configs for new features
>
> I don’t think it’s necessarily a requirement that we use the flattened
> version in vtables. At the very least we can make use of sets, lists, etc.
> But we can probably also use UDTs if this improves clarity.


In my opinion part of the issue is on the query side. How do we select a
nested set or a specific set easily? UDTs are not great for this type of
query. For collections we can use CONTAINS and element or range selection,
but insertion might be the problem.

On Mon, 29 Nov 2021 at 17:23, Bowen Song <bo...@bso.ng.invalid> wrote:

> In ElasticSearch, the default is a flattened format with almost all
> lines commented out. See
>
> https://github.com/elastic/elasticsearch/blob/master/distribution/src/config/elasticsearch.yml
>
> I guess they chose to do that because users can uncomment individual
> lines to make changes. In a structured config file, the user will have
> to uncomment all lines containing the parent keys to get it to work. For
> example, if someone wants to set the config keyABB to a non-default
> value, they will have to correctly uncomment 3 lines: keyA, keyAB and
> keyABB, which can be annoying and makes it easy to make a mistake. If
> either of the first two keys is not uncommented, the YAML file will still
> be valid, but a config like keyX.keyAB.keyABB might just get silently
> ignored by the database.
>
>     keyX:
>        keyY:
>          keyZ: value
>     # keyA:
>     #   keyAA:
>     #     keyAAA: value
>     #   keyAB:
>     #     keyABA: value
>     #     keyABB: value
>
> On 29/11/2021 15:54, Benjamin Lerer wrote:
> > I do not think that supporting both options is an issue. The settings
> > virtual table would have to use the flattened version.
> > If we support both formats, the question would be: what should be the one
> > used by default in the configuration file?
> >
> > On Fri, 26 Nov 2021 at 15:40, benedict@apache.org <be...@apache.org>
> > wrote:
> >
> >> This is the approach I favour for config files also. We had a much less
> >> engaged discussion on this topic only a few months ago, so glad to see
> more
> >> people getting involved now.
> >>
> >> I would however personally prefer to see the configuration file slowly
> >> deprecated (if perhaps never retired), in favour of virtual tables, so
> that
> >> operators may easily set configurations for the entire cluster. Ideally
> it
> >> would be possible to specify configuration per cluster, per DC and per
> >> node, with the most specific configuration applying. I would like to see
> a
> >> similar hierarchy for Keyspace, Table and Per-Query options. Ideally
> only
> >> the barest minimum number of options would be necessary to supply in a
> >> config file, and only on first launch – seed nodes, for instance.
> >>
> >> So whatever design we employ here, we should IMO be aiming for it to be
> >> compatible with a CQL representation also.
> >>
> >>
> >> From: Bowen Song<bo...@bso.ng.INVALID>
> >> Date: Wednesday, 24 November 2021 at 18:15
> >> To:dev@cassandra.apache.org  <de...@cassandra.apache.org>
> >> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >> Since you mentioned ElasticSearch, I'm actually pretty happy with their
> >> config file syntax. It allows the user to completely flatten out the
> >> entire config file. To give people who aren't familiar with ElasticSearch
> >> an idea, here is a config file we use:
> >>
> >>      cluster.name: foobar
> >>
> >>      node.remote_cluster_client: false
> >>      node.name: "foo.example.com"
> >>      node.master: true
> >>      node.data: true
> >>      node.ingest: true
> >>      node.ml: false
> >>
> >>      xpack.ml.enabled: false
> >>      xpack.security.enabled: false
> >>      xpack.security.audit.enabled: false
> >>      xpack.watcher.enabled: false
> >>
> >>      action.auto_create_index: "+.,-*"
> >>
> >>      network.host: _global_
> >>
> >>      discovery.zen.hosts_provider: file
> >>      discovery.zen.minimum_master_nodes: 2
> >>
> >>      http.publish_host: "foo.example.com"
> >>      http.publish_port: 443
> >>      http.bind_host: 127.0.0.1
> >>
> >>      transport.publish_host: "bar.example.com"
> >>      transport.bind_host: 0.0.0.0
> >>
> >>      indices.fielddata.cache.size: 1GB
> >>      indices.breaker.total.use_real_memory: false
> >>
> >>      path.logs: /var/log/elasticsearch
> >>      path.data: /var/lib/elasticsearch/data
> >>
> >> As you can see we can use the flat (grep-able) syntax for everything.
> >> This is also human readable because we can group options together by
> >> inserting empty lines between them.
> >>
> >> The equivalent of the above in a structured syntax will be:
> >>
> >>      cluster:
> >>           name: foobar
> >>
> >>      node:
> >>           remote_cluster_client: false
> >>           name: "foo.example.com"
> >>           master: true
> >>           data: true
> >>           ingest: true
> >>           ml: false
> >>
> >>      xpack:
> >>           ml:
> >>               enabled: false
> >>           security:
> >>               enabled: false
> >>               audit:
> >>                   enabled: false
> >>           watcher:
> >>               enabled: false
> >>
> >>      action:
> >>           auto_create_index: "+.,-*"
> >>
> >>      network:
> >>           host: _global_
> >>
> >>      discovery:
> >>           zen:
> >>               hosts_provider: file
> >>               minimum_master_nodes: 2
> >>
> >>      http:
> >>           publish_host: "foo.example.com"
> >>           publish_port: 443
> >>           bind_host: 127.0.0.1
> >>
> >>      transport:
> >>           publish_host: "bar.example.com"
> >>           bind_host: 0.0.0.0
> >>
> >>      indices:
> >>           fielddata:
> >>               cache:
> >>                   size: 1GB
> >>           breaker:
> >>               total:
> >>                   use_real_memory: false
> >>
> >>      path:
> >>           logs: /var/log/elasticsearch
> >>           data: /var/lib/elasticsearch/data
> >>
> >> This may be easier to read for some people, but it is a total nightmare
> >> for "grep" - so many keys have identical names, such as "enabled".
> >>
> >> Also, for the virtual tables, it would be a lot easier to represent
> >> individual values in a virtual table when the config is flat and keys
> >> are unique. The virtual tables would need to either support the encoding
> >> and decoding of the structured config into a flat structure, or use JSON
> >> encoded string value. The use of JSON would make querying individual
> >> value much harder.
> >>
> >> On 22/11/2021 16:16, Joseph Lynch wrote:
> >>> Isn't one of the primary reasons to have a YAML configuration instead
> >>> of a properties file to allow typed and structured (implies nested)
> >>> configuration? I think it makes a lot of sense to group related
> >>> configuration options (e.g. a feature) into a typed class when we're
> >>> talking about more than one or two related options.
> >>>
> >>> It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs to
> >>> period encoded key->value pairs when required (usually when providing
> >>> a property or override layer), Spring and Elasticsearch yamls both
> >>> come to mind. It seems pretty reasonable to support dot encoding and
> >>> decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
> >>>
> >>> Regarding quickly telling what configuration a node is running I think
> >>> we should lean on virtual tables for "what is the current
> >>> configuration" now that we have them, as others have said the written
> >>> cassandra.yaml is not necessarily the current configuration ... and
> >>> also grep -C or -A exist for this reason.
> >>>
> >>> -Joey
> >>>
> >>> On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer<bl...@apache.org>
> >> wrote:
> >>>> I do not have a strong opinion for one or the other but wanted to
> raise
> >> the
> >>>> issue I see with the "Settings" virtual table.
> >>>>
> >>>> Currently the "Settings" virtual table converts nested options into
> flat
> >>>> options using a "_" separator. For those options it allows a user to
> >> query
> >>>> the whole set of options through some hack.
> >>>> If we decide to move to more nesting (more than one level), it seems
> to
> >> me
> >>>> that we need to change the way this table is behaving and how we can
> >> query
> >>>> its data.
> >>>>
> >>>> We would need to start using "." as a nesting separator to ensure that
> >>>> things are consistent between the configuration and the table and add
> >>>> support for LIKE restrictions for filtering queries to allow operators
> >> to
> >>>> be able to select the precise set of settings that the operator is
> >> looking
> >>>> for.
> >>>>
> >>>> Doing so is not really complicated in itself but might impact some
> >> users.
> >>>> On Fri, 19 Nov 2021 at 22:39, David Capwell <dcapwell@apple.com.invalid>
> >>>> wrote:
> >>>>
> >>>>>> it is really handy to grep
> >>>>>> cassandra.yaml on some config key and you know the value instantly.
> >>>>> You can still do that
> >>>>>
> >>>>> $ grep -A2 coordinator_read_size conf/cassandra.yaml
> >>>>> #     coordinator_read_size:
> >>>>> #         warn_threshold_kb: 0
> >>>>> #         abort_threshold_kb: 0
> >>>>>
> >>>>> I was also arguing we should support nested and flat, so if your
> infra
> >>>>> works better with flat then you could use
> >>>>>
> >>>>> track_warnings.coordinator_read_size.warn_threshold_kb: 0
> >>>>> track_warnings.coordinator_read_size.abort_threshold_kb: 0
> >>>>>
> >>>>>> On Nov 19, 2021, at 1:34 PM, David Capwell<dc...@apple.com>
> >> wrote:
> >>>>>>> With the flat structure it turns into properties file - would it be
> >>>>>>> possible to support both formats - nested yaml and flat properties?
> >>>>>> For majority of our configs yes, but there are a subset where flat
> >>>>> properties is annoying
> >>>>>> hinted_handoff_disabled_datacenters - set type, so you could do
> >>>>> hinted_handoff_disabled_datacenters=“a,b,c,d” but we would need to
> deal
> >>>>> with separators as the format doesn’t support
> >>>>>> seed_provider.parameters - this is a map type… so would need to do
> >>>>> something like seed_provider.parameters=“{\”a\”: \”a\”}” …. Maybe we
> >> special
> >>>>> case maps as dynamic fields?  Then seed_provider.parameters.a=a?  We
> >> have
> >>>>> ParameterizedClass all over the code
> >>>>>> So, as long as we define how to deal with java collections; we could
> >> in
> >>>>> theory support properties files (not arguing for that in this thread)
> >> as
> >>>>> well as system properties.
> >>>>>>> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <
> >>>>> lewandowski.jacek@gmail.com> wrote:
> >>>>>>> With the flat structure it turns into properties file - would it be
> >>>>>>> possible to support both formats - nested yaml and flat properties?
> >>>>>>>
> >>>>>>>
> >>>>>>> - - -- --- ----- -------- -------------
> >>>>>>> Jacek Lewandowski
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <
> >>>>> calebrackliffe@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> If it's nested, "track_warnings" would still work if you're
> grepping
> >>>>> around
> >>>>>>>> vim or less.
> >>>>>>>>
> >>>>>>>> I'd have to concede the point about grep output, although there
> are
> >>>>> tools
> >>>>>>>> like https://github.com/kislyuk/yq that could probably be bent to
> >> do
> >>>>> what
> >>>>>>>> you want.
> >>>>>>>>
> >>>>>>>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
> >>>>>>>> stefan.miklosovic@instaclustr.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi David,
> >>>>>>>>>
> >>>>>>>>> while I do not oppose nested structure, it is really handy to
> grep
> >>>>>>>>> cassandra.yaml on some config key and you know the value
> instantly.
> >>>>>>>>> This is not possible when it is nested (easily and quickly) as it is
> >> on
> >>>>>>>>> two lines. Or maybe my grepping is just not advanced enough to
> >> cover
> >>>>>>>>> this case? If it is flat, I can just grep "track_warnings" and I
> >> have
> >>>>>>>>> them all.
> >>>>>>>>>
> >>>>>>>>> Can you elaborate on your last bullet point? Parsing layer ...
> >> What do
> >>>>>>>>> you mean specifically?
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>> On Fri, 19 Nov 2021 at 19:36, David Capwell<dc...@gmail.com>
> >>>>> wrote:
> >>>>>>>>>> This has been brought up in a few tickets, so pushing to the dev
> >>>>> list.
> >>>>>>>>>> CASSANDRA-15234 - Standardise config and JVM parameters
> >>>>>>>>>> CASSANDRA-16896 - hard/soft limits for queries
> >>>>>>>>>> CASSANDRA-17147 - Guardrails prototype
> >>>>>>>>>>
> >>>>>>>>>> In short, do we as a project wish to move "new features" into
> >> nested
> >>>>>>>>>> YAML when the feature has "enough" to justify the nesting?  I
> >> would
> >>>>>>>>>> really like to focus this discussion on new features rather than
> >>>>>>>>>> retroactively grouping (leaving that to CASSANDRA-15234), as
> >> there is
> >>>>>>>>>> already a place to talk about that.
> >>>>>>>>>>
> >>>>>>>>>> To get things started, let's start with the track-warning
> feature
> >>>>>>>>>> (hard/soft limits for queries), currently the configs look as
> >> follows
> >>>>>>>>>> (assuming 15234)
> >>>>>>>>>>
> >>>>>>>>>> track_warnings:
> >>>>>>>>>>     enabled: true
> >>>>>>>>>>     coordinator_read_size:
> >>>>>>>>>>         warn_threshold: 10kb
> >>>>>>>>>>         abort_threshold: 1mb
> >>>>>>>>>>     local_read_size:
> >>>>>>>>>>         warn_threshold: 10kb
> >>>>>>>>>>         abort_threshold: 1mb
> >>>>>>>>>>     row_index_size:
> >>>>>>>>>>         warn_threshold: 100mb
> >>>>>>>>>>         abort_threshold: 1gb
> >>>>>>>>>>
> >>>>>>>>>> or should this be "flat"
> >>>>>>>>>>
> >>>>>>>>>> track_warnings_enabled: true
> >>>>>>>>>> track_warnings_coordinator_read_size_warn_threshold: 10kb
> >>>>>>>>>> track_warnings_coordinator_read_size_abort_threshold: 1mb
> >>>>>>>>>> track_warnings_local_read_size_warn_threshold: 10kb
> >>>>>>>>>> track_warnings_local_read_size_abort_threshold: 1mb
> >>>>>>>>>> track_warnings_row_index_size_warn_threshold: 100mb
> >>>>>>>>>> track_warnings_row_index_size_abort_threshold: 1gb
> >>>>>>>>>>
> >>>>>>>>>> For me I prefer nested for a few reasons
> >>>>>>>>>> * easier to enforce consistency as the configs can use shared
> >> types;
> >>>>>>>>>> in the track warnings patch I had mismatches cross configs (warn
> >> vs
> >>>>>>>>>> warns, fail vs abort, etc.) before going nested, now everything
> >>>>> reuses
> >>>>>>>>>> the same types
> >>>>>>>>>> * even though it is longer, things can be more clear how they
> are
> >>>>>>>> related
> >>>>>>>>>> * parsing layer can add support for mixed or purely flat
> >> depending on
> >>>>>>>>>> user preference (example:
> >>>>>>>>>> track_warnings.row_index_size.abort_threshold, using the '.'
> >> notation
> >>>>>>>>>> to represent nested structures)
> >>>>>>>>>>
> >>>>>>>>>> Thoughts?
> >>>>>>>>>>
> >>>>>>>>>>

Re: [DISCUSS] Nested YAML configs for new features

Posted by Benjamin Lerer <b....@gmail.com>.

>
> I don’t think it’s necessarily a requirement that we use the flattened
> version in vtables. At the very least we can make use of sets, lists, etc.
> But we can probably also use UDTs if this improves clarity.


In my opinion part of the issue is on the query side. How do we select a
nested set or a specific set easily? UDTs are not great for this type of
query. For collections we can use CONTAINS and element or range selection,
but insertion might be the problem.
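
As a rough illustration of the query-side question, here is a sketch of the "." encoding discussed earlier in this thread, with a helper that selects one nested set by key prefix. It is plain Python under stated assumptions, not the Settings virtual table implementation: `flatten`, `unflatten`, `subtree` and the sample `NESTED` config are hypothetical names, and keys are assumed never to contain "." themselves.

```python
def flatten(config, prefix=""):
    """Encode a nested dict as a flat dict with '.'-joined keys."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def unflatten(flat):
    """Decode '.'-joined keys back into a nested dict."""
    nested = {}
    for name, value in flat.items():
        *parents, leaf = name.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def subtree(flat, prefix):
    """Select one setting, or every setting nested under it."""
    return {
        name: value
        for name, value in flat.items()
        if name == prefix or name.startswith(prefix + ".")
    }


# Hypothetical nested config mirroring the track_warnings example.
NESTED = {
    "track_warnings": {
        "enabled": True,
        "row_index_size": {"warn_threshold": "100mb", "abort_threshold": "1gb"},
    }
}

if __name__ == "__main__":
    flat = flatten(NESTED)
    for name, value in subtree(flat, "track_warnings.row_index_size").items():
        print(name, "=", value)
```

Selecting a nested set then reduces to a prefix match on a single text column, and the round trip through `unflatten` recovers the nested form for display. Empty nested sections are not preserved by this encoding.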

On Mon, 29 Nov 2021 at 17:23, Bowen Song <bo...@bso.ng.invalid> wrote:

> In ElasticSearch, the default is a flattened format with almost all
> lines commented out. See
>
> https://github.com/elastic/elasticsearch/blob/master/distribution/src/config/elasticsearch.yml
>
> I guess they chose to do that because users can uncomment individual
> lines to make changes. In a structured config file, the user will have
> to uncomment all lines containing the parent keys to get it to work. For
> example, if someone wants to set the config keyABB to a non-default
> value, they will have to correctly uncomment 3 lines: keyA, keyAB and
> keyABB, which can be annoying and makes it easy to make a mistake. If
> either of the first two keys is not uncommented, the YAML file will still
> be valid, but a config like keyX.keyAB.keyABB might just get silently
> ignored by the database.
>
>     keyX:
>        keyY:
>          keyZ: value
>     # keyA:
>     #   keyAA:
>     #     keyAAA: value
>     #   keyAB:
>     #     keyABA: value
>     #     keyABB: value
>
> On 29/11/2021 15:54, Benjamin Lerer wrote:
> > I do not think that supporting both options is an issue. The settings
> > virtual table would have to use the flattened version.
> > If we support both formats, the question would be: what should be the one
> > used by default in the configuration file?
> >
> > On Fri, 26 Nov 2021 at 15:40, benedict@apache.org <be...@apache.org>
> > wrote:
> >
> >> This is the approach I favour for config files also. We had a much less
> >> engaged discussion on this topic only a few months ago, so glad to see
> more
> >> people getting involved now.
> >>
> >> I would however personally prefer to see the configuration file slowly
> >> deprecated (if perhaps never retired), in favour of virtual tables, so
> that
> >> operators may easily set configurations for the entire cluster. Ideally
> it
> >> would be possible to specify configuration per cluster, per DC and per
> >> node, with the most specific configuration applying. I would like to see
> a
> >> similar hierarchy for Keyspace, Table and Per-Query options. Ideally
> only
> >> the barest minimum number of options would be necessary to supply in a
> >> config file, and only on first launch – seed nodes, for instance.
> >>
> >> So whatever design we employ here, we should IMO be aiming for it to be
> >> compatible with a CQL representation also.
> >>
> >>
> >> From: Bowen Song<bo...@bso.ng.INVALID>
> >> Date: Wednesday, 24 November 2021 at 18:15
> >> To:dev@cassandra.apache.org  <de...@cassandra.apache.org>
> >> Subject: Re: [DISCUSS] Nested YAML configs for new features
> >> Since you mentioned ElasticSearch, I'm actually pretty happy with their
> >> config file syntax. It allows the user to completely flatten out the
> >> entire config file. To give people who aren't familiar with ElasticSearch
> >> an idea, here is a config file we use:
> >>
> >>      cluster.name: foobar
> >>
> >>      node.remote_cluster_client: false
> >>      node.name: "foo.example.com"
> >>      node.master: true
> >>      node.data: true
> >>      node.ingest: true
> >>      node.ml: false
> >>
> >>      xpack.ml.enabled: false
> >>      xpack.security.enabled: false
> >>      xpack.security.audit.enabled: false
> >>      xpack.watcher.enabled: false
> >>
> >>      action.auto_create_index: "+.,-*"
> >>
> >>      network.host: _global_
> >>
> >>      discovery.zen.hosts_provider: file
> >>      discovery.zen.minimum_master_nodes: 2
> >>
> >>      http.publish_host: "foo.example.com"
> >>      http.publish_port: 443
> >>      http.bind_host: 127.0.0.1
> >>
> >>      transport.publish_host: "bar.example.com"
> >>      transport.bind_host: 0.0.0.0
> >>
> >>      indices.fielddata.cache.size: 1GB
> >>      indices.breaker.total.use_real_memory: false
> >>
> >>      path.logs: /var/log/elasticsearch
> >>      path.data: /var/lib/elasticsearch/data
> >>
> >> As you can see we can use the flat (grep-able) syntax for everything.
> >> This is also human readable because we can group options together by
> >> inserting empty lines between them.
> >>
> >> The equivalent of the above in a structured syntax will be:
> >>
> >>      cluster:
> >>           name: foobar
> >>
> >>      node:
> >>           remote_cluster_client: false
> >>           name: "foo.example.com"
> >>           master: true
> >>           data: true
> >>           ingest: true
> >>           ml: false
> >>
> >>      xpack:
> >>           ml:
> >>               enabled: false
> >>           security:
> >>               enabled: false
> >>               audit:
> >>                   enabled: false
> >>           watcher:
> >>               enabled: false
> >>
> >>      action:
> >>           auto_create_index: "+.,-*"
> >>
> >>      network:
> >>           host: _global_
> >>
> >>      discovery:
> >>           zen:
> >>               hosts_provider: file
> >>               minimum_master_nodes: 2
> >>
> >>      http:
> >>           publish_host: "foo.example.com"
> >>           publish_port: 443
> >>           bind_host: 127.0.0.1
> >>
> >>      transport:
> >>           publish_host: "bar.example.com"
> >>           bind_host: 0.0.0.0
> >>
> >>      indices:
> >>           fielddata:
> >>               cache:
> >>                   size: 1GB
> >>           breaker:
> >>               total:
> >>                   use_real_memory: false
> >>
> >>      path:
> >>           logs: /var/log/elasticsearch
> >>           data: /var/lib/elasticsearch/data
> >>
> >> This may be easier to read for some people, but it is a total nightmare
> >> for "grep" - so many keys have identical names, such as "enabled".
> >>
> >> Also, for the virtual tables, it would be a lot easier to represent
> >> individual values in a virtual table when the config is flat and keys
> >> are unique. The virtual tables would need to either support the encoding
> >> and decoding of the structured config into a flat structure, or use JSON
> >> encoded string value. The use of JSON would make querying individual
> >> value much harder.
> >>
> >> On 22/11/2021 16:16, Joseph Lynch wrote:
> >>> Isn't one of the primary reasons to have a YAML configuration instead
> >>> of a properties file to allow typed and structured (implies nested)
> >>> configuration? I think it makes a lot of sense to group related
> >>> configuration options (e.g. a feature) into a typed class when we're
> >>> talking about more than one or two related options.
> >>>
> >>> It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs to
> >>> period encoded key->value pairs when required (usually when providing
> >>> a property or override layer), Spring and Elasticsearch yamls both
> >>> come to mind. It seems pretty reasonable to support dot encoding and
> >>> decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
> >>>
> >>> Regarding quickly telling what configuration a node is running I think
> >>> we should lean on virtual tables for "what is the current
> >>> configuration" now that we have them, as others have said the written
> >>> cassandra.yaml is not necessarily the current configuration ... and
> >>> also grep -C or -A exist for this reason.
> >>>
> >>> -Joey
> >>>
> >>> On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer<bl...@apache.org>
> >> wrote:
> >>>> I do not have a strong opinion for one or the other but wanted to
> raise
> >> the
> >>>> issue I see with the "Settings" virtual table.
> >>>>
> >>>> Currently the "Settings" virtual table converts nested options into
> flat
> >>>> options using a "_" separator. For those options it allows a user to
> >> query
> >>>> the whole set of options through some hack.
> >>>> If we decide to move to more nesting (more than one level), it seems
> to
> >> me
> >>>> that we need to change the way this table is behaving and how we can
> >> query
> >>>> its data.
> >>>>
> >>>> We would need to start using "." as a nesting separator to ensure that
> >>>> things are consistent between the configuration and the table and add
> >>>> support for LIKE restrictions for filtering queries to allow operators
> >> to
> >>>> be able to select the precise set of settings that the operator is
> >> looking
> >>>> for.
> >>>>
> >>>> Doing so is not really complicated in itself but might impact some
> >> users.
> >>>> On Fri, 19 Nov 2021 at 22:39, David Capwell <dcapwell@apple.com.invalid>
> >>>> wrote:
> >>>>
> >>>>>> it is really handy to grep
> >>>>>> cassandra.yaml on some config key and you know the value instantly.
> >>>>> You can still do that
> >>>>>
> >>>>> $ grep -A2 coordinator_read_size conf/cassandra.yaml
> >>>>> #     coordinator_read_size:
> >>>>> #         warn_threshold_kb: 0
> >>>>> #         abort_threshold_kb: 0
> >>>>>
> >>>>> I was also arguing we should support nested and flat, so if your
> infra
> >>>>> works better with flat then you could use
> >>>>>
> >>>>> track_warnings.coordinator_read_size.warn_threshold_kb: 0
> >>>>> track_warnings.coordinator_read_size.abort_threshold_kb: 0
> >>>>>
> >>>>>> On Nov 19, 2021, at 1:34 PM, David Capwell<dc...@apple.com>
> >> wrote:
> >>>>>>> With the flat structure it turns into properties file - would it be
> >>>>>>> possible to support both formats - nested yaml and flat properties?
> >>>>>> For majority of our configs yes, but there are a subset where flat
> >>>>> properties is annoying
> >>>>>> hinted_handoff_disabled_datacenters - set type, so you could do
> >>>>> hinted_handoff_disabled_datacenters=“a,b,c,d” but we would need to
> deal
> >>>>> with separators as the format doesn’t support
> >>>>>> seed_provider.parameters - this is a map type… so would need to do
> >>>>> something like seed_provider.parameters=“{\”a\”: \”a\”}” …. Maybe we
> >> special
> >>>>> case maps as dynamic fields?  Then seed_provider.parameters.a=a?  We
> >> have
> >>>>> ParameterizedClass all over the code
> >>>>>> So, as long as we define how to deal with java collections; we could
> >> in
> >>>>> theory support properties files (not arguing for that in this thread)
> >> as
> >>>>> well as system properties.
> >>>>>>> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <
> >>>>> lewandowski.jacek@gmail.com> wrote:
> >>>>>>> With the flat structure it turns into properties file - would it be
> >>>>>>> possible to support both formats - nested yaml and flat properties?
> >>>>>>>
> >>>>>>>
> >>>>>>> - - -- --- ----- -------- -------------
> >>>>>>> Jacek Lewandowski
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <
> >>>>> calebrackliffe@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> If it's nested, "track_warnings" would still work if you're
> grepping
> >>>>> around
> >>>>>>>> vim or less.
> >>>>>>>>
> >>>>>>>> I'd have to concede the point about grep output, although there
> are
> >>>>> tools
> >>>>>>>> like https://github.com/kislyuk/yq that could probably be bent to
> >> do
> >>>>> what
> >>>>>>>> you want.
> >>>>>>>>
> >>>>>>>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
> >>>>>>>> stefan.miklosovic@instaclustr.com> wrote:
> >>>>>>>>
> >>>>>>>>> Hi David,
> >>>>>>>>>
> >>>>>>>>> while I do not oppose nested structure, it is really handy to
> grep
> >>>>>>>>> cassandra.yaml on some config key and you know the value
> instantly.
> >>>>>>>>> This is not possible when it is nested (easily & fastly) as it is
> >> on
> >>>>>>>>> two lines. Or maybe my grepping is just not advanced enough to
> >> cover
> >>>>>>>>> this case? If it is flat, I can just grep "track_warnings" and I
> >> have
> >>>>>>>>> them all.
> >>>>>>>>>
> >>>>>>>>> Can you elaborate on your last bullet point? Parsing layer ...
> >> What do
> >>>>>>>>> you mean specifically?
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>>
> >>>>>>>>> On Fri, 19 Nov 2021 at 19:36, David Capwell<dc...@gmail.com>
> >>>>> wrote:
> >>>>>>>>>> This has been brought up in a few tickets, so pushing to the dev
> >>>>> list.
> >>>>>>>>>> CASSANDRA-15234 - Standardise config and JVM parameters
> >>>>>>>>>> CASSANDRA-16896 - hard/soft limits for queries
> >>>>>>>>>> CASSANDRA-17147 - Guardrails prototype
> >>>>>>>>>>
> >>>>>>>>>> In short, do we as a project wish to move "new features" into
> >> nested
> >>>>>>>>>> YAML when the feature has "enough" to justify the nesting?  I
> >> would
> >>>>>>>>>> really like to focus this discussion on new features rather than
> >>>>>>>>>> retroactively grouping (leaving that to CASSANDRA-15234), as
> >> there is
> >>>>>>>>>> already a place to talk about that.
> >>>>>>>>>>
> >>>>>>>>>> To get things started, let's start with the track-warning
> feature
> >>>>>>>>>> (hard/soft limits for queries), currently the configs look as
> >> follows
> >>>>>>>>>> (assuming 15234)
> >>>>>>>>>>
> >>>>>>>>>> track_warnings:
> >>>>>>>>>>     enabled: true
> >>>>>>>>>>     coordinator_read_size:
> >>>>>>>>>>         warn_threshold: 10kb
> >>>>>>>>>>         abort_threshold: 1mb
> >>>>>>>>>>     local_read_size:
> >>>>>>>>>>         warn_threshold: 10kb
> >>>>>>>>>>         abort_threshold: 1mb
> >>>>>>>>>>     row_index_size:
> >>>>>>>>>>         warn_threshold: 100mb
> >>>>>>>>>>         abort_threshold: 1gb
> >>>>>>>>>>
> >>>>>>>>>> or should this be "flat"
> >>>>>>>>>>
> >>>>>>>>>> track_warnings_enabled: true
> >>>>>>>>>> track_warnings_coordinator_read_size_warn_threshold: 10kb
> >>>>>>>>>> track_warnings_coordinator_read_size_abort_threshold: 1mb
> >>>>>>>>>> track_warnings_local_read_size_warn_threshold: 10kb
> >>>>>>>>>> track_warnings_local_read_size_abort_threshold: 1mb
> >>>>>>>>>> track_warnings_row_index_size_warn_threshold: 100mb
> >>>>>>>>>> track_warnings_row_index_size_abort_threshold: 1gb
> >>>>>>>>>>
> >>>>>>>>>> For me I prefer nested for a few reasons
> >>>>>>>>>> * easier to enforce consistency as the configs can use shared
> >> types;
> >>>>>>>>>> in the track warnings patch I had mismatches cross configs (warn
> >> vs
> >>>>>>>>>> warns, fail vs abort, etc.) before going nested, now everything
> >>>>> reuses
> >>>>>>>>>> the same types
> >>>>>>>>>> * even though it is longer, things can be more clear how they
> are
> >>>>>>>> related
> >>>>>>>>>> * parsing layer can add support for mixed or purely flat
> >> depending on
> >>>>>>>>>> user preference (example:
> >>>>>>>>>> track_warnings.row_index_size.abort_threshold, using the '.'
> >> notation
> >>>>>>>>>> to represent nested structures)
> >>>>>>>>>>
> >>>>>>>>>> Thoughts?
> >>>>>>>>>>
> >>>>>>>>>>
> >> ---------------------------------------------------------------------
> >>>>>>>>>> To unsubscribe,e-mail:dev-unsubscribe@cassandra.apache.org
> >>>>>>>>>> For additional commands,e-mail:dev-help@cassandra.apache.org

Re: [DISCUSS] Nested YAML configs for new features

Posted by Bowen Song <bo...@bso.ng.INVALID>.

In ElasticSearch, the default is a flattened format with almost all 
lines commented out. See 
https://github.com/elastic/elasticsearch/blob/master/distribution/src/config/elasticsearch.yml

I guess they chose to do that because users can uncomment individual 
lines to make changes. In a structured config file, the user has to 
uncomment every line containing a parent key to get it to work. For 
example, if someone wants to set the config keyABB to a non-default 
value, they have to correctly uncomment 3 lines: keyA, keyAB and 
keyABB, which is annoying and makes it easy to make a mistake. If either 
of the first two keys is not uncommented, the YAML file will still be 
valid but a config like keyX.keyAB.keyABB might just get silently 
ignored by the database.

    keyX:
       keyY:
         keyZ: value
    # keyA:
    #   keyAA:
    #     keyAAA: value
    #   keyAB:
    #     keyABA: value
    #     keyABB: value
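
To make the failure mode above concrete, here is a minimal Python sketch. It assumes the YAML has already been parsed into nested dicts; the `flatten` helper and the `KNOWN_KEYS` set are hypothetical and are not part of Cassandra or Elasticsearch. It shows how a loader that checks dotted paths against a list of recognised settings could report the misplaced key instead of ignoring it.

```python
def flatten(config, prefix=""):
    """Turn a nested mapping into {"dotted.key": value} pairs."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical set of settings the database recognises.
KNOWN_KEYS = {
    "keyX.keyY.keyZ",
    "keyA.keyAA.keyAAA",
    "keyA.keyAB.keyABA",
    "keyA.keyAB.keyABB",
}

# What a parser could hand back if keyAB and keyABB were uncommented
# but the parent keyA was not, so keyAB ends up under keyX.
parsed = {
    "keyX": {
        "keyY": {"keyZ": "value"},
        "keyAB": {"keyABB": "other"},
    }
}

# Any dotted path that is not a recognised setting is reported.
unknown = sorted(set(flatten(parsed)) - KNOWN_KEYS)
print(unknown)  # ['keyX.keyAB.keyABB']
```

Rejecting unknown dotted paths at startup would turn the silent ignore into a visible error, whichever file layout ends up being the default.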

On 29/11/2021 15:54, Benjamin Lerer wrote:
> I do not think that supporting both options is an issue. The settings
> virtual table would have to use the flattened version.
> If we support both formats, the question would be: what should be the one
> used by default in the configuration file?
>
> On Fri, 26 Nov 2021 at 15:40, benedict@apache.org <be...@apache.org> wrote:
>
>> This is the approach I favour for config files also. We had a much less
>> engaged discussion on this topic only a few months ago, so glad to see more
>> people getting involved now.
>>
>> I would however personally prefer to see the configuration file slowly
>> deprecated (if perhaps never retired), in favour of virtual tables, so that
>> operators may easily set configurations for the entire cluster. Ideally it
>> would be possible to specify configuration per cluster, per DC and per
>> node, with the most specific configuration applying I would like to see a
>> similar hierarchy for Keyspace, Table and Per-Query options. Ideally only
>> the barest minimum number of options would be necessary to supply in a
>> config file, and only on first launch – seed nodes, for instance.
>>
>> So whatever design we employ here, we should IMO be aiming for it to be
>> compatible with a CQL representation also.
>>
>>
>> From: Bowen Song<bo...@bso.ng.INVALID>
>> Date: Wednesday, 24 November 2021 at 18:15
>> To:dev@cassandra.apache.org  <de...@cassandra.apache.org>
>> Subject: Re: [DISCUSS] Nested YAML configs for new features
>> Since you mentioned ElasticSearch, I'm actually pretty happy with their
>> config file syntax. It allows the user to completely flatten out the
>> entire config file. To give people who isn't familiar with ElasticSearch
>> an idea, here is a config file we use:
>>
>>      cluster.name: foobar
>>
>>      node.remote_cluster_client: false
>>      node.name: "foo.example.com"
>>      node.master: true
>>      node.data: true
>>      node.ingest: true
>>      node.ml: false
>>
>>      xpack.ml.enabled: false
>>      xpack.security.enabled: false
>>      xpack.security.audit.enabled: false
>>      xpack.watcher.enabled: false
>>
>>      action.auto_create_index: "+.,-*"
>>
>>      network.host: _global_
>>
>>      discovery.zen.hosts_provider: file
>>      discovery.zen.minimum_master_nodes: 2
>>
>>      http.publish_host: "foo.example.com"
>>      http.publish_port: 443
>>      http.bind_host: 127.0.0.1
>>
>>      transport.publish_host: "bar.example.com"
>>      transport.bind_host: 0.0.0.0
>>
>>      indices.fielddata.cache.size: 1GB
>>      indices.breaker.total.use_real_memory: false
>>
>>      path.logs: /var/log/elasticsearch
>>      path.data: /var/lib/elasticsearch/data
>>
>> As you can see we can use the flat (grep-able) syntax for everything.
>> This is also human readable because we can group options together by
>> inserting empty lines between them.
>>
>> The equivalent of the above in a structured syntax will be:
>>
>>      cluster:
>>           name: foobar
>>
>>      node:
>>           remote_cluster_client: false
>>           name: "foo.example.com"
>>           master: true
>>           data: true
>>           ingest: true
>>           ml: false
>>
>>      xpack:
>>           ml:
>>               enabled: false
>>           security:
>>               enabled: false
>>               audit:
>>                   enabled: false
>>           watcher:
>>               enabled: false
>>
>>      action:
>>           auto_create_index: "+.,-*"
>>
>>      network:
>>           host: _global_
>>
>>      discovery:
>>           zen:
>>               hosts_provider: file
>>               minimum_master_nodes: 2
>>
>>      http:
>>           publish_host: "foo.example.com"
>>           publish_port: 443
>>           bind_host: 127.0.0.1
>>
>>      transport:
>>           publish_host: "bar.example.com"
>>           bind_host: 0.0.0.0
>>
>>      indices:
>>           fielddata:
>>               cache:
>>                   size: 1GB
>>      indices:
>>           breaker:
>>               total:
>>                   use_real_memory: false
>>
>>      path:
>>           logs: /var/log/elasticsearch
>>           data: /var/lib/elasticsearch/data
>>
>> This may be easier to read for some people, but it is a total nightmare
>> for "grep" - so many keys have identical names, such as "enabled".
>>
>> Also, for the virtual tables, it would be a lot easier to represent
>> individual values in a virtual table when the config is flat and keys
>> are unique. The virtual tables would need to either support the encoding
>> and decoding of the structured config into a flat structure, or use JSON
>> encoded string value. The use of JSON would make querying individual
>> value much harder.
>>
>> On 22/11/2021 16:16, Joseph Lynch wrote:
>>> Isn't one of the primary reasons to have a YAML configuration instead
>>> of a properties file is to allow typed and structured (implies nested)
>>> configuration? I think it makes a lot of sense to group related
>>> configuration options (e.g. a feature) into a typed class when we're
>>> talking about more than one or two related options.
>>>
>>> It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs to
>>> period encoded key->value pairs when required (usually when providing
>>> a property or override layer), Spring and Elasticsearch yamls both
>>> come to mind. It seems pretty reasonable to support dot encoding and
>>> decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
>>>
>>> Regarding quickly telling what configuration a node is running I think
>>> we should lean on virtual tables for "what is the current
>>> configuration" now that we have them, as others have said the written
>>> cassandra.yaml is not necessarily the current configuration ... and
>>> also grep -C or -A exist for this reason.
>>>
>>> -Joey
>>>
>>> On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer<bl...@apache.org>
>> wrote:
>>>> I do not have a strong opinion for one or the other but wanted to raise
>> the
>>>> issue I see with the "Settings" virtual table.
>>>>
>>>> Currently the "Settings" virtual table converts nested options into flat
>>>> options using a "_" separator. For those options it allows a user to
>> query
>>>> the all set of options through some hack.
>>>> If we decide to move to more nesting (more than one level), it seems to
>> me
>>>> that we need to change the way this table is behaving and how we can
>> query
>>>> its data.
>>>>
>>>> We would need to start using "." as a nesting separator to ensure that
>>>> things are consistent between the configuration and the table and add
>>>> support for LIKE restrictions for filtering queries to allow operators
>> to
>>>> be able to select the precise set of settings that the operator is
>> looking
>>>> for.
>>>>
>>>> Doing so is not really complicated in itself but might impact some
>> users.
>>>> On Fri, 19 Nov 2021 at 22:39, David Capwell <dc...@apple.com.invalid> wrote:
>>>>
>>>>>> it is really handy to grep
>>>>>> cassandra.yaml on some config key and you know the value instantly.
>>>>> You can still do that
>>>>>
>>>>> $ grep -A2 coordinator_read_size conf/cassandra.yaml
>>>>> #     coordinator_read_size:
>>>>> #         warn_threshold_kb: 0
>>>>> #         abort_threshold_kb: 0
>>>>>
>>>>> I was also arguing we should support nested and flat, so if your infra
>>>>> works better with flat then you could use
>>>>>
>>>>> track_warnings.coordinator_read_size.warn_threshold_kb: 0
>>>>> track_warnings.coordinator_read_size.abort_threshold_kb: 0
>>>>>
>>>>>> On Nov 19, 2021, at 1:34 PM, David Capwell<dc...@apple.com>
>> wrote:
>>>>>>> With the flat structure it turns into properties file - would it be
>>>>>>> possible to support both formats - nested yaml and flat properties?
>>>>>> For majority of our configs yes, but there are a subset where flat
>>>>> properties is annoying
>>>>>> hinted_handoff_disabled_datacenters - set type, so you could do
>>>>> hinted_handoff_disabled_datacenters=“a,b,c,d” but we would need to deal
>>>>> with separators as the format doesn’t support
>>>>>> seed_provider.parameters - this is a map type… so would need to do
>>>>> something like seed_provider.parameters=“{\”a\”: \a\”}” …. Maybe we
>> special
>>>>> case maps as dynamic fields?  Then seed_provider.parameters.a=a?  We
>> have
>>>>> ParameterizedClass all over the code
>>>>>> So, as long as we define how to deal with java collections; we could
>> in
>>>>> theory support properties files (not arguing for that in this thread)
>> as
>>>>> well as system properties.
>>>>>>> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <
>>>>> lewandowski.jacek@gmail.com> wrote:
>>>>>>> With the flat structure it turns into properties file - would it be
>>>>>>> possible to support both formats - nested yaml and flat properties?
>>>>>>>
>>>>>>>
>>>>>>> - - -- --- ----- -------- -------------
>>>>>>> Jacek Lewandowski
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <
>>>>> calebrackliffe@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> If it's nested, "track_warnings" would still work if you're grepping
>>>>> around
>>>>>>>> vim or less.
>>>>>>>>
>>>>>>>> I'd have to concede the point about grep output, although there are
>>>>> tools
>>>>>>>> like https://github.com/kislyuk/yq that could probably be bent to
>> do
>>>>> what
>>>>>>>> you want.
>>>>>>>>
>>>>>>>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
>>>>>>>> stefan.miklosovic@instaclustr.com> wrote:
>>>>>>>>
>>>>>>>>> Hi David,
>>>>>>>>>
>>>>>>>>> while I do not oppose nested structure, it is really handy to grep
>>>>>>>>> cassandra.yaml on some config key and you know the value instantly.
>>>>>>>>> This is not possible when it is nested (easily & fastly) as it is
>> on
>>>>>>>>> two lines. Or maybe my grepping is just not advanced enough to
>> cover
>>>>>>>>> this case? If it is flat, I can just grep "track_warnings" and I
>> have
>>>>>>>>> them all.
>>>>>>>>>
>>>>>>>>> Can you elaborate on your last bullet point? Parsing layer ...
>> What do
>>>>>>>>> you mean specifically?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Fri, 19 Nov 2021 at 19:36, David Capwell<dc...@gmail.com>
>>>>> wrote:
>>>>>>>>>> This has been brought up in a few tickets, so pushing to the dev
>>>>> list.
>>>>>>>>>> CASSANDRA-15234 - Standardise config and JVM parameters
>>>>>>>>>> CASSANDRA-16896 - hard/soft limits for queries
>>>>>>>>>> CASSANDRA-17147 - Guardrails prototype
>>>>>>>>>>
>>>>>>>>>> In short, do we as a project wish to move "new features" into
>> nested
>>>>>>>>>> YAML when the feature has "enough" to justify the nesting?  I
>> would
>>>>>>>>>> really like to focus this discussion on new features rather than
>>>>>>>>>> retroactively grouping (leaving that to CASSANDRA-15234), as
>> there is
>>>>>>>>>> already a place to talk about that.
>>>>>>>>>>
>>>>>>>>>> To get things started, let's start with the track-warning feature
>>>>>>>>>> (hard/soft limits for queries), currently the configs look as
>> follows
>>>>>>>>>> (assuming 15234)
>>>>>>>>>>
>>>>>>>>>> track_warnings:
>>>>>>>>>>     enabled: true
>>>>>>>>>>     coordinator_read_size:
>>>>>>>>>>         warn_threshold: 10kb
>>>>>>>>>>         abort_threshold: 1mb
>>>>>>>>>>     local_read_size:
>>>>>>>>>>         warn_threshold: 10kb
>>>>>>>>>>         abort_threshold: 1mb
>>>>>>>>>>     row_index_size:
>>>>>>>>>>         warn_threshold: 100mb
>>>>>>>>>>         abort_threshold: 1gb
>>>>>>>>>>
>>>>>>>>>> or should this be "flat"
>>>>>>>>>>
>>>>>>>>>> track_warnings_enabled: true
>>>>>>>>>> track_warnings_coordinator_read_size_warn_threshold: 10kb
>>>>>>>>>> track_warnings_coordinator_read_size_abort_threshold: 1mb
>>>>>>>>>> track_warnings_local_read_size_warn_threshold: 10kb
>>>>>>>>>> track_warnings_local_read_size_abort_threshold: 1mb
>>>>>>>>>> track_warnings_row_index_size_warn_threshold: 100mb
>>>>>>>>>> track_warnings_row_index_size_abort_threshold: 1gb
>>>>>>>>>>
>>>>>>>>>> For me I prefer nested for a few reasons
>>>>>>>>>> * easier to enforce consistency as the configs can use shared
>> types;
>>>>>>>>>> in the track warnings patch I had mismatches cross configs (warn
>> vs
>>>>>>>>>> warns, fail vs abort, etc.) before going nested, now everything
>>>>> reuses
>>>>>>>>>> the same types
>>>>>>>>>> * even though it is longer, things can be more clear how they are
>>>>>>>> related
>>>>>>>>>> * parsing layer can add support for mixed or purely flat
>> depending on
>>>>>>>>>> user preference (example:
>>>>>>>>>> track_warnings.row_index_size.abort_threshold, using the '.'
>> notation
>>>>>>>>>> to represent nested structures)
>>>>>>>>>>
>>>>>>>>>> Thoughts?
>>>>>>>>>>

Re: [DISCUSS] Nested YAML configs for new features

Posted by Benjamin Lerer <bl...@apache.org>.

I do not think that supporting both options is an issue. The settings
virtual table would have to use the flattened version.
If we support both formats, the question would be: what should be the one
used by default in the configuration file?
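
As a rough illustration of what supporting both formats could mean at the parsing layer, here is a small Python sketch. It assumes the file has already been parsed into dicts; `expand` and `_merge` are hypothetical helpers, not existing Cassandra code. Dotted keys are expanded into nested mappings and merged with keys that are already nested, and a setting given twice is rejected.

```python
def _merge(node, leaf, value):
    """Attach value under node[leaf], merging nested mappings."""
    existing = node.get(leaf)
    if isinstance(existing, dict) and isinstance(value, dict):
        for key, child in value.items():
            _merge(existing, key, child)
    elif leaf in node:
        raise ValueError(f"duplicate setting {leaf!r}")
    else:
        node[leaf] = value


def expand(config):
    """Expand dotted keys into nested dicts, accepting mixed input."""
    result = {}
    for key, value in config.items():
        if isinstance(value, dict):
            value = expand(value)
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{key!r} conflicts with a scalar at {part!r}")
        _merge(node, leaf, value)
    return result


# Mixed input: one nested block plus one flat, dot-separated key.
mixed = {
    "track_warnings": {
        "enabled": True,
        "coordinator_read_size": {"warn_threshold": "10kb"},
    },
    "track_warnings.coordinator_read_size.abort_threshold": "1mb",
}

nested = expand(mixed)
print(nested["track_warnings"]["coordinator_read_size"])
```

With a normalisation step like this, the flattened names used by the settings virtual table and the nested names in the file would refer to the same underlying tree, so the choice of default layout becomes mostly a documentation question.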

On Fri, 26 Nov 2021 at 15:40, benedict@apache.org <be...@apache.org> wrote:

> This is the approach I favour for config files also. We had a much less
> engaged discussion on this topic only a few months ago, so glad to see more
> people getting involved now.
>
> I would however personally prefer to see the configuration file slowly
> deprecated (if perhaps never retired), in favour of virtual tables, so that
> operators may easily set configurations for the entire cluster. Ideally it
> would be possible to specify configuration per cluster, per DC and per
> node, with the most specific configuration applying I would like to see a
> similar hierarchy for Keyspace, Table and Per-Query options. Ideally only
> the barest minimum number of options would be necessary to supply in a
> config file, and only on first launch – seed nodes, for instance.
>
> So whatever design we employ here, we should IMO be aiming for it to be
> compatible with a CQL representation also.
>
>
> From: Bowen Song <bo...@bso.ng.INVALID>
> Date: Wednesday, 24 November 2021 at 18:15
> To: dev@cassandra.apache.org <de...@cassandra.apache.org>
> Subject: Re: [DISCUSS] Nested YAML configs for new features
> Since you mentioned ElasticSearch, I'm actually pretty happy with their
> config file syntax. It allows the user to completely flatten out the
> entire config file. To give people who isn't familiar with ElasticSearch
> an idea, here is a config file we use:
>
>     cluster.name: foobar
>
>     node.remote_cluster_client: false
>     node.name: "foo.example.com"
>     node.master: true
>     node.data: true
>     node.ingest: true
>     node.ml: false
>
>     xpack.ml.enabled: false
>     xpack.security.enabled: false
>     xpack.security.audit.enabled: false
>     xpack.watcher.enabled: false
>
>     action.auto_create_index: "+.,-*"
>
>     network.host: _global_
>
>     discovery.zen.hosts_provider: file
>     discovery.zen.minimum_master_nodes: 2
>
>     http.publish_host: "foo.example.com"
>     http.publish_port: 443
>     http.bind_host: 127.0.0.1
>
>     transport.publish_host: "bar.example.com"
>     transport.bind_host: 0.0.0.0
>
>     indices.fielddata.cache.size: 1GB
>     indices.breaker.total.use_real_memory: false
>
>     path.logs: /var/log/elasticsearch
>     path.data: /var/lib/elasticsearch/data
>
> As you can see we can use the flat (grep-able) syntax for everything.
> This is also human readable because we can group options together by
> inserting empty lines between them.
>
> The equivalent of the above in a structured syntax will be:
>
>     cluster:
>          name: foobar
>
>     node:
>          remote_cluster_client: false
>          name: "foo.example.com"
>          master: true
>          data: true
>          ingest: true
>          ml: false
>
>     xpack:
>          ml:
>              enabled: false
>          security:
>              enabled: false
>              audit:
>                  enabled: false
>          watcher:
>              enabled: false
>
>     action:
>          auto_create_index: "+.,-*"
>
>     network:
>          host: _global_
>
>     discovery:
>          zen:
>              hosts_provider: file
>              minimum_master_nodes: 2
>
>     http:
>          publish_host: "foo.example.com"
>          publish_port: 443
>          bind_host: 127.0.0.1
>
>     transport:
>          publish_host: "bar.example.com"
>          bind_host: 0.0.0.0
>
>     indices:
>          fielddata:
>              cache:
>                  size: 1GB
>     indices:
>          breaker:
>              total:
>                  use_real_memory: false
>
>     path:
>          logs: /var/log/elasticsearch
>          data: /var/lib/elasticsearch/data
>
> This may be easier to read for some people, but it is a total nightmare
> for "grep" - so many keys have identical names, such as "enabled".
>
> Also, for the virtual tables, it would be a lot easier to represent
> individual values in a virtual table when the config is flat and keys
> are unique. The virtual tables would need to either support the encoding
> and decoding of the structured config into a flat structure, or use JSON
> encoded string value. The use of JSON would make querying individual
> value much harder.
>
> On 22/11/2021 16:16, Joseph Lynch wrote:
> > Isn't one of the primary reasons to have a YAML configuration instead
> > of a properties file is to allow typed and structured (implies nested)
> > configuration? I think it makes a lot of sense to group related
> > configuration options (e.g. a feature) into a typed class when we're
> > talking about more than one or two related options.
> >
> > It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs to
> > period encoded key->value pairs when required (usually when providing
> > a property or override layer), Spring and Elasticsearch yamls both
> > come to mind. It seems pretty reasonable to support dot encoding and
> > decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
> >
> > Regarding quickly telling what configuration a node is running I think
> > we should lean on virtual tables for "what is the current
> > configuration" now that we have them, as others have said the written
> > cassandra.yaml is not necessarily the current configuration ... and
> > also grep -C or -A exist for this reason.
> >
> > -Joey
> >
> > On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer<bl...@apache.org>
> wrote:
> >> I do not have a strong opinion for one or the other but wanted to raise
> the
> >> issue I see with the "Settings" virtual table.
> >>
> >> Currently the "Settings" virtual table converts nested options into flat
> >> options using a "_" separator. For those options it allows a user to
> query
> >> the all set of options through some hack.
> >> If we decide to move to more nesting (more than one level), it seems to
> me
> >> that we need to change the way this table is behaving and how we can
> query
> >> its data.
> >>
> >> We would need to start using "." as a nesting separator to ensure that
> >> things are consistent between the configuration and the table and add
> >> support for LIKE restrictions for filtering queries to allow operators
> to
> >> be able to select the precise set of settings that the operator is
> looking
> >> for.
> >>
> >> Doing so is not really complicated in itself but might impact some
> users.
> >>
> >> On Fri, 19 Nov 2021 at 22:39, David Capwell <dc...@apple.com.invalid> wrote:
> >>
> >>>> it is really handy to grep
> >>>> cassandra.yaml on some config key and you know the value instantly.
> >>> You can still do that
> >>>
> >>> $ grep -A2 coordinator_read_size conf/cassandra.yaml
> >>> #     coordinator_read_size:
> >>> #         warn_threshold_kb: 0
> >>> #         abort_threshold_kb: 0
> >>>
> >>> I was also arguing we should support nested and flat, so if your infra
> >>> works better with flat then you could use
> >>>
> >>> track_warnings.coordinator_read_size.warn_threshold_kb: 0
> >>> track_warnings.coordinator_read_size.abort_threshold_kb: 0
> >>>
> >>>> On Nov 19, 2021, at 1:34 PM, David Capwell<dc...@apple.com>
> wrote:
> >>>>
> >>>>> With the flat structure it turns into properties file - would it be
> >>>>> possible to support both formats - nested yaml and flat properties?
> >>>>
> >>>> For majority of our configs yes, but there are a subset where flat
> >>> properties is annoying
> >>>> hinted_handoff_disabled_datacenters - set type, so you could do
> >>> hinted_handoff_disabled_datacenters=“a,b,c,d” but we would need to deal
> >>> with separators as the format doesn’t support
> >>>> seed_provider.parameters - this is a map type… so would need to do
> >>> something like seed_provider.parameters=“{\”a\”: \a\”}” …. Maybe we
> special
> >>> case maps as dynamic fields?  Then seed_provider.parameters.a=a?  We
> have
> >>> ParameterizedClass all over the code
> >>>> So, as long as we define how to deal with java collections; we could
> in
> >>> theory support properties files (not arguing for that in this thread)
> as
> >>> well as system properties.
> >>>>
> >>>>> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <
> >>> lewandowski.jacek@gmail.com> wrote:
> >>>>> With the flat structure it turns into properties file - would it be
> >>>>> possible to support both formats - nested yaml and flat properties?
> >>>>>
> >>>>>
> >>>>> - - -- --- ----- -------- -------------
> >>>>> Jacek Lewandowski
> >>>>>
> >>>>>
> >>>>> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <
> >>> calebrackliffe@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> If it's nested, "track_warnings" would still work if you're grepping
> >>> around
> >>>>>> vim or less.
> >>>>>>
> >>>>>> I'd have to concede the point about grep output, although there are
> >>> tools
> >>>>>> like https://github.com/kislyuk/yq that could probably be bent to
> do
> >>> what
> >>>>>> you want.
> >>>>>>
> >>>>>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
> >>>>>> stefan.miklosovic@instaclustr.com> wrote:
> >>>>>>
> >>>>>>> Hi David,
> >>>>>>>
> >>>>>>> while I do not oppose nested structure, it is really handy to grep
> >>>>>>> cassandra.yaml on some config key and you know the value instantly.
> >>>>>>> This is not possible when it is nested (easily & fastly) as it is
> on
> >>>>>>> two lines. Or maybe my grepping is just not advanced enough to
> cover
> >>>>>>> this case? If it is flat, I can just grep "track_warnings" and I
> have
> >>>>>>> them all.
> >>>>>>>
> >>>>>>> Can you elaborate on your last bullet point? Parsing layer ...
> What do
> >>>>>>> you mean specifically?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>> On Fri, 19 Nov 2021 at 19:36, David Capwell<dc...@gmail.com>
> >>> wrote:
> >>>>>>>> This has been brought up in a few tickets, so pushing to the dev list.
> >>>>>>>> CASSANDRA-15234 - Standardise config and JVM parameters
> >>>>>>>> CASSANDRA-16896 - hard/soft limits for queries
> >>>>>>>> CASSANDRA-17147 - Guardrails prototype
> >>>>>>>>
> >>>>>>>> In short, do we as a project wish to move "new features" into nested
> >>>>>>>> YAML when the feature has "enough" to justify the nesting?  I would
> >>>>>>>> really like to focus this discussion on new features rather than
> >>>>>>>> retroactively grouping (leaving that to CASSANDRA-15234), as there is
> >>>>>>>> already a place to talk about that.
> >>>>>>>>
> >>>>>>>> To get things started, let's start with the track-warning feature
> >>>>>>>> (hard/soft limits for queries), currently the configs look as follows
> >>>>>>>> (assuming 15234)
> >>>>>>>>
> >>>>>>>> track_warnings:
> >>>>>>>>    enabled: true
> >>>>>>>>    coordinator_read_size:
> >>>>>>>>        warn_threshold: 10kb
> >>>>>>>>        abort_threshold: 1mb
> >>>>>>>>    local_read_size:
> >>>>>>>>        warn_threshold: 10kb
> >>>>>>>>        abort_threshold: 1mb
> >>>>>>>>    row_index_size:
> >>>>>>>>        warn_threshold: 100mb
> >>>>>>>>        abort_threshold: 1gb
> >>>>>>>>
> >>>>>>>> or should this be "flat"
> >>>>>>>>
> >>>>>>>> track_warnings_enabled: true
> >>>>>>>> track_warnings_coordinator_read_size_warn_threshold: 10kb
> >>>>>>>> track_warnings_coordinator_read_size_abort_threshold: 1mb
> >>>>>>>> track_warnings_local_read_size_warn_threshold: 10kb
> >>>>>>>> track_warnings_local_read_size_abort_threshold: 1mb
> >>>>>>>> track_warnings_row_index_size_warn_threshold: 100mb
> >>>>>>>> track_warnings_row_index_size_abort_threshold: 1gb
> >>>>>>>>
> >>>>>>>> For me I prefer nested for a few reasons
> >>>>>>>> * easier to enforce consistency as the configs can use shared types;
> >>>>>>>> in the track warnings patch I had mismatches cross configs (warn vs
> >>>>>>>> warns, fail vs abort, etc.) before going nested, now everything reuses
> >>>>>>>> the same types
> >>>>>>>> * even though it is longer, things can be more clear how they are related
> >>>>>>>> * parsing layer can add support for mixed or purely flat depending on
> >>>>>>>> user preference (example:
> >>>>>>>> track_warnings.row_index_size.abort_threshold, using the '.' notation
> >>>>>>>> to represent nested structures)
> >>>>>>>>
> >>>>>>>> Thoughts?

Re: [DISCUSS] Nested YAML configs for new features

Posted by "benedict@apache.org" <be...@apache.org>.

This is the approach I favour for config files also. We had a much less engaged discussion on this topic only a few months ago, so glad to see more people getting involved now.

I would however personally prefer to see the configuration file slowly deprecated (if perhaps never retired), in favour of virtual tables, so that operators may easily set configurations for the entire cluster. Ideally it would be possible to specify configuration per cluster, per DC and per node, with the most specific configuration applying. I would like to see a similar hierarchy for Keyspace, Table and Per-Query options. Ideally only the barest minimum number of options would be necessary to supply in a config file, and only on first launch – seed nodes, for instance.

So whatever design we employ here, we should IMO be aiming for it to be compatible with a CQL representation also.
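
To make "most specific configuration applying" concrete, here is a minimal Python sketch. The scope names, the `resolve` helper and the sample values are all hypothetical and only illustrate the lookup order; none of this is an existing Cassandra API.

```python
# Hypothetical sketch: resolve a setting from layered scopes.
# Scopes are listed from least to most specific; the most specific
# scope that defines a key wins.
SCOPES = ("cluster", "dc", "node")


def resolve(layers, key):
    """Return (value, scope) from the most specific layer that defines key."""
    for scope in reversed(SCOPES):
        settings = layers.get(scope, {})
        if key in settings:
            return settings[key], scope
    raise KeyError(key)


# Illustrative values only.
layers = {
    "cluster": {
        "track_warnings.enabled": "true",
        "track_warnings.row_index_size.warn_threshold": "100mb",
    },
    "dc": {"track_warnings.row_index_size.warn_threshold": "50mb"},
    "node": {},
}

value, scope = resolve(layers, "track_warnings.row_index_size.warn_threshold")
```

Here the DC-level override is returned because no node-level value exists; a key defined only at cluster level falls through to the cluster value.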


From: Bowen Song <bo...@bso.ng.INVALID>
Date: Wednesday, 24 November 2021 at 18:15
To: dev@cassandra.apache.org <de...@cassandra.apache.org>
Subject: Re: [DISCUSS] Nested YAML configs for new features

Re: [DISCUSS] Nested YAML configs for new features

Posted by Bowen Song <bo...@bso.ng.INVALID>.

Since you mentioned ElasticSearch, I'm actually pretty happy with their
config file syntax. It allows the user to completely flatten out the
entire config file. To give people who aren't familiar with ElasticSearch
an idea, here is a config file we use:

    cluster.name: foobar

    node.remote_cluster_client: false
    node.name: "foo.example.com"
    node.master: true
    node.data: true
    node.ingest: true
    node.ml: false

    xpack.ml.enabled: false
    xpack.security.enabled: false
    xpack.security.audit.enabled: false
    xpack.watcher.enabled: false

    action.auto_create_index: "+.,-*"

    network.host: _global_

    discovery.zen.hosts_provider: file
    discovery.zen.minimum_master_nodes: 2

    http.publish_host: "foo.example.com"
    http.publish_port: 443
    http.bind_host: 127.0.0.1

    transport.publish_host: "bar.example.com"
    transport.bind_host: 0.0.0.0

    indices.fielddata.cache.size: 1GB
    indices.breaker.total.use_real_memory: false

    path.logs: /var/log/elasticsearch
    path.data: /var/lib/elasticsearch/data

As you can see we can use the flat (grep-able) syntax for everything. 
This is also human readable because we can group options together by 
inserting empty lines between them.

The equivalent of the above in a structured syntax will be:

    cluster:
         name: foobar

    node:
         remote_cluster_client: false
         name: "foo.example.com"
         master: true
         data: true
         ingest: true
         ml: false

    xpack:
         ml:
             enabled: false
         security:
             enabled: false
             audit:
                 enabled: false
         watcher:
             enabled: false

    action:
         auto_create_index: "+.,-*"

    network:
         host: _global_

    discovery:
         zen:
             hosts_provider: file
             minimum_master_nodes: 2

    http:
         publish_host: "foo.example.com"
         publish_port: 443
         bind_host: 127.0.0.1

    transport:
         publish_host: "bar.example.com"
         bind_host: 0.0.0.0

    indices:
         fielddata:
             cache:
                 size: 1GB
         breaker:
             total:
                 use_real_memory: false

    path:
         logs: /var/log/elasticsearch
         data: /var/lib/elasticsearch/data

This may be easier to read for some people, but it is a total nightmare 
for "grep" - so many keys have identical names, such as "enabled".

Also, for the virtual tables, it would be a lot easier to represent
individual values in a virtual table when the config is flat and keys
are unique. The virtual tables would need to either support the encoding
and decoding of the structured config into a flat structure, or use a
JSON-encoded string value. The use of JSON would make querying individual
values much harder.
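
As a rough illustration of the decoding half of that, here is a minimal Python sketch that rebuilds the nested structure from flat dotted keys. The `unflatten` helper is hypothetical, and it assumes that key segments never contain a literal "." themselves.

```python
def unflatten(flat, sep="."):
    """Rebuild nested maps from dotted keys, e.g. {'a.b': 1} -> {'a': {'b': 1}}."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(sep)
        node = nested
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                # e.g. both 'a' and 'a.b' were given as keys
                raise ValueError(f"{dotted_key!r} conflicts with a value at {part!r}")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{dotted_key!r} conflicts with nested keys")
        node[leaf] = value
    return nested


flat = {
    "xpack.security.enabled": False,
    "xpack.security.audit.enabled": False,
    "path.logs": "/var/log/elasticsearch",
}
nested = unflatten(flat)
```

With unique flat keys this direction is cheap, and the conflict checks catch the case where a key is given both as a scalar and as a parent of other keys.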

On 22/11/2021 16:16, Joseph Lynch wrote:
> Isn't one of the primary reasons to have a YAML configuration instead
> of a properties file is to allow typed and structured (implies nested)
> configuration? I think it makes a lot of sense to group related
> configuration options (e.g. a feature) into a typed class when we're
> talking about more than one or two related options.
>
> It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs to
> period encoded key->value pairs when required (usually when providing
> a property or override layer), Spring and Elasticsearch yamls both
> come to mind. It seems pretty reasonable to support dot encoding and
> decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
>
> Regarding quickly telling what configuration a node is running I think
> we should lean on virtual tables for "what is the current
> configuration" now that we have them, as others have said the written
> cassandra.yaml is not necessarily the current configuration ... and
> also grep -C or -A exist for this reason.
>
> -Joey
>
> On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer <bl...@apache.org> wrote:
>> I do not have a strong opinion for one or the other but wanted to raise the
>> issue I see with the "Settings" virtual table.
>>
>> Currently the "Settings" virtual table converts nested options into flat
>> options using a "_" separator. For those options it allows a user to query
>> the all set of options through some hack.
>> If we decide to move to more nesting (more than one level), it seems to me
>> that we need to change the way this table is behaving and how we can query
>> its data.
>>
>> We would need to start using "." as a nesting separator to ensure that
>> things are consistent between the configuration and the table and add
>> support for LIKE restrictions for filtering queries to allow operators to
>> be able to select the precise set of settings that the operator is looking
>> for.
>>
>> Doing so is not really complicated in itself but might impact some users.
>>
>> On Fri, Nov 19, 2021 at 22:39, David Capwell <dc...@apple.com.invalid> wrote:
>>
>>>> it is really handy to grep
>>>> cassandra.yaml on some config key and you know the value instantly.
>>> You can still do that
>>>
>>> $ grep -A2 coordinator_read_size conf/cassandra.yaml
>>> #     coordinator_read_size:
>>> #         warn_threshold_kb: 0
>>> #         abort_threshold_kb: 0
>>>
>>> I was also arguing we should support nested and flat, so if your infra
>>> works better with flat then you could use
>>>
>>> track_warnings.coordinator_read_size.warn_threshold_kb: 0
>>> track_warnings.coordinator_read_size.abort_threshold_kb: 0
>>>
>>>> On Nov 19, 2021, at 1:34 PM, David Capwell <dc...@apple.com> wrote:
>>>>
>>>>> With the flat structure it turns into properties file - would it be
>>>>> possible to support both formats - nested yaml and flat properties?
>>>>
>>>> For the majority of our configs yes, but there is a subset where flat
>>>> properties are annoying
>>>> hinted_handoff_disabled_datacenters - set type, so you could do
>>>> hinted_handoff_disabled_datacenters="a,b,c,d" but we would need to deal
>>>> with separators as the format doesn’t support them
>>>> seed_provider.parameters - this is a map type… so would need to do
>>>> something like seed_provider.parameters="{\"a\": \"a\"}" …. Maybe we special
>>>> case maps as dynamic fields?  Then seed_provider.parameters.a=a?  We have
>>>> ParameterizedClass all over the code
>>>> So, as long as we define how to deal with Java collections, we could in
>>>> theory support properties files (not arguing for that in this thread) as
>>>> well as system properties.
>>>>
>>>>> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <lewandowski.jacek@gmail.com> wrote:
>>>>> With the flat structure it turns into properties file - would it be
>>>>> possible to support both formats - nested yaml and flat properties?
>>>>>
>>>>>
>>>>> - - -- --- ----- -------- -------------
>>>>> Jacek Lewandowski
>>>>>
>>>>>
>>>>> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <calebrackliffe@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> If it's nested, "track_warnings" would still work if you're grepping around
>>>>>> vim or less.
>>>>>>
>>>>>> I'd have to concede the point about grep output, although there are tools
>>>>>> like https://github.com/kislyuk/yq that could probably be bent to do what
>>>>>> you want.
>>>>>>
>>>>>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
>>>>>> stefan.miklosovic@instaclustr.com> wrote:
>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> while I do not oppose nested structure, it is really handy to grep
>>>>>>> cassandra.yaml on some config key and you know the value instantly.
>>>>>>> This is not possible when it is nested (easily & fastly) as it is on
>>>>>>> two lines. Or maybe my grepping is just not advanced enough to cover
>>>>>>> this case? If it is flat, I can just grep "track_warnings" and I have
>>>>>>> them all.
>>>>>>>
>>>>>>> Can you elaborate on your last bullet point? Parsing layer ... What do
>>>>>>> you mean specifically?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Fri, 19 Nov 2021 at 19:36, David Capwell <dc...@gmail.com> wrote:
>>>>>>>> This has been brought up in a few tickets, so pushing to the dev list.
>>>>>>>> CASSANDRA-15234 - Standardise config and JVM parameters
>>>>>>>> CASSANDRA-16896 - hard/soft limits for queries
>>>>>>>> CASSANDRA-17147 - Guardrails prototype
>>>>>>>>
>>>>>>>> In short, do we as a project wish to move "new features" into nested
>>>>>>>> YAML when the feature has "enough" to justify the nesting?  I would
>>>>>>>> really like to focus this discussion on new features rather than
>>>>>>>> retroactively grouping (leaving that to CASSANDRA-15234), as there is
>>>>>>>> already a place to talk about that.
>>>>>>>>
>>>>>>>> To get things started, let's start with the track-warning feature
>>>>>>>> (hard/soft limits for queries), currently the configs look as follows
>>>>>>>> (assuming 15234)
>>>>>>>>
>>>>>>>> track_warnings:
>>>>>>>>    enabled: true
>>>>>>>>    coordinator_read_size:
>>>>>>>>        warn_threshold: 10kb
>>>>>>>>        abort_threshold: 1mb
>>>>>>>>    local_read_size:
>>>>>>>>        warn_threshold: 10kb
>>>>>>>>        abort_threshold: 1mb
>>>>>>>>    row_index_size:
>>>>>>>>        warn_threshold: 100mb
>>>>>>>>        abort_threshold: 1gb
>>>>>>>>
>>>>>>>> or should this be "flat"
>>>>>>>>
>>>>>>>> track_warnings_enabled: true
>>>>>>>> track_warnings_coordinator_read_size_warn_threshold: 10kb
>>>>>>>> track_warnings_coordinator_read_size_abort_threshold: 1mb
>>>>>>>> track_warnings_local_read_size_warn_threshold: 10kb
>>>>>>>> track_warnings_local_read_size_abort_threshold: 1mb
>>>>>>>> track_warnings_row_index_size_warn_threshold: 100mb
>>>>>>>> track_warnings_row_index_size_abort_threshold: 1gb
>>>>>>>>
>>>>>>>> For me I prefer nested for a few reasons
>>>>>>>> * easier to enforce consistency as the configs can use shared types;
>>>>>>>> in the track warnings patch I had mismatches cross configs (warn vs
>>>>>>>> warns, fail vs abort, etc.) before going nested, now everything reuses
>>>>>>>> the same types
>>>>>>>> * even though it is longer, things can be more clear how they are related
>>>>>>>> * parsing layer can add support for mixed or purely flat depending on
>>>>>>>> user preference (example:
>>>>>>>> track_warnings.row_index_size.abort_threshold, using the '.' notation
>>>>>>>> to represent nested structures)
>>>>>>>>
>>>>>>>> Thoughts?

Re: [DISCUSS] Nested YAML configs for new features

Posted by Joseph Lynch <jo...@gmail.com>.

On Wed, Nov 24, 2021 at 5:55 AM Jacek Lewandowski
<le...@gmail.com> wrote:
>
> I am just wondering how to represent in properties things like lists of
> non-scalar values?
>

In my experience properties are not sufficient for complex
configuration, more or less for this reason. That's why structured YAML
(or any structured configuration language) is so much more powerful
than a properties file. I think if we leaned into structured
configuration we'd have mostly maps of maps pointing to scalars, which
are well addressed by dot encoding.

Dot encoding only works down to the first node that is neither a scalar
nor an object (a list, for example), and from there the value needs to
be structured. So a list of maps would end up in the value: for example,
in {"a": {"b": 4, "c": [{"d": 3}, {"d": 2}]}} you'd be able to query
for 'a.b' -> 4 or 'a.c' -> [{"d": 3}, {"d": 2}]. Single scalar values
are valid JSON, so if we have to have a text -> text encoding I'd make
the key the dot encoded key and the value the JSON encoded value. That's
maybe the easiest way to generically represent complex structured
configuration in a flat key->value mapping.
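
To illustrate that text -> text encoding, here is a minimal Python sketch. The `dot_encode` helper is hypothetical: it recurses through maps only and JSON-encodes whatever it finds at the first value that is not a non-empty map.

```python
import json


def dot_encode(config, prefix=""):
    """Flatten nested maps into {dotted_key: json_text}.

    Recursion follows maps only; any other value (scalar, list, empty map)
    is JSON-encoded whole and stored under its dotted path.
    """
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(dot_encode(value, path))
        else:
            flat[path] = json.dumps(value)
    return flat


config = {"a": {"b": 4, "c": [{"d": 3}, {"d": 2}]}}
encoded = dot_encode(config)
# encoded == {"a.b": "4", "a.c": '[{"d": 3}, {"d": 2}]'}
```

Every value is a JSON document, so a reader can always run it through a JSON parser without first guessing whether it is a scalar or a structure.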

I think Elasticsearch's live reconfiguration API [1], which accepts dot
encoded JSON and merges it with the on-disk YAML, and Puppet's Hiera
configuration language [2], which allows you to index into YAMLs using
dot encoding, are some great interfaces for us to study. The latter
even allows the user to query into lists by using a number as the key
(similar to jq [3] except without the square brackets) so you could ask
for 'a.c.0' and get back {"d": 3}.
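
A minimal Python sketch of that kind of lookup, where a numeric path segment indexes into a list. The `lookup` helper is hypothetical and only loosely modelled on the behaviour described above; it is not Hiera's or jq's implementation. In the example document the list lives under 'a.c'.

```python
def lookup(config, dotted_path):
    """Walk a nested structure; a segment indexes a list when the node is a list."""
    node = config
    for part in dotted_path.split("."):
        if isinstance(node, list):
            node = node[int(part)]  # numeric segment, no square brackets
        else:
            node = node[part]
    return node


config = {"a": {"b": 4, "c": [{"d": 3}, {"d": 2}]}}
first = lookup(config, "a.c.0")
# first == {"d": 3}
```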

-Joey

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-update-settings.html
[2] https://puppet.com/docs/puppet/6/function.html#get
[3] https://stedolan.github.io/jq/manual/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
For additional commands, e-mail: dev-help@cassandra.apache.org

Re: [DISCUSS] Nested YAML configs for new features

Posted by Jacek Lewandowski <le...@gmail.com>.

I am just wondering how to represent in properties things like lists of
non-scalar values?


- - -- --- ----- -------- -------------
Jacek Lewandowski


On Mon, Nov 22, 2021 at 5:16 PM Joseph Lynch <jo...@gmail.com> wrote:

> Isn't one of the primary reasons to have a YAML configuration instead
> of a properties file is to allow typed and structured (implies nested)
> configuration? I think it makes a lot of sense to group related
> configuration options (e.g. a feature) into a typed class when we're
> talking about more than one or two related options.
>
> It's pretty standard elsewhere in the JVM ecosystem to encode YAMLs to
> period encoded key->value pairs when required (usually when providing
> a property or override layer), Spring and Elasticsearch yamls both
> come to mind. It seems pretty reasonable to support dot encoding and
> decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
>
> Regarding quickly telling what configuration a node is running I think
> we should lean on virtual tables for "what is the current
> configuration" now that we have them, as others have said the written
> cassandra.yaml is not necessarily the current configuration ... and
> also grep -C or -A exist for this reason.
>
> -Joey
>
> On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer <bl...@apache.org> wrote:
> >
> > I do not have a strong opinion for one or the other but wanted to raise the
> > issue I see with the "Settings" virtual table.
> >
> > Currently the "Settings" virtual table converts nested options into flat
> > options using a "_" separator. For those options it allows a user to query
> > the all set of options through some hack.
> > If we decide to move to more nesting (more than one level), it seems to me
> > that we need to change the way this table is behaving and how we can query
> > its data.
> >
> > We would need to start using "." as a nesting separator to ensure that
> > things are consistent between the configuration and the table and add
> > support for LIKE restrictions for filtering queries to allow operators to
> > be able to select the precise set of settings that the operator is looking
> > for.
> >
> > Doing so is not really complicated in itself but might impact some users.
> >
> > On Fri, Nov 19, 2021 at 22:39, David Capwell <dc...@apple.com.invalid> wrote:
> >
> > > > it is really handy to grep
> > > > cassandra.yaml on some config key and you know the value instantly.
> > >
> > > You can still do that
> > >
> > > $ grep -A2 coordinator_read_size conf/cassandra.yaml
> > > #     coordinator_read_size:
> > > #         warn_threshold_kb: 0
> > > #         abort_threshold_kb: 0
> > >
> > > I was also arguing we should support nested and flat, so if your infra
> > > works better with flat then you could use
> > >
> > > track_warnings.coordinator_read_size.warn_threshold_kb: 0
> > > track_warnings.coordinator_read_size.abort_threshold_kb: 0
> > >
> > > > On Nov 19, 2021, at 1:34 PM, David Capwell <dc...@apple.com> wrote:
> > > >
> > > >> With the flat structure it turns into properties file - would it be
> > > >> possible to support both formats - nested yaml and flat properties?
> > > >
> > > >
> > > > For the majority of our configs yes, but there is a subset where flat
> > > > properties are annoying
> > > >
> > > > hinted_handoff_disabled_datacenters - set type, so you could do
> > > > hinted_handoff_disabled_datacenters="a,b,c,d" but we would need to deal
> > > > with separators as the format doesn’t support them
> > > > seed_provider.parameters - this is a map type… so would need to do
> > > > something like seed_provider.parameters="{\"a\": \"a\"}" …. Maybe we special
> > > > case maps as dynamic fields?  Then seed_provider.parameters.a=a?  We have
> > > > ParameterizedClass all over the code
> > > >
> > > > So, as long as we define how to deal with Java collections, we could in
> > > > theory support properties files (not arguing for that in this thread) as
> > > > well as system properties.
> > > >
> > > >
> > > >> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <lewandowski.jacek@gmail.com> wrote:
> > > >>
> > > >> With the flat structure it turns into properties file - would it be
> > > >> possible to support both formats - nested yaml and flat properties?
> > > >>
> > > >>
> > > >> - - -- --- ----- -------- -------------
> > > >> Jacek Lewandowski
> > > >>
> > > >>
> > > >> On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <calebrackliffe@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> If it's nested, "track_warnings" would still work if you're grepping around
> > > >>> vim or less.
> > > >>>
> > > >>> I'd have to concede the point about grep output, although there are tools
> > > >>> like https://github.com/kislyuk/yq that could probably be bent to do what
> > > >>> you want.
> > > >>>
> > > >>> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
> > > >>> stefan.miklosovic@instaclustr.com> wrote:
> > > >>>
> > > >>>> Hi David,
> > > >>>>
> > > >>>> while I do not oppose nested structure, it is really handy to grep
> > > >>>> cassandra.yaml on some config key and you know the value
> instantly.
> > > >>>> This is not possible when it is nested (easily & fastly) as it is
> on
> > > >>>> two lines. Or maybe my grepping is just not advanced enough to
> cover
> > > >>>> this case? If it is flat, I can just grep "track_warnings" and I
> have
> > > >>>> them all.
> > > >>>>
> > > >>>> Can you elaborate on your last bullet point? Parsing layer ...
> What do
> > > >>>> you mean specifically?
> > > >>>>
> > > >>>> Thanks
> > > >>>>
> > > >>>> On Fri, 19 Nov 2021 at 19:36, David Capwell <dc...@gmail.com>
> > > wrote:
> > > >>>>>
> > > >>>>> This has been brought up in a few tickets, so pushing to the dev
> > > list.
> > > >>>>>
> > > >>>>> CASSANDRA-15234 - Standardise config and JVM parameters
> > > >>>>> CASSANDRA-16896 - hard/soft limits for queries
> > > >>>>> CASSANDRA-17147 - Guardrails prototype
> > > >>>>>
> > > >>>>> In short, do we as a project wish to move "new features" into
> nested
> > > >>>>> YAML when the feature has "enough" to justify the nesting?  I
> would
> > > >>>>> really like to focus this discussion on new features rather than
> > > >>>>> retroactively grouping (leaving that to CASSANDRA-15234), as
> there is
> > > >>>>> already a place to talk about that.
> > > >>>>>
> > > >>>>> To get things started, let's start with the track-warning feature
> > > >>>>> (hard/soft limits for queries), currently the configs look as
> follows
> > > >>>>> (assuming 15234)
> > > >>>>>
> > > >>>>> track_warnings:
> > > >>>>>   enabled: true
> > > >>>>>   coordinator_read_size:
> > > >>>>>       warn_threshold: 10kb
> > > >>>>>       abort_threshold: 1mb
> > > >>>>>   local_read_size:
> > > >>>>>       warn_threshold: 10kb
> > > >>>>>       abort_threshold: 1mb
> > > >>>>>   row_index_size:
> > > >>>>>       warn_threshold: 100mb
> > > >>>>>       abort_threshold: 1gb
> > > >>>>>
> > > >>>>> or should this be "flat"
> > > >>>>>
> > > >>>>> track_warnings_enabled: true
> > > >>>>> track_warnings_coordinator_read_size_warn_threshold: 10kb
> > > >>>>> track_warnings_coordinator_read_size_abort_threshold: 1mb
> > > >>>>> track_warnings_local_read_size_warn_threshold: 10kb
> > > >>>>> track_warnings_local_read_size_abort_threshold: 1mb
> > > >>>>> track_warnings_row_index_size_warn_threshold: 100mb
> > > >>>>> track_warnings_row_index_size_abort_threshold: 1gb
> > > >>>>>
> > > >>>>> For me I prefer nested for a few reasons
> > > >>>>> * easier to enforce consistency as the configs can use shared
> types;
> > > >>>>> in the track warnings patch I had mismatches cross configs (warn
> vs
> > > >>>>> warns, fail vs abort, etc.) before going nested, now everything
> > > reuses
> > > >>>>> the same types
> > > >>>>> * even though it is longer, things can be more clear how they are
> > > >>> related
> > > >>>>> * parsing layer can add support for mixed or purely flat
> depending on
> > > >>>>> user preference (example:
> > > >>>>> track_warnings.row_index_size.abort_threshold, using the '.'
> notation
> > > >>>>> to represent nested structures)
> > > >>>>>
> > > >>>>> Thoughts?
> > > >>>>>
> > > >>>>>
> ---------------------------------------------------------------------
> > > >>>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > >>>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> > > >>>>>
> > > >>>>
> > > >>>>
> ---------------------------------------------------------------------
> > > >>>> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > >>>> For additional commands, e-mail: dev-help@cassandra.apache.org
> > > >>>>
> > > >>>>
> > > >>>
> > > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > >
> > >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: dev-help@cassandra.apache.org
>
>

Re: [DISCUSS] Nested YAML configs for new features

Posted by Joseph Lynch <jo...@gmail.com>.

Isn't one of the primary reasons to have a YAML configuration instead
of a properties file to allow typed and structured (which implies nested)
configuration? I think it makes a lot of sense to group related
configuration options (e.g. a feature) into a typed class when we're
talking about more than one or two related options.

It's pretty standard elsewhere in the JVM ecosystem to encode YAML as
period-separated key->value pairs when required (usually when providing
a property or override layer); Spring and Elasticsearch YAML files both
come to mind. It seems pretty reasonable to support dot encoding and
decoding, for example {"a": {"b": 12}} -> '"a.b": 12'.
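
As a rough illustration of the dot encoding described above, here is a
minimal Python sketch that collapses a nested mapping into dot-separated
keys. The function name flatten is hypothetical and is not taken from
Cassandra, Spring, or Elasticsearch; it only shows the shape of the
transformation.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Collapse nested dicts into a single level of dot-separated keys."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the nested section, carrying the path so far.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Example input modelled on the track_warnings block from this thread.
nested = {
    "track_warnings": {
        "enabled": True,
        "row_index_size": {"warn_threshold": "100mb", "abort_threshold": "1gb"},
    }
}
flat = flatten(nested)
for name, value in flat.items():
    print(f"{name}: {value}")
```

An override layer could then look up a single dotted key without caring
how deeply the original YAML was nested.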

Regarding quickly telling what configuration a node is running, I think
we should lean on virtual tables for "what is the current
configuration" now that we have them. As others have said, the written
cassandra.yaml is not necessarily the current configuration, and
grep -C or -A exist for this reason.

-Joey

On Mon, Nov 22, 2021 at 4:14 AM Benjamin Lerer <bl...@apache.org> wrote:
>
> I do not have a strong opinion for one or the other but wanted to raise the
> issue I see with the "Settings" virtual table.
>
> Currently the "Settings" virtual table converts nested options into flat
> options using a "_" separator. For those options it allows a user to query
> the whole set of options through some hack.
> If we decide to move to more nesting (more than one level), it seems to me
> that we need to change the way this table is behaving and how we can query
> its data.
>
> We would need to start using "." as a nesting separator to ensure that
> things are consistent between the configuration and the table, and add
> support for LIKE restrictions in filtering queries so that operators can
> select the precise set of settings they are looking for.
>
> Doing so is not really complicated in itself but might impact some users.
>

Re: [DISCUSS] Nested YAML configs for new features

Posted by Benjamin Lerer <bl...@apache.org>.

I do not have a strong opinion for one or the other but wanted to raise the
issue I see with the "Settings" virtual table.

Currently the "Settings" virtual table converts nested options into flat
options using a "_" separator. For those options it allows a user to query
the whole set of options through some hack.
If we decide to move to more nesting (more than one level), it seems to me
that we need to change the way this table is behaving and how we can query
its data.

We would need to start using "." as a nesting separator to ensure that
things are consistent between the configuration and the table, and add
support for LIKE restrictions in filtering queries so that operators can
select the precise set of settings they are looking for.
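
To make the "." separator plus LIKE idea concrete, here is a small Python
sketch that emulates a prefix-style LIKE filter over dotted setting names.
It is only a model of the proposed behaviour: the helper names like and
select_settings are hypothetical, the sample settings are made up for
illustration, and only the '%' wildcard is handled.

```python
import re
from typing import Dict


def like(pattern: str, value: str) -> bool:
    """Match value against a pattern where '%' stands for any run of characters."""
    # Escape every literal piece so that '.' and '_' are matched literally.
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def select_settings(settings: Dict[str, str], pattern: str) -> Dict[str, str]:
    """Return only the settings whose name matches the LIKE-style pattern."""
    return {name: value for name, value in settings.items() if like(pattern, name)}


# Made-up rows standing in for a settings table keyed by dotted names.
settings = {
    "track_warnings.enabled": "true",
    "track_warnings.row_index_size.warn_threshold": "100mb",
    "track_warnings_legacy_flag": "false",
    "hints_flush_period": "10s",
}
selected = select_settings(settings, "track_warnings.%")
for name, value in selected.items():
    print(f"{name} = {value}")
```

With "." as the nesting separator the prefix "track_warnings.%" selects
exactly one section, whereas a "_" separator cannot distinguish a nested
section from an unrelated name that merely shares the same prefix.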

Doing so is not really complicated in itself but might impact some users.

On Fri, Nov 19, 2021 at 22:39, David Capwell <dc...@apple.com.invalid>
wrote:

> > it is really handy to grep
> > cassandra.yaml on some config key and you know the value instantly.
>
> You can still do that
>
> $ grep -A2 coordinator_read_size conf/cassandra.yaml
> #     coordinator_read_size:
> #         warn_threshold_kb: 0
> #         abort_threshold_kb: 0
>
> I was also arguing we should support nested and flat, so if your infra
> works better with flat then you could use
>
> track_warnings.coordinator_read_size.warn_threshold_kb: 0
> track_warnings.coordinator_read_size.abort_threshold_kb: 0

Re: [DISCUSS] Nested YAML configs for new features

Posted by David Capwell <dc...@apple.com.INVALID>.

> it is really handy to grep
> cassandra.yaml on some config key and you know the value instantly.

You can still do that

$ grep -A2 coordinator_read_size conf/cassandra.yaml
#     coordinator_read_size:
#         warn_threshold_kb: 0
#         abort_threshold_kb: 0

I was also arguing we should support nested and flat, so if your infra works better with flat then you could use

track_warnings.coordinator_read_size.warn_threshold_kb: 0
track_warnings.coordinator_read_size.abort_threshold_kb: 0
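
A parsing layer that accepts nested, flat, or mixed input could work
roughly as in the Python sketch below. This is an illustration under
stated assumptions, not the loader from the patch: to_nested and _walk are
hypothetical names, and the choice to reject duplicate or conflicting keys
is an assumption.

```python
from typing import Any, Dict, Iterator, Tuple


def _walk(config: Dict[str, Any], prefix: str) -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, value) for every leaf, whether nested or already dotted."""
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from _walk(value, path)
        else:
            yield path, value


def to_nested(config: Dict[str, Any]) -> Dict[str, Any]:
    """Normalise nested, flat, or mixed input into one nested dict."""
    result: Dict[str, Any] = {}
    for dotted, value in _walk(config, ""):
        *parents, leaf = dotted.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{dotted!r} conflicts with a scalar at {part!r}")
        if leaf in node:
            raise ValueError(f"{dotted!r} is defined more than once")
        node[leaf] = value
    return result


# Mixed input: one nested section plus two dotted keys for the same feature.
mixed = {
    "track_warnings": {"enabled": True},
    "track_warnings.coordinator_read_size.warn_threshold_kb": 0,
    "track_warnings.coordinator_read_size.abort_threshold_kb": 0,
}
normalised = to_nested(mixed)
print(normalised)
```

Because every form is reduced to the same nested structure, the typed
config classes would only need to understand one shape.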

> On Nov 19, 2021, at 1:34 PM, David Capwell <dc...@apple.com> wrote:
> 
>> With the flat structure it turns into properties file - would it be
>> possible to support both formats - nested yaml and flat properties?
> 
> 
> For the majority of our configs, yes, but there is a subset where flat properties are annoying
> 
> hinted_handoff_disabled_datacenters - set type, so you could do hinted_handoff_disabled_datacenters="a,b,c,d" but we would need to deal with separators, as the format doesn't support sets
> seed_provider.parameters - this is a map type, so we would need to do something like seed_provider.parameters="{\"a\": \"a\"}". Maybe we special-case maps as dynamic fields?  Then seed_provider.parameters.a=a?  We have ParameterizedClass all over the code
> 
> So, as long as we define how to deal with Java collections, we could in theory support properties files (not arguing for that in this thread) as well as system properties.

Re: [DISCUSS] Nested YAML configs for new features

Posted by David Capwell <dc...@apple.com.INVALID>.

> With the flat structure it turns into properties file - would it be
> possible to support both formats - nested yaml and flat properties?


For the majority of our configs, yes, but there is a subset where flat properties are annoying

hinted_handoff_disabled_datacenters - set type, so you could do hinted_handoff_disabled_datacenters="a,b,c,d" but we would need to deal with separators, as the format doesn't support sets
seed_provider.parameters - this is a map type, so we would need to do something like seed_provider.parameters="{\"a\": \"a\"}". Maybe we special-case maps as dynamic fields?  Then seed_provider.parameters.a=a?  We have ParameterizedClass all over the code

So, as long as we define how to deal with Java collections, we could in theory support properties files (not arguing for that in this thread) as well as system properties.
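
One possible way to handle the two collection cases above is sketched
below in Python. Everything here is an assumption made for illustration:
the SET_KEYS and MAP_KEYS tables, the comma separator for sets, and the
parse_properties helper are hypothetical, and Cassandra's real Config
class is not consulted.

```python
from typing import Any, Dict, Iterable

# Hypothetical type declarations for the two examples from this thread.
SET_KEYS = {"hinted_handoff_disabled_datacenters"}
MAP_KEYS = {"seed_provider.parameters"}


def parse_properties(lines: Iterable[str]) -> Dict[str, Any]:
    """Parse key=value lines, turning set and map typed keys into collections."""
    parsed: Dict[str, Any] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, raw = line.partition("=")
        key, raw = key.strip(), raw.strip()
        map_key = next((m for m in MAP_KEYS if key.startswith(m + ".")), None)
        if map_key is not None:
            # Dynamic field: the text after the map's own name is the entry key.
            parsed.setdefault(map_key, {})[key[len(map_key) + 1:]] = raw
        elif key in SET_KEYS:
            # Assumed separator: a comma, with surrounding whitespace ignored.
            parsed[key] = {item.strip() for item in raw.split(",") if item.strip()}
        else:
            parsed[key] = raw
    return parsed


example = parse_properties([
    "hinted_handoff_disabled_datacenters=a,b,c,d",
    "seed_provider.parameters.a=a",
])
print(example)
```

Treating map entries as dynamic fields avoids embedding escaped JSON in a
property value, at the cost of needing to know up front which keys are
maps.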


> On Nov 19, 2021, at 1:22 PM, Jacek Lewandowski <le...@gmail.com> wrote:
> 
> With the flat structure it turns into properties file - would it be
> possible to support both formats - nested yaml and flat properties?
> 
> 
> - - -- --- ----- -------- -------------
> Jacek Lewandowski

Re: [DISCUSS] Nested YAML configs for new features

Posted by Jacek Lewandowski <le...@gmail.com>.

With the flat structure it turns into a properties file - would it be
possible to support both formats - nested YAML and flat properties?


- - -- --- ----- -------- -------------
Jacek Lewandowski


On Fri, Nov 19, 2021 at 10:08 PM Caleb Rackliffe <ca...@gmail.com>
wrote:

> If it's nested, "track_warnings" would still work if you're grepping around
> vim or less.
>
> I'd have to concede the point about grep output, although there are tools
> like https://github.com/kislyuk/yq that could probably be bent to do what
> you want.
>
> On Fri, Nov 19, 2021 at 1:08 PM Stefan Miklosovic <
> stefan.miklosovic@instaclustr.com> wrote:
>
> > Hi David,
> >
> > while I do not oppose nested structure, it is really handy to grep
> > cassandra.yaml on some config key and you know the value instantly.
> > This is not easily or quickly possible when it is nested, as the key
> > and value are on two lines. Or maybe my grepping is just not advanced
> > enough to cover this case? If it is flat, I can just grep
> > "track_warnings" and I have them all.
> >
> > Can you elaborate on your last bullet point? Parsing layer ... What do
> > you mean specifically?
> >
> > Thanks
> >
> > On Fri, 19 Nov 2021 at 19:36, David Capwell <dc...@gmail.com> wrote:
> > >
> > > This has been brought up in a few tickets, so pushing to the dev list.
> > >
> > > CASSANDRA-15234 - Standardise config and JVM parameters
> > > CASSANDRA-16896 - hard/soft limits for queries
> > > CASSANDRA-17147 - Guardrails prototype
> > >
> > > In short, do we as a project wish to move "new features" into nested
> > > YAML when the feature has "enough" to justify the nesting?  I would
> > > really like to focus this discussion on new features rather than
> > > retroactively grouping (leaving that to CASSANDRA-15234), as there is
> > > already a place to talk about that.
> > >
> > > To get things started, let's start with the track-warning feature
> > > (hard/soft limits for queries), currently the configs look as follows
> > > (assuming 15234)
> > >
> > > track_warnings:
> > >     enabled: true
> > >     coordinator_read_size:
> > >         warn_threshold: 10kb
> > >         abort_threshold: 1mb
> > >     local_read_size:
> > >         warn_threshold: 10kb
> > >         abort_threshold: 1mb
> > >     row_index_size:
> > >         warn_threshold: 100mb
> > >         abort_threshold: 1gb
> > >
> > > or should this be "flat"
> > >
> > > track_warnings_enabled: true
> > > track_warnings_coordinator_read_size_warn_threshold: 10kb
> > > track_warnings_coordinator_read_size_abort_threshold: 1mb
> > > track_warnings_local_read_size_warn_threshold: 10kb
> > > track_warnings_local_read_size_abort_threshold: 1mb
> > > track_warnings_row_index_size_warn_threshold: 100mb
> > > track_warnings_row_index_size_abort_threshold: 1gb
> > >
> > > For me I prefer nested for a few reasons
> > > * easier to enforce consistency as the configs can use shared types;
> > > in the track warnings patch I had mismatches cross configs (warn vs
> > > warns, fail vs abort, etc.) before going nested, now everything reuses
> > > the same types
> > > * even though it is longer, things can be more clear how they are
> related
> > > * parsing layer can add support for mixed or purely flat depending on
> > > user preference (example:
> > > track_warnings.row_index_size.abort_threshold, using the '.' notation
> > > to represent nested structures)
> > >
> > > Thoughts?
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > > For additional commands, e-mail: dev-help@cassandra.apache.org
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@cassandra.apache.org
> > For additional commands, e-mail: dev-help@cassandra.apache.org
> >
> >
>

Re: [DISCUSS] Nested YAML configs for new features

Posted by Caleb Rackliffe <ca...@gmail.com>.

I'm on record as early as the comments in CASSANDRA-15234 in support of
nesting, and I think the biggest reason is that the structure it forces on
our config makes it more cohesive and intelligible to those trying to
understand how major features and subsystems work together. It's very easy
to look at our current flat configuration and miss an option that modifies
or in some way governs another.

On the subject of mass-grepping via ssh, I would be careful. We have a
large and growing set of hot properties, so the YAML files on disk
might not actually reflect how those nodes are currently configured.


Re: [DISCUSS] Nested YAML configs for new features

Posted by Caleb Rackliffe <ca...@gmail.com>.

If it's nested, searching for "track_warnings" would still work if you're
grepping around in vim or less.

I'd have to concede the point about grep output, although there are tools
like https://github.com/kislyuk/yq that could probably be bent to do what
you want.
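
As a rough illustration of the kind of transformation such a tool would
need to perform (this is not how yq itself works, just a sketch in
Python), a nested config can be flattened to one dotted path per leaf so
that plain grep shows key and value on the same line. The dict below
stands in for an already-parsed cassandra.yaml; the function name
flatten is hypothetical.

```python
def flatten(config, prefix=""):
    """Yield (dotted_path, value) for every leaf of a nested dict."""
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into the nested section, extending the path.
            yield from flatten(value, path)
        else:
            yield path, value


# Stand-in for the parsed nested YAML from the original proposal.
nested = {
    "track_warnings": {
        "enabled": True,
        "coordinator_read_size": {"warn_threshold": "10kb", "abort_threshold": "1mb"},
        "row_index_size": {"warn_threshold": "100mb", "abort_threshold": "1gb"},
    }
}

# One line per setting, suitable for piping into grep.
lines = [f"{path}: {value}" for path, value in flatten(nested)]
print("\n".join(lines))
```

Every printed line starts with "track_warnings.", so grepping that
output for "track_warnings" returns all settings with their values.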


Re: [DISCUSS] Nested YAML configs for new features

Posted by Bowen Song <bo...@bso.ng.INVALID>.

I'm with Stefan. I prefer the flat YAML file, which I can easily grep
to check and confirm the settings on a large number of servers with
parallel-ssh. This will be very hard to do with nested config in a YAML file.

In addition to that, I also use grep in the Cassandra source code to
locate the relevant files based on the config name. The flat config name
is long and unique, and this helps me navigate efficiently within the
source code. I can imagine this is not going to work very well (if it
works at all) with the nested config name.

P.S.: I'm not a Java developer, so it will take me much longer to find the
relevant code if grep doesn't work in the source code. It is also going
to be harder for me to understand it if the nested config is turned into
a Java object/class.


Re: [DISCUSS] Nested YAML configs for new features

Posted by David Capwell <dc...@apple.com.INVALID>.

In org.apache.cassandra.config.YamlConfigurationLoader (and anything that translates configs to flat structures), we can detect this pattern and recursively get the field (similar to walking directories); the main change would be in org.apache.cassandra.config.YamlConfigurationLoader.PropertiesChecker#getProperty. The Property class acts like a Lens (https://hackage.haskell.org/package/lens), so we can logically andThen them to build up the property. For example,

set(config, track_warnings.row_index_size.abort_threshold, 1gb) 

gets converted to

set(get(get(config, track_warnings), row_index_size), abort_threshold, 1gb)

This is an implementation detail, so anything working with configs (YAML, vtable, JMX, etc.) has a consistent way of dealing with nested and flat configs.
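
The get/get/set chain above can be sketched as follows. This is a minimal
illustration in Python rather than the Java loader, and it is not the
actual Property or PropertiesChecker code; the classes Config,
TrackWarnings and Thresholds and the helper set_path are hypothetical
stand-ins.

```python
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class Thresholds:
    warn_threshold: str = "100mb"
    abort_threshold: str = "1gb"


@dataclass
class TrackWarnings:
    enabled: bool = True
    row_index_size: Thresholds = field(default_factory=Thresholds)


@dataclass
class Config:
    track_warnings: TrackWarnings = field(default_factory=TrackWarnings)


def set_path(config, dotted, value):
    """Set a nested field addressed by a dotted path.

    Equivalent to set(get(get(config, a), b), c, value) for 'a.b.c'.
    """
    *parents, leaf = dotted.split(".")
    # Chain the getters: each step returns the next nested object.
    target = reduce(getattr, parents, config)
    if not hasattr(target, leaf):
        raise AttributeError(f"unknown config property: {dotted}")
    setattr(target, leaf, value)


cfg = Config()
set_path(cfg, "track_warnings.row_index_size.abort_threshold", "2gb")
```

A flat key is just a path with no parents, so the same helper would
cover both flat and nested properties.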



Re: [DISCUSS] Nested YAML configs for new features

Posted by Stefan Miklosovic <st...@instaclustr.com>.

Hi David,

while I do not oppose a nested structure, it is really handy to grep
cassandra.yaml for some config key and know the value instantly.
This is not possible (easily and quickly) when it is nested, as it is on
two lines. Or maybe my grepping is just not advanced enough to cover
this case? If it is flat, I can just grep "track_warnings" and I have
them all.

Can you elaborate on your last bullet point? Parsing layer ... What do
you mean specifically?

Thanks
