You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cassandra.apache.org by Ekaterina Dimitrova <e....@gmail.com> on 2022/04/05 21:24:37 UTC

PLEASE READ ME! IMPORTANT!

ATTENTION PLEASE Below email will be long but I believe you will agree it
deserves attention for good reasons. Thank you in advance for your time and
consideration!

Hi everyone,

I am working on the last batch of config parameters to be transferred to
the new types after CASSANDRA-15234 landed.

This led to a few questions and concerns and here I am to raise awareness
and one more time to confirm things for the community to ensure there is no
regression when 4.1 is out.

As the main decisions were taken and implemented two years ago before even
4.0 beta, I would like to ensure that everyone here is aware that in
CASSANDRA-15234 we have backward compatibility to transfer the old format
(only values, no opportunity to change unit) to the new format (value +
unit) but it falls back to the new types at the end. This is not a feature
with a flag to be disabled. We migrated the parameters to the new types in
Config class and in cassandra.yaml. Also, we explicitly say we don’t
support negative values (old and new yaml config) which was not the case
before. We never advertised the usage of negative values but we also did
not prohibit that which we do now as one more improvement in CASSANDRA-15234.
I will add an explicit note in NEWS.txt to be sure no one was expecting to
keep on using negative values with the old yaml if they were doing it for
some unknown reason, old/new - we prohibit on trunk negative values except
special cases.

As part of the documentation I mentioned the Converter class which serves
to do these conversions between old value and new value - load the old
value to the new Config types. There are special cases which I want to
highlight in case someone knows of any behavior not caught by our CI that
might be changed or broken because of those changes(better safe than sorry):

   -

   The new types DataRateSpec, DurationSpec and DataRateSpec accept only
   non-negative values (this will affect both old and new config). Thus as
   part of the converters we have the following special cases:

- MILLIS_DOUBLE_DURATION for commitlog_sync_group_window_in_ms or now
already commitlog_sync_group_window which covers that this was double and
even if we think this was a bug as it is casted later to int here[1],
someone might have been using double and NaN. Disallow less than 0 as we
already said for all types. With the new config types this parameter is
stored as long.

- MILLIS_CUSTOM_DURATION for permissions_update_interval,
roles_update_interval and credentials_update_interval. After CASSANDRA-17431
those will be updated to - old value of “-1” being translated to null. The
setters will be doing that too. Anything below -1 is prohibited and
considered a bug (both with old and new config).

- we are adding NEGATIVE_SECONDS_DURATION as part of C17431 to handle
validation_preview_purge_head_start. Anyone using the old yaml and value <0
will have it translated into 0 seconds. New yaml and format prohibits
values less than 0 and the smallest unit for this parameter is seconds.

- BYTES_CUSTOM_DATA_STORAGE translates “-1” to “null” and prohibits
anything less than -1 with the old yaml and less than 0 with the new one
for native_transport_max_concurrent_requests_in_bytes_per_ip and
native_transport_max_concurrent_requests_in_bytes after C17341. I had also
separate mail for these two as we have some concerns about them for all C*
versions.

   -

   In C17431 I also decided to leave parameters phi_convict_threshold,
   memtable_cleanup_threshold, block_for_peers_timeout_in_secs alone and
   not to migrate them to the new Config types because of various reasons
   explained in the ticket. Please let me know if you have any particular
   suggestions/concerns/thoughts around those.
   -

   DataRateSpec - it was requested to support mebibytes/s, kibibytes/s,
   bytes/s. This made things complicated internally with two of the parameters
   introduced in 4.0 which were in megabits/s and I had to keep the old
   behavior and them being able to still support megabits/s with the old
   format. As the conversion between megabits and mebibytes is not really a
   whole number, I had to store the DataRateSpec in double and not long as in
   the other types in order to ensure accurate precision, etc. We considered
   this being fine because the RateLimiter uses double and we still allow only
   whole numbers to be provided by the users. Please let me know if you see
   any problem. A quick unit test where you can validate how those work  is
   SetGetInterDCStreamThroughputTest. Important to mention that once people
   move to the new format, they can’t assign them anything less than 1MiB/s.
   -

   Regarding DataRateSpec - the two new parameters (
   entire_sstable_stream_throughput_outbound_megabits_per_sec and
   entire_sstable_inter_dc_stream_throughput_outbound_megabits_per_sec)
   still not being in release were changed to be in MiB/s. Checked with
   Francisoco in Slack - that should be fine.
   -

   I would also like to ask Stefan and Paulo to check if they do not agree
   with anything in the latest version of DurationSpec as they have
   introduced the initial version of Duration in the codebase which evolved
   into DurationSpec in C15234. Please let me know if there is anything
   else aside from the tests you introduced that I might be missing and it
   changes your intentions.  Already sent a msg in Slack in case they miss the
   email.



   -

   Also, it was discussed on the ticket but I wanted to add it again for
   consideration - virtual tables currently show both the old names and format
   and new names and format. There are only three parameters which change only
   type but not names. Those will be listed only with the new format value.
   This was not considered as enough reason to bump to 5.0 because of
   breaking change but I wanted one more time everyone to be aware of that
   agreement.



   -

   In addition, I will add one more note to NEWS.txt for people to raise
   awareness to check carefully their config on upgrade that there are no
   breaking changes. We already mention the new option and the backward
   compatibility but I think it won’t hurt to stress more on the part we
   prohibit negative values considering it being a bug and have only a few
   special cases around -1 that we convert.



   -

   Please check the matching patterns we use to accept config values in
   DurationSpec, DataRateSpec and DataStorageSpec and let me know if you
   think there was some special case missed or anything.



   -

   One more topic I want to mention is changes to public classes. There
   were some name changes (not to JMX) and out of this work I’ve seen such
   changes even in patch releases. I know we give a promise not to change
   only interfaces but I still want to mention it here and to ensure we are
   aligned. As I said, Config changes the types of our parameters. We use
   annotations and the mentioned converters for backward compatibility of our
   yaml file.



   -

   New JMX methods are not added and the old ones still work by converting
   to the base unit of the specific parameter. By base I mean the internally
   used one. The agreement was that in the future virtual tables should handle
   the new format when we add the update option for SettingsTable.



   -

   In order not to run into precision issues, we introduced the Smallest
   possible unit for certain parameters of type Duration and DataStorage,
   that can be changed if we decide to migrate any of those in time to the
   smallest possible unit internally. I am also adding their Int equivalents
   now to improve the former int parameters handling.
   -

   We didn’t find any parameters with suffix _mb or _kb that internally
   didn’t mean kibi- or mebi- but kilo- or mega-. So all default values are
   the same in the new format.


For more details - please refer to the write up doc I shared before here -
https://cassandra.apache.org/doc/trunk/cassandra/new/configuration.html
There is important note and ticket around parameters overloading which
turned to be undocumented behavior in the project that people use.


   -

   Last but not least - I want to remind you that any new parameters are
   added with the new types. Users will have to set them with their new
   format. So whoever wants to use old yaml and wants to update default values
   of new parameters will have to do this in the new format.


This email came rather long but I truly believe it is important and I want
to ask you if there were parameters you think they might have been
affected, to double check them and raise the flag if you are not sure or
worried about something done or not done.

[1]
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/GroupCommitLogService.java#L31


      Ekaterina Dimitrova

Re: PLEASE READ ME! IMPORTANT!

Posted by Ekaterina Dimitrova <e....@gmail.com>.
Thank you Stefan.
“  Other than that, I do not remember I have touched /
worked with any other mentioned parameter she elaborated on between
4.0 and 4.1 so I can not comment on that.”

Not sure what do you mean here as the parameters which are affected are all
duration, data rate and data storage. It’s not about post-4.0 new
parameters only. Some of those migrated to the new types which will
prohibit negatives, etc (no matter if you use old or new yaml) in the new
release are probably there since the beginning of Cassandra.

On Tue, 5 Apr 2022 at 18:05, Stefan Miklosovic <
stefan.miklosovic@instaclustr.com> wrote:

> I have checked with Ekaterina, we are ok with what she asked us to
> take a look at. Other than that, I do not remember I have touched /
> worked with any other mentioned parameter she elaborated on between
> 4.0 and 4.1 so I can not comment on that.
>
> On Tue, 5 Apr 2022 at 23:24, Ekaterina Dimitrova <e....@gmail.com>
> wrote:
> >
> > ATTENTION PLEASE Below email will be long but I believe you will agree
> it deserves attention for good reasons. Thank you in advance for your time
> and consideration!
> >
> >
> > Hi everyone,
> >
> > I am working on the last batch of config parameters to be transferred to
> the new types after CASSANDRA-15234 landed.
> >
> > This led to a few questions and concerns and here I am to raise
> awareness and one more time to confirm things for the community to ensure
> there is no regression when 4.1 is out.
> >
> > As the main decisions were taken and implemented two years ago before
> even 4.0 beta, I would like to ensure that everyone here is aware that in
> CASSANDRA-15234 we have backward compatibility to transfer the old format
> (only values, no opportunity to change unit) to the new format (value +
> unit) but it falls back to the new types at the end. This is not a feature
> with a flag to be disabled. We migrated the parameters to the new types in
> Config class and in cassandra.yaml. Also, we explicitly say we don’t
> support negative values (old and new yaml config) which was not the case
> before. We never advertised the usage of negative values but we also did
> not prohibit that which we do now as one more improvement in
> CASSANDRA-15234. I will add an explicit note in NEWS.txt to be sure no one
> was expecting to keep on using negative values with the old yaml if they
> were doing it for some unknown reason, old/new - we prohibit on trunk
> negative values except special cases.
> >
> > As part of the documentation I mentioned the Converter class which
> serves to do these conversions between old value and new value - load the
> old value to the new Config types. There are special cases which I want to
> highlight in case someone knows of any behavior not caught by our CI that
> might be changed or broken because of those changes(better safe than sorry):
> >
> > The new types DataRateSpec, DurationSpec and DataRateSpec accept only
> non-negative values (this will affect both old and new config). Thus as
> part of the converters we have the following special cases:
> >
> > - MILLIS_DOUBLE_DURATION for commitlog_sync_group_window_in_ms or now
> already commitlog_sync_group_window which covers that this was double and
> even if we think this was a bug as it is casted later to int here[1],
> someone might have been using double and NaN. Disallow less than 0 as we
> already said for all types. With the new config types this parameter is
> stored as long.
> >
> > - MILLIS_CUSTOM_DURATION for permissions_update_interval,
> roles_update_interval and credentials_update_interval. After
> CASSANDRA-17431 those will be updated to - old value of “-1” being
> translated to null. The setters will be doing that too. Anything below -1
> is prohibited and considered a bug (both with old and new config).
> >
> > - we are adding NEGATIVE_SECONDS_DURATION as part of C17431 to handle
> validation_preview_purge_head_start. Anyone using the old yaml and value <0
> will have it translated into 0 seconds. New yaml and format prohibits
> values less than 0 and the smallest unit for this parameter is seconds.
> >
> > - BYTES_CUSTOM_DATA_STORAGE translates “-1” to “null” and prohibits
> anything less than -1 with the old yaml and less than 0 with the new one
> for native_transport_max_concurrent_requests_in_bytes_per_ip and
> native_transport_max_concurrent_requests_in_bytes after C17341. I had also
> separate mail for these two as we have some concerns about them for all C*
> versions.
> >
> > In C17431 I also decided to leave parameters phi_convict_threshold,
> memtable_cleanup_threshold, block_for_peers_timeout_in_secs alone and not
> to migrate them to the new Config types because of various reasons
> explained in the ticket. Please let me know if you have any particular
> suggestions/concerns/thoughts around those.
> >
> > DataRateSpec - it was requested to support mebibytes/s, kibibytes/s,
> bytes/s. This made things complicated internally with two of the parameters
> introduced in 4.0 which were in megabits/s and I had to keep the old
> behavior and them being able to still support megabits/s with the old
> format. As the conversion between megabits and mebibytes is not really a
> whole number, I had to store the DataRateSpec in double and not long as in
> the other types in order to ensure accurate precision, etc. We considered
> this being fine because the RateLimiter uses double and we still allow only
> whole numbers to be provided by the users. Please let me know if you see
> any problem. A quick unit test where you can validate how those work  is
> SetGetInterDCStreamThroughputTest. Important to mention that once people
> move to the new format, they can’t assign them anything less than 1MiB/s.
> >
> > Regarding DataRateSpec - the two new parameters
> (entire_sstable_stream_throughput_outbound_megabits_per_sec and
> entire_sstable_inter_dc_stream_throughput_outbound_megabits_per_sec) still
> not being in release were changed to be in MiB/s. Checked with Francisoco
> in Slack - that should be fine.
> >
> > I would also like to ask Stefan and Paulo to check if they do not agree
> with anything in the latest version of DurationSpec as they have introduced
> the initial version of Duration in the codebase which evolved into
> DurationSpec in C15234. Please let me know if there is anything else aside
> from the tests you introduced that I might be missing and it changes your
> intentions.  Already sent a msg in Slack in case they miss the email.
> >
> >
> > Also, it was discussed on the ticket but I wanted to add it again for
> consideration - virtual tables currently show both the old names and format
> and new names and format. There are only three parameters which change only
> type but not names. Those will be listed only with the new format value.
> This was not considered as enough reason to bump to 5.0 because of breaking
> change but I wanted one more time everyone to be aware of that agreement.
> >
> >
> > In addition, I will add one more note to NEWS.txt for people to raise
> awareness to check carefully their config on upgrade that there are no
> breaking changes. We already mention the new option and the backward
> compatibility but I think it won’t hurt to stress more on the part we
> prohibit negative values considering it being a bug and have only a few
> special cases around -1 that we convert.
> >
> >
> > Please check the matching patterns we use to accept config values in
> DurationSpec, DataRateSpec and DataStorageSpec and let me know if you think
> there was some special case missed or anything.
> >
> >
> > One more topic I want to mention is changes to public classes. There
> were some name changes (not to JMX) and out of this work I’ve seen such
> changes even in patch releases. I know we give a promise not to change only
> interfaces but I still want to mention it here and to ensure we are
> aligned. As I said, Config changes the types of our parameters. We use
> annotations and the mentioned converters for backward compatibility of our
> yaml file.
> >
> >
> > New JMX methods are not added and the old ones still work by converting
> to the base unit of the specific parameter. By base I mean the internally
> used one. The agreement was that in the future virtual tables should handle
> the new format when we add the update option for SettingsTable.
> >
> >
> > In order not to run into precision issues, we introduced the Smallest
> possible unit for certain parameters of type Duration and DataStorage, that
> can be changed if we decide to migrate any of those in time to the smallest
> possible unit internally. I am also adding their Int equivalents now to
> improve the former int parameters handling.
> >
> > We didn’t find any parameters with suffix _mb or _kb that internally
> didn’t mean kibi- or mebi- but kilo- or mega-. So all default values are
> the same in the new format.
> >
> >
> > For more details - please refer to the write up doc I shared before here
> - https://cassandra.apache.org/doc/trunk/cassandra/new/configuration.html
> There is important note and ticket around parameters overloading which
> turned to be undocumented behavior in the project that people use.
> >
> >
> > Last but not least - I want to remind you that any new parameters are
> added with the new types. Users will have to set them with their new
> format. So whoever wants to use old yaml and wants to update default values
> of new parameters will have to do this in the new format.
> >
> >
> > This email came rather long but I truly believe it is important and I
> want to ask you if there were parameters you think they might have been
> affected, to double check them and raise the flag if you are not sure or
> worried about something done or not done.
> >
> >
> > [1]
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/GroupCommitLogService.java#L31
> >
> >
> >
> >       Ekaterina Dimitrova
>

Re: PLEASE READ ME! IMPORTANT!

Posted by Stefan Miklosovic <st...@instaclustr.com>.
I have checked with Ekaterina, we are ok with what she asked us to
take a look at. Other than that, I do not remember I have touched /
worked with any other mentioned parameter she elaborated on between
4.0 and 4.1 so I can not comment on that.

On Tue, 5 Apr 2022 at 23:24, Ekaterina Dimitrova <e....@gmail.com> wrote:
>
> ATTENTION PLEASE Below email will be long but I believe you will agree it deserves attention for good reasons. Thank you in advance for your time and consideration!
>
>
> Hi everyone,
>
> I am working on the last batch of config parameters to be transferred to the new types after CASSANDRA-15234 landed.
>
> This led to a few questions and concerns and here I am to raise awareness and one more time to confirm things for the community to ensure there is no regression when 4.1 is out.
>
> As the main decisions were taken and implemented two years ago before even 4.0 beta, I would like to ensure that everyone here is aware that in CASSANDRA-15234 we have backward compatibility to transfer the old format (only values, no opportunity to change unit) to the new format (value + unit) but it falls back to the new types at the end. This is not a feature with a flag to be disabled. We migrated the parameters to the new types in Config class and in cassandra.yaml. Also, we explicitly say we don’t support negative values (old and new yaml config) which was not the case before. We never advertised the usage of negative values but we also did not prohibit that which we do now as one more improvement in CASSANDRA-15234. I will add an explicit note in NEWS.txt to be sure no one was expecting to keep on using negative values with the old yaml if they were doing it for some unknown reason, old/new - we prohibit on trunk negative values except special cases.
>
> As part of the documentation I mentioned the Converter class which serves to do these conversions between old value and new value - load the old value to the new Config types. There are special cases which I want to highlight in case someone knows of any behavior not caught by our CI that might be changed or broken because of those changes(better safe than sorry):
>
> The new types DataRateSpec, DurationSpec and DataRateSpec accept only non-negative values (this will affect both old and new config). Thus as part of the converters we have the following special cases:
>
> - MILLIS_DOUBLE_DURATION for commitlog_sync_group_window_in_ms or now already commitlog_sync_group_window which covers that this was double and even if we think this was a bug as it is casted later to int here[1], someone might have been using double and NaN. Disallow less than 0 as we already said for all types. With the new config types this parameter is stored as long.
>
> - MILLIS_CUSTOM_DURATION for permissions_update_interval, roles_update_interval and credentials_update_interval. After CASSANDRA-17431 those will be updated to - old value of “-1” being translated to null. The setters will be doing that too. Anything below -1 is prohibited and considered a bug (both with old and new config).
>
> - we are adding NEGATIVE_SECONDS_DURATION as part of C17431 to handle validation_preview_purge_head_start. Anyone using the old yaml and value <0 will have it translated into 0 seconds. New yaml and format prohibits values less than 0 and the smallest unit for this parameter is seconds.
>
> - BYTES_CUSTOM_DATA_STORAGE translates “-1” to “null” and prohibits anything less than -1 with the old yaml and less than 0 with the new one for native_transport_max_concurrent_requests_in_bytes_per_ip and native_transport_max_concurrent_requests_in_bytes after C17341. I had also separate mail for these two as we have some concerns about them for all C* versions.
>
> In C17431 I also decided to leave parameters phi_convict_threshold, memtable_cleanup_threshold, block_for_peers_timeout_in_secs alone and not to migrate them to the new Config types because of various reasons explained in the ticket. Please let me know if you have any particular suggestions/concerns/thoughts around those.
>
> DataRateSpec - it was requested to support mebibytes/s, kibibytes/s, bytes/s. This made things complicated internally with two of the parameters introduced in 4.0 which were in megabits/s and I had to keep the old behavior and them being able to still support megabits/s with the old format. As the conversion between megabits and mebibytes is not really a whole number, I had to store the DataRateSpec in double and not long as in the other types in order to ensure accurate precision, etc. We considered this being fine because the RateLimiter uses double and we still allow only whole numbers to be provided by the users. Please let me know if you see any problem. A quick unit test where you can validate how those work  is SetGetInterDCStreamThroughputTest. Important to mention that once people move to the new format, they can’t assign them anything less than 1MiB/s.
>
> Regarding DataRateSpec - the two new parameters (entire_sstable_stream_throughput_outbound_megabits_per_sec and entire_sstable_inter_dc_stream_throughput_outbound_megabits_per_sec) still not being in release were changed to be in MiB/s. Checked with Francisoco in Slack - that should be fine.
>
> I would also like to ask Stefan and Paulo to check if they do not agree with anything in the latest version of DurationSpec as they have introduced the initial version of Duration in the codebase which evolved into DurationSpec in C15234. Please let me know if there is anything else aside from the tests you introduced that I might be missing and it changes your intentions.  Already sent a msg in Slack in case they miss the email.
>
>
> Also, it was discussed on the ticket but I wanted to add it again for consideration - virtual tables currently show both the old names and format and new names and format. There are only three parameters which change only type but not names. Those will be listed only with the new format value. This was not considered as enough reason to bump to 5.0 because of breaking change but I wanted one more time everyone to be aware of that agreement.
>
>
> In addition, I will add one more note to NEWS.txt for people to raise awareness to check carefully their config on upgrade that there are no breaking changes. We already mention the new option and the backward compatibility but I think it won’t hurt to stress more on the part we prohibit negative values considering it being a bug and have only a few special cases around -1 that we convert.
>
>
> Please check the matching patterns we use to accept config values in DurationSpec, DataRateSpec and DataStorageSpec and let me know if you think there was some special case missed or anything.
>
>
> One more topic I want to mention is changes to public classes. There were some name changes (not to JMX) and out of this work I’ve seen such changes even in patch releases. I know we give a promise not to change only interfaces but I still want to mention it here and to ensure we are aligned. As I said, Config changes the types of our parameters. We use annotations and the mentioned converters for backward compatibility of our yaml file.
>
>
> New JMX methods are not added and the old ones still work by converting to the base unit of the specific parameter. By base I mean the internally used one. The agreement was that in the future virtual tables should handle the new format when we add the update option for SettingsTable.
>
>
> In order not to run into precision issues, we introduced the Smallest possible unit for certain parameters of type Duration and DataStorage, that can be changed if we decide to migrate any of those in time to the smallest possible unit internally. I am also adding their Int equivalents now to improve the former int parameters handling.
>
> We didn’t find any parameters with suffix _mb or _kb that internally didn’t mean kibi- or mebi- but kilo- or mega-. So all default values are the same in the new format.
>
>
> For more details - please refer to the write up doc I shared before here - https://cassandra.apache.org/doc/trunk/cassandra/new/configuration.html There is important note and ticket around parameters overloading which turned to be undocumented behavior in the project that people use.
>
>
> Last but not least - I want to remind you that any new parameters are added with the new types. Users will have to set them with their new format. So whoever wants to use old yaml and wants to update default values of new parameters will have to do this in the new format.
>
>
> This email came rather long but I truly believe it is important and I want to ask you if there were parameters you think they might have been affected, to double check them and raise the flag if you are not sure or worried about something done or not done.
>
>
> [1] https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/commitlog/GroupCommitLogService.java#L31
>
>
>
>       Ekaterina Dimitrova