You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by Bryan Beaudreault <bb...@hubspot.com.INVALID> on 2022/03/15 15:28:10 UTC

[DISCUSS] How to expose configs for users to tune

Hi devs,

As a major user of hbase, my company has thousands of clients deployed
which use the hbase client to connect to a variety of hbase clusters. We
have a common library which handles configuring all clients by setting up
the Configuration object prior to creating a Connection. Our library sets
configurations using the various configs in HConstants, but there are also
a bunch of configs which don't exist in HConstants. For these we have
hardcoded config strings in our client.

We're now working on an hbase upgrade and need to go through our client
library and check how the configs may have changed in the new version. This
is relatively easy to do for those HConstants cases -- configs may be
marked @Deprecated which will show up in one's editor, they may be removed
entirely which would show up is a compile error, and otherwise one can
easily click through or bring up the javadoc. For the others that don't
exist in HConstants, we need to go manually search the hbase codebase for
those strings.

Without doing this painstaking manual process, we would potentially deploy
the upgraded client with configs which are no longer used or deprecated by
the hbase client. For those using HConstants, this is immediately obvious
because the HConstant field may have been removed. This is a clear
indication of needing to investigate the config. In this case it's
preferred to face the compile failure because it's clearer than having
something silently disappear or change.

I opened 3 jiras to move some fields to HConstants, but got some reasonable
pushback from Duo:

https://issues.apache.org/jira/browse/HBASE-26845
https://issues.apache.org/jira/browse/HBASE-26846
https://issues.apache.org/jira/browse/HBASE-26847

Duo's pushback is that HConstants is an anti-pattern and these configs are
not part of our public API. I can agree that a catch-all constants class
might be an anti-pattern, but would argue that consolidating configs there
is very useful for end-users.  I can also potentially agree that exposing
these as part of our public API might limit the flexibility of development
due to compatibility constraints about IA.Public.

To me it seems odd to add a configuration, whose whole point is to make
something tuneable, but then bury it in a private class despite having real
implications for how the application runs. If a configuration is not meant
to be tuned, it shouldn't be a configuration at all. Otherwise it should be
exposed for reference.

I'm wondering if there is some compromise we can achieve which makes it
easier for end-users to integrate with tunable configs.

One can imagine a large project to clean up all of our configs under some
new class with maybe IA.LimitedPrivate(CONFIG), but I fear making perfect
(needing to migrate all configs) the enemy of good.

A better option might be to make those classes which expose configs
LimitedPrivate(CONFIG) -- for example AsyncProcess and
ConnectionImplementation. That might be the most incremental change we
could make. We could handle this on a case-by-case basis.

Does anyone have any thoughts?

Re: [DISCUSS] How to expose configs for users to tune

Posted by Andrew Purtell <an...@gmail.com>.
As discussed on the thread, the idea was we would also tweak the definition of LP(CONFIG). I am not opposed to making them Public. 


> On Mar 24, 2022, at 7:19 PM, 张铎 <pa...@gmail.com> wrote:
> 
> IMO if we want to have a constant class for this, we should mark it as
> IA.Public. IA.LP(Config) just means the class itself can be referenced in
> config files.
> 
> Bryan Beaudreault <bb...@hubspot.com.invalid> 于2022年3月25日周五 04:25写道:
> 
>> Yes but I don't see a major issue with promoting to LP(Config)? I thought
>> the agreement here was that LP(Config) was basically "exposed for config
>> constants", but maybe I misunderstood.
>> 
>> Also ConnectionConfiguration is a relatively basic and lightweight wrapper
>> around configs. I think it's a perfect example of what this LP(Config) API
>> could look like.
>> 
>>> On Thu, Mar 24, 2022 at 3:47 PM Nick Dimiduk <nd...@apache.org> wrote:
>>> 
>>> One sticking point: ConnectionConfiguration and
>>> AsyncConnectionConfiguration are both IA.Private.
>>> 
>>> On Thu, Mar 24, 2022 at 17:21 Bryan Beaudreault
>>> <bb...@hubspot.com.invalid> wrote:
>>> 
>>>> I realized there already exists a good candidate for this in
>> hbase-client
>>>> -- ConnectionConfiguration. My latest commit in
>>>> https://github.com/apache/hbase/pull/4180
>>> <https://github.com/apache/hbase/pull/4180>
>>> adds a new config constant there
>>>> and marks it as LP(Config), but I'd also be happy to revert that part
>> of
>>>> the PR and instead handle that in a dedicated jira for this topic if
>>>> desired.
>>>> 
>>>> On Thu, Mar 24, 2022 at 11:08 AM Bryan Beaudreault <
>>>> bbeaudreault@hubspot.com>
>>>> wrote:
>>>> 
>>>>> Thank you both for the input here!
>>>>> 
>>>>> It seems like we've come to the conclusion that it would be ok to:
>>>>> 
>>>>> - Update docs around LP(CONFIG) to say that it also encompasses some
>>>>> module-aggregated classes which hold config constants
>>>>> - Maybe document this convention elsewhere for contributors as well,
>>>>> though need to look into where (guidance welcome, but will look
>> around)
>>>>> - I can create the first constant class in hbase-client for the
>>> use-cases
>>>>> in my jira. I also have another config constant I've added in
>>>>> https://github.com/apache/hbase/pull/4180
>>> <https://github.com/apache/hbase/pull/4180>
>>> prior to this discussion, so
>>>>> may want to use that class there.
>>>>> - Look into updates to tooling to audit the LP(CONFIG) classes
>>>>> 
>>>>> If this sounds good, I'll create a JIRA to track the work. The first
>> 3
>>>>> todos will be relatively easy, and I'll look into options for the
>> audit
>>>> tool
>>>>> 
>>>>> 
>>>>> On Mon, Mar 21, 2022 at 12:14 PM Andrew Purtell <
>>>> andrew.purtell@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Agreed the definition of LP(CONFIG) would need to be tweaked.
>>>>>> 
>>>>>> We do not quite have enough support for analyzing configuration key
>>> set
>>>>>> changes in the current API audit tool. Removal of a public constant
>>>> field
>>>>>> from an LP annotated class would be flagged but a modification of
>> the
>>>>>> constant would not. However it is a OSS project consisting of Perl
>>>> scripts
>>>>>> that might accept a contribution or at least could be patched or
>>>> extended.
>>>>>> Or we can build a new audit tool for the purpose, which is what I
>>> would
>>>>>> recommend. The Perl based tool shells out to javah and is quite
>>>> expensive,
>>>>>> and on some platforms, depending on comparison, can fail due to
>>> command
>>>>>> line length limits. A Java tool would likely be more efficient at
>>>>>> processing Java class annotations and introspecting string
>> constants.
>>>>>> 
>>>>>>> 
>>>>>>> On Mar 21, 2022, at 8:52 AM, Nick Dimiduk <nd...@apache.org>
>>>> wrote:
>>>>>>> 
>>>>>>> On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell <
>>>>>> andrew.purtell@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Although collecting all configuration keys into a single file is
>>>>>>>> definitely an anti-pattern I’m not sure the same is true of
>> package
>>>> or
>>>>>>>> Maven module level aggregation classes marked LP(CONFIG).Somewhat
>>>> like
>>>>>>>> DFSConfigKeys but geared toward our API/release auditing.
>>>>>>>> 
>>>>>>>> This would seem virtuous for a couple of reasons. Relevant
>>>>>> configuration
>>>>>>>> key constants for the package or module would be grouped in a
>> well
>>>>>> known
>>>>>>>> place for users and developers alike. The LP(CONFIG) designation
>>>> would
>>>>>>>> require developers to think about deprecation cycle if
>>> contemplating
>>>> a
>>>>>>>> change, thus providing some back pressure against snap decisions.
>>> Or,
>>>>>> if
>>>>>>>> not then, then at release candidate evaluation time, user
>>>> configuration
>>>>>>>> breaking changes could be caught be a release automation tool
>> that
>>>>>> diffs
>>>>>>>> LP(CONFIG) annotated classes. Something like this would improve
>> the
>>>>>> state
>>>>>>>> of configuration key management quite dramatically, because
>>> currently
>>>>>> it’s
>>>>>>>> ad hoc.
>>>>>>>> 
>>>>>>> 
>>>>>>> I am supportive of trying such an effort. We'll need to tweak the
>>>>>> meaning
>>>>>>> of LP(CONFIG) as we define it currently, but that can be done. I
>>> don't
>>>>>> know
>>>>>>> what our current tools do or assume regarding this annotation. I
>>> think
>>>>>>> there is something custom happening in the compatibility reports
>>> that
>>>> we
>>>>>>> generate as part of each RC.
>>>>>>> 
>>>>>>>>> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault
>>>>>> <bb...@hubspot.com.invalid>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks for your detailed response, Nick!
>>>>>>>>> 
>>>>>>>>>> I think that none of my comments address your intended topic:
>> how
>>>> do
>>>>>> we
>>>>>>>>> publish our configuration points as an API that can be consumed
>> by
>>>>>> user
>>>>>>>>> applications? (Do I have that correct?)
>>>>>>>>> 
>>>>>>>>> This is a good summary, and I appreciate the other
>>>>>>>> thoughts/clarifications
>>>>>>>>> as well. I also realize this is probably hard to get perfect and
>>> any
>>>>>>>> choice
>>>>>>>>> must be weighed against the effort necessary to change/maintain.
>>>>>>>>> 
>>>>>>>>> One example I know is Hadoop/HDFS, and I bet some on this list
>>> have
>>>>>> much
>>>>>>>>> more knowledge of that project's history than I do. For HDFS
>> they
>>>> have
>>>>>>>>> DFSConfigKeys which in my experience does seem to include most
>>>>>> configs. I
>>>>>>>>> believe they even have unit tests which verify that all configs
>> in
>>>> the
>>>>>>>>> various site.xml files are represented in code. In more recent
>>>>>> versions
>>>>>>>>> they have split that class up into smaller groupings, for
>> example
>>>>>>>>> DfsClientConf and the various inner classes there.
>>>>>>>>> 
>>>>>>>>> In a vacuum, from a code design perspective, I'm not commenting
>> on
>>>>>>>> whether
>>>>>>>>> that's a good or bad pattern. I also don't know of the politics
>> of
>>>> the
>>>>>>>>> project or what sorts of pain points they've discovered in that
>>>>>> pattern
>>>>>>>>> over the years. But from *user's perspective*, this is a handy
>> way
>>>> to
>>>>>>>>> handle things in my opinion.
>>>>>>>>> 
>>>>>>>>> At my company, in general we try to avoid "magic strings" [1]
>> and
>>>>>> instead
>>>>>>>>> always try to use constants. We can and do define our own
>>> constants
>>>> to
>>>>>>>> try
>>>>>>>>> to mirror some of the "private" magic strings in the hbase
>> client.
>>>>>> This
>>>>>>>> is
>>>>>>>>> better than nothing but even better would be to use
>> hbase-provided
>>>>>>>>> constants so that we can build more defensive applications,
>> using
>>>> the
>>>>>>>>> compiler to verify that the configs we reference still do
>>> anything.
>>>>>>>>> 
>>>>>>>>> I unfortunately can't speak to the original issues with
>> HConstants
>>>>>> that
>>>>>>>>> turned it into an anti-pattern. What I do notice is there are
>>>>>> definitely
>>>>>>>>> examples in the hbase codebase of duplicated config strings, one
>>> of
>>>>>> which
>>>>>>>>> is called out in one of the jiras I linked in my original email.
>>>> These
>>>>>>>> are
>>>>>>>>> just bugs waiting to happen in my opinion, either for hbase
>> itself
>>>> or
>>>>>> for
>>>>>>>>> users which may reference them.
>>>>>>>>> 
>>>>>>>>> [1] https://deviq.com/antipatterns/magic-strings
>>> <https://deviq.com/antipatterns/magic-strings>
>>>>>> <https://deviq.com/antipatterns/magic-strings
>>> <https://deviq.com/antipatterns/magic-strings>
>>>> 
>>>>>>>>>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <
>>> ndimiduk@apache.org
>>>>> 
>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Bryan,
>>>>>>>>>> 
>>>>>>>>>> Thanks for bringing this up.
>>>>>>>>>> 
>>>>>>>>>> I agree with Duo (and I think we have this settled as
>>> project-wide
>>>>>>>>>> consensus) that HConstants is/was an anti-pattern, that we are
>>>>>> actively
>>>>>>>>>> against adding new fields there, and opportunistically removing
>>>>>> fields
>>>>>>>> when
>>>>>>>>>> we can. Further, the documented meaning of the
>>>>>>>>>> HBaseInterfaceAudience.CONFIG field is "Denotes class names
>> that
>>>>>> appear
>>>>>>>> in
>>>>>>>>>> user facing configuration files", so this isn't really
>>> appropriate
>>>>>> for
>>>>>>>>>> marking a field that exposes a configuration key to user
>>>>>> applications. I
>>>>>>>>>> will also note that there appears to be two categories of
>> tunable
>>>>>>>>>> parameters -- configuration points that we expect users to
>> tweak
>>>> are
>>>>>>>>>> catalogued and documented in the book [0] and everything else
>> is
>>>>>> left to
>>>>>>>>>> the obscurity of code-grep.
>>>>>>>>>> 
>>>>>>>>>> While we are actively squashing use of fields in HConstants, I
>>>> don't
>>>>>>>> know
>>>>>>>>>> that we have proposed some alternative to the user community.
>> For
>>>> my
>>>>>>>> part,
>>>>>>>>>> when I write and review code that involves configuration keys,
>> I
>>>>>>>> generally
>>>>>>>>>> implement the key constant string as a private field in an
>>>>>> appropriate
>>>>>>>>>> class, and the unit test coverage for that configuration key
>>>>>> replicates
>>>>>>>> the
>>>>>>>>>> string in the test. My reasoning being that the string is a
>> part
>>> of
>>>>>> our
>>>>>>>>>> public API and making a change to the public API should be
>>> detected
>>>>>> from
>>>>>>>>>> the unit test. I also have (on occasion) gone out of my way to
>>>> write
>>>>>>>> about
>>>>>>>>>> the configuration keys in the package or class-level javadoc.
>>>>>>>>>> 
>>>>>>>>>> I think that none of my comments address your intended topic:
>> how
>>>> do
>>>>>> we
>>>>>>>>>> publish our configuration points as an API that can be consumed
>>> by
>>>>>> user
>>>>>>>>>> applications? (Do I have that correct?)
>>>>>>>>>> 
>>>>>>>>>> I am of the mind that we don't need/want an API of
>>> configurations ;
>>>>>> we
>>>>>>>> want
>>>>>>>>>> a catalogue, i.e., what has been started in our book. Perhaps
>>>>>>>> accompanied
>>>>>>>>>> by/generated from an authoritative hbase-defaults.xml file. In
>>>> fact,
>>>>>> we
>>>>>>>>>> already do generate from hbase-default.xml, the result is [1]
>> ; I
>>>>>> don't
>>>>>>>>>> believe it is authoritative.
>>>>>>>>>> 
>>>>>>>>>> If we did have an AP thoughI, what would be better than the
>>>>>> HConstants
>>>>>>>>>> approach of key-strings as public fields ? What if we had a
>>>>>>>>>> ConfigurationBuilder type of class, which had methods tied to
>>>>>>>> configuration
>>>>>>>>>> keys? I would think that such a globally applicable class would
>>>> have
>>>>>> the
>>>>>>>>>> same maintenance issues as HConstants. But what if we had some
>>> kind
>>>>>> of
>>>>>>>>>> ConfigurationSetter class, perhaps per package, that performed
>>> this
>>>>>>>>>> function? That might be maintainable for us and useful for
>> users.
>>>>>>>>>> 
>>>>>>>>>> I'm keen to hear what other ideas are out there, or better,
>>>> examples
>>>>>> and
>>>>>>>>>> counter-examples from other projects.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Nick
>>>>>>>>>> 
>>>>>>>>>> [0]:
>> https://hbase.apache.org/book.html#important_configurations
>>> <https://hbase.apache.org/book.html#important_configurations>
>>>>>> <https://hbase.apache.org/book.html#important_configurations
>>> <https://hbase.apache.org/book.html#important_configurations>
>>>> 
>>>>>>>>>> <https://hbase.apache.org/book.html#important_configurations
>>> <https://hbase.apache.org/book.html#important_configurations>
>>>>>> <https://hbase.apache.org/book.html#important_configurations
>>> <https://hbase.apache.org/book.html#important_configurations>
>>>> 
>>>>>>> 
>>>>>>>>>> [1]:
>>>> https://hbase.apache.org/book.html#hbase_default_configurations
>>> <https://hbase.apache.org/book.html#hbase_default_configurations>
>>>>>> <https://hbase.apache.org/book.html#hbase_default_configurations
>>> <https://hbase.apache.org/book.html#hbase_default_configurations>
>>>> 
>>>>>>>>>> <
>> https://hbase.apache.org/book.html#hbase_default_configurations
>>> <https://hbase.apache.org/book.html#hbase_default_configurations>
>>>>>> <https://hbase.apache.org/book.html#hbase_default_configurations
>>> <https://hbase.apache.org/book.html#hbase_default_configurations>
>>>> 
>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
>>>>>>>>>> <bb...@hubspot.com.invalid> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi devs,
>>>>>>>>>>> 
>>>>>>>>>>> As a major user of hbase, my company has thousands of clients
>>>>>> deployed
>>>>>>>>>>> which use the hbase client to connect to a variety of hbase
>>>>>> clusters.
>>>>>>>> We
>>>>>>>>>>> have a common library which handles configuring all clients by
>>>>>> setting
>>>>>>>> up
>>>>>>>>>>> the Configuration object prior to creating a Connection. Our
>>>> library
>>>>>>>> sets
>>>>>>>>>>> configurations using the various configs in HConstants, but
>>> there
>>>>>> are
>>>>>>>>>> also
>>>>>>>>>>> a bunch of configs which don't exist in HConstants. For these
>> we
>>>>>> have
>>>>>>>>>>> hardcoded config strings in our client.
>>>>>>>>>>> 
>>>>>>>>>>> We're now working on an hbase upgrade and need to go through
>> our
>>>>>> client
>>>>>>>>>>> library and check how the configs may have changed in the new
>>>>>> version.
>>>>>>>>>> This
>>>>>>>>>>> is relatively easy to do for those HConstants cases -- configs
>>> may
>>>>>> be
>>>>>>>>>>> marked @Deprecated which will show up in one's editor, they
>> may
>>> be
>>>>>>>>>> removed
>>>>>>>>>>> entirely which would show up is a compile error, and otherwise
>>> one
>>>>>> can
>>>>>>>>>>> easily click through or bring up the javadoc. For the others
>>> that
>>>>>> don't
>>>>>>>>>>> exist in HConstants, we need to go manually search the hbase
>>>>>> codebase
>>>>>>>> for
>>>>>>>>>>> those strings.
>>>>>>>>>>> 
>>>>>>>>>>> Without doing this painstaking manual process, we would
>>>> potentially
>>>>>>>>>> deploy
>>>>>>>>>>> the upgraded client with configs which are no longer used or
>>>>>> deprecated
>>>>>>>>>> by
>>>>>>>>>>> the hbase client. For those using HConstants, this is
>>> immediately
>>>>>>>> obvious
>>>>>>>>>>> because the HConstant field may have been removed. This is a
>>> clear
>>>>>>>>>>> indication of needing to investigate the config. In this case
>>> it's
>>>>>>>>>>> preferred to face the compile failure because it's clearer
>> than
>>>>>> having
>>>>>>>>>>> something silently disappear or change.
>>>>>>>>>>> 
>>>>>>>>>>> I opened 3 jiras to move some fields to HConstants, but got
>> some
>>>>>>>>>> reasonable
>>>>>>>>>>> pushback from Duo:
>>>>>>>>>>> 
>>>>>>>>>>> https://issues.apache.org/jira/browse/HBASE-26845
>>> <https://issues.apache.org/jira/browse/HBASE-26845>
>>>>>> <https://issues.apache.org/jira/browse/HBASE-26845
>>> <https://issues.apache.org/jira/browse/HBASE-26845>
>>>> 
>>>>>>>>>> <https://issues.apache.org/jira/browse/HBASE-26845
>>> <https://issues.apache.org/jira/browse/HBASE-26845>
>>>>>> <https://issues.apache.org/jira/browse/HBASE-26845
>>> <https://issues.apache.org/jira/browse/HBASE-26845>
>>>> 
>>>>>>> 
>>>>>>>>>>> https://issues.apache.org/jira/browse/HBASE-26846
>>> <https://issues.apache.org/jira/browse/HBASE-26846>
>>>>>> <https://issues.apache.org/jira/browse/HBASE-26846
>>> <https://issues.apache.org/jira/browse/HBASE-26846>
>>>> 
>>>>>>>>>> <https://issues.apache.org/jira/browse/HBASE-26846
>>> <https://issues.apache.org/jira/browse/HBASE-26846>
>>>>>> <https://issues.apache.org/jira/browse/HBASE-26846
>>> <https://issues.apache.org/jira/browse/HBASE-26846>
>>>> 
>>>>>>> 
>>>>>>>>>>> https://issues.apache.org/jira/browse/HBASE-26847
>>> <https://issues.apache.org/jira/browse/HBASE-26847>
>>>>>> <https://issues.apache.org/jira/browse/HBASE-26847
>>> <https://issues.apache.org/jira/browse/HBASE-26847>
>>>> 
>>>>>>>>>> <https://issues.apache.org/jira/browse/HBASE-26847
>>> <https://issues.apache.org/jira/browse/HBASE-26847>
>>>>>> <https://issues.apache.org/jira/browse/HBASE-26847
>>> <https://issues.apache.org/jira/browse/HBASE-26847>
>>>> 
>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Duo's pushback is that HConstants is an anti-pattern and these
>>>>>> configs
>>>>>>>>>> are
>>>>>>>>>>> not part of our public API. I can agree that a catch-all
>>> constants
>>>>>>>> class
>>>>>>>>>>> might be an anti-pattern, but would argue that consolidating
>>>> configs
>>>>>>>>>> there
>>>>>>>>>>> is very useful for end-users. I can also potentially agree
>> that
>>>>>>>> exposing
>>>>>>>>>>> these as part of our public API might limit the flexibility of
>>>>>>>>>> development
>>>>>>>>>>> due to compatibility constraints about IA.Public.
>>>>>>>>>>> 
>>>>>>>>>>> To me it seems odd to add a configuration, whose whole point
>> is
>>> to
>>>>>> make
>>>>>>>>>>> something tuneable, but then bury it in a private class
>> despite
>>>>>> having
>>>>>>>>>> real
>>>>>>>>>>> implications for how the application runs. If a configuration
>> is
>>>> not
>>>>>>>>>> meant
>>>>>>>>>>> to be tuned, it shouldn't be a configuration at all. Otherwise
>>> it
>>>>>>>> should
>>>>>>>>>> be
>>>>>>>>>>> exposed for reference.
>>>>>>>>>>> 
>>>>>>>>>>> I'm wondering if there is some compromise we can achieve which
>>>>>> makes it
>>>>>>>>>>> easier for end-users to integrate with tunable configs.
>>>>>>>>>>> 
>>>>>>>>>>> One can imagine a large project to clean up all of our configs
>>>> under
>>>>>>>> some
>>>>>>>>>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear
>>> making
>>>>>>>> perfect
>>>>>>>>>>> (needing to migrate all configs) the enemy of good.
>>>>>>>>>>> 
>>>>>>>>>>> A better option might be to make those classes which expose
>>>> configs
>>>>>>>>>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
>>>>>>>>>>> ConnectionImplementation. That might be the most incremental
>>>> change
>>>>>> we
>>>>>>>>>>> could make. We could handle this on a case-by-case basis.
>>>>>>>>>>> 
>>>>>>>>>>> Does anyone have any thoughts?
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 

Re: [DISCUSS] How to expose configs for users to tune

Posted by "张铎(Duo Zhang)" <pa...@gmail.com>.
IMO if we want to have a constant class for this, we should mark it as
IA.Public. IA.LP(Config) just means the class itself can be referenced in
config files.

Bryan Beaudreault <bb...@hubspot.com.invalid> 于2022年3月25日周五 04:25写道:

> Yes but I don't see a major issue with promoting to LP(Config)? I thought
> the agreement here was that LP(Config) was basically "exposed for config
> constants", but maybe I misunderstood.
>
> Also ConnectionConfiguration is a relatively basic and lightweight wrapper
> around configs. I think it's a perfect example of what this LP(Config) API
> could look like.
>
> On Thu, Mar 24, 2022 at 3:47 PM Nick Dimiduk <nd...@apache.org> wrote:
>
> > One sticking point: ConnectionConfiguration and
> > AsyncConnectionConfiguration are both IA.Private.
> >
> > On Thu, Mar 24, 2022 at 17:21 Bryan Beaudreault
> > <bb...@hubspot.com.invalid> wrote:
> >
> > > I realized there already exists a good candidate for this in
> hbase-client
> > > -- ConnectionConfiguration. My latest commit in
> > > https://github.com/apache/hbase/pull/4180
> > <https://github.com/apache/hbase/pull/4180>
> > adds a new config constant there
> > > and marks it as LP(Config), but I'd also be happy to revert that part
> of
> > > the PR and instead handle that in a dedicated jira for this topic if
> > > desired.
> > >
> > > On Thu, Mar 24, 2022 at 11:08 AM Bryan Beaudreault <
> > > bbeaudreault@hubspot.com>
> > > wrote:
> > >
> > > > Thank you both for the input here!
> > > >
> > > > It seems like we've come to the conclusion that it would be ok to:
> > > >
> > > > - Update docs around LP(CONFIG) to say that it also encompasses some
> > > > module-aggregated classes which hold config constants
> > > > - Maybe document this convention elsewhere for contributors as well,
> > > > though need to look into where (guidance welcome, but will look
> around)
> > > > - I can create the first constant class in hbase-client for the
> > use-cases
> > > > in my jira. I also have another config constant I've added in
> > > > https://github.com/apache/hbase/pull/4180
> > <https://github.com/apache/hbase/pull/4180>
> > prior to this discussion, so
> > > > may want to use that class there.
> > > > - Look into updates to tooling to audit the LP(CONFIG) classes
> > > >
> > > > If this sounds good, I'll create a JIRA to track the work. The first
> 3
> > > > todos will be relatively easy, and I'll look into options for the
> audit
> > > tool
> > > >
> > > >
> > > > On Mon, Mar 21, 2022 at 12:14 PM Andrew Purtell <
> > > andrew.purtell@gmail.com>
> > > > wrote:
> > > >
> > > >> Agreed the definition of LP(CONFIG) would need to be tweaked.
> > > >>
> > > >> We do not quite have enough support for analyzing configuration key
> > set
> > > >> changes in the current API audit tool. Removal of a public constant
> > > field
> > > >> from an LP annotated class would be flagged but a modification of
> the
> > > >> constant would not. However it is a OSS project consisting of Perl
> > > scripts
> > > >> that might accept a contribution or at least could be patched or
> > > extended.
> > > >> Or we can build a new audit tool for the purpose, which is what I
> > would
> > > >> recommend. The Perl based tool shells out to javah and is quite
> > > expensive,
> > > >> and on some platforms, depending on comparison, can fail due to
> > command
> > > >> line length limits. A Java tool would likely be more efficient at
> > > >> processing Java class annotations and introspecting string
> constants.
> > > >>
> > > >> >
> > > >> > On Mar 21, 2022, at 8:52 AM, Nick Dimiduk <nd...@apache.org>
> > > wrote:
> > > >> >
> > > >> > On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell <
> > > >> andrew.purtell@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> >> Although collecting all configuration keys into a single file is
> > > >> >> definitely an anti-pattern I’m not sure the same is true of
> package
> > > or
> > > >> >> Maven module level aggregation classes marked LP(CONFIG).Somewhat
> > > like
> > > >> >> DFSConfigKeys but geared toward our API/release auditing.
> > > >> >>
> > > >> >> This would seem virtuous for a couple of reasons. Relevant
> > > >> configuration
> > > >> >> key constants for the package or module would be grouped in a
> well
> > > >> known
> > > >> >> place for users and developers alike. The LP(CONFIG) designation
> > > would
> > > >> >> require developers to think about deprecation cycle if
> > contemplating
> > > a
> > > >> >> change, thus providing some back pressure against snap decisions.
> > Or,
> > > >> if
> > > >> >> not then, then at release candidate evaluation time, user
> > > configuration
> > > >> >> breaking changes could be caught be a release automation tool
> that
> > > >> diffs
> > > >> >> LP(CONFIG) annotated classes. Something like this would improve
> the
> > > >> state
> > > >> >> of configuration key management quite dramatically, because
> > currently
> > > >> it’s
> > > >> >> ad hoc.
> > > >> >>
> > > >> >
> > > >> > I am supportive of trying such an effort. We'll need to tweak the
> > > >> meaning
> > > >> > of LP(CONFIG) as we define it currently, but that can be done. I
> > don't
> > > >> know
> > > >> > what our current tools do or assume regarding this annotation. I
> > think
> > > >> > there is something custom happening in the compatibility reports
> > that
> > > we
> > > >> > generate as part of each RC.
> > > >> >
> > > >> >>> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault
> > > >> <bb...@hubspot.com.invalid>
> > > >> >>> wrote:
> > > >> >>>
> > > >> >>> Thanks for your detailed response, Nick!
> > > >> >>>
> > > >> >>>> I think that none of my comments address your intended topic:
> how
> > > do
> > > >> we
> > > >> >>> publish our configuration points as an API that can be consumed
> by
> > > >> user
> > > >> >>> applications? (Do I have that correct?)
> > > >> >>>
> > > >> >>> This is a good summary, and I appreciate the other
> > > >> >> thoughts/clarifications
> > > >> >>> as well. I also realize this is probably hard to get perfect and
> > any
> > > >> >> choice
> > > >> >>> must be weighed against the effort necessary to change/maintain.
> > > >> >>>
> > > >> >>> One example I know is Hadoop/HDFS, and I bet some on this list
> > have
> > > >> much
> > > >> >>> more knowledge of that project's history than I do. For HDFS
> they
> > > have
> > > >> >>> DFSConfigKeys which in my experience does seem to include most
> > > >> configs. I
> > > >> >>> believe they even have unit tests which verify that all configs
> in
> > > the
> > > >> >>> various site.xml files are represented in code. In more recent
> > > >> versions
> > > >> >>> they have split that class up into smaller groupings, for
> example
> > > >> >>> DfsClientConf and the various inner classes there.
> > > >> >>>
> > > >> >>> In a vacuum, from a code design perspective, I'm not commenting
> on
> > > >> >> whether
> > > >> >>> that's a good or bad pattern. I also don't know of the politics
> of
> > > the
> > > >> >>> project or what sorts of pain points they've discovered in that
> > > >> pattern
> > > >> >>> over the years. But from *user's perspective*, this is a handy
> way
> > > to
> > > >> >>> handle things in my opinion.
> > > >> >>>
> > > >> >>> At my company, in general we try to avoid "magic strings" [1]
> and
> > > >> instead
> > > >> >>> always try to use constants. We can and do define our own
> > constants
> > > to
> > > >> >> try
> > > >> >>> to mirror some of the "private" magic strings in the hbase
> client.
> > > >> This
> > > >> >> is
> > > >> >>> better than nothing but even better would be to use
> hbase-provided
> > > >> >>> constants so that we can build more defensive applications,
> using
> > > the
> > > >> >>> compiler to verify that the configs we reference still do
> > anything.
> > > >> >>>
> > > >> >>> I unfortunately can't speak to the original issues with
> HConstants
> > > >> that
> > > >> >>> turned it into an anti-pattern. What I do notice is there are
> > > >> definitely
> > > >> >>> examples in the hbase codebase of duplicated config strings, one
> > of
> > > >> which
> > > >> >>> is called out in one of the jiras I linked in my original email.
> > > These
> > > >> >> are
> > > >> >>> just bugs waiting to happen in my opinion, either for hbase
> itself
> > > or
> > > >> for
> > > >> >>> users which may reference them.
> > > >> >>>
> > > >> >>> [1] https://deviq.com/antipatterns/magic-strings
> > <https://deviq.com/antipatterns/magic-strings>
> > > >> <https://deviq.com/antipatterns/magic-strings
> > <https://deviq.com/antipatterns/magic-strings>
> > >
> > > >> >>>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <
> > ndimiduk@apache.org
> > > >
> > > >> >> wrote:
> > > >> >>>>
> > > >> >>>> Hi Bryan,
> > > >> >>>>
> > > >> >>>> Thanks for bringing this up.
> > > >> >>>>
> > > >> >>>> I agree with Duo (and I think we have this settled as
> > project-wide
> > > >> >>>> consensus) that HConstants is/was an anti-pattern, that we are
> > > >> actively
> > > >> >>>> against adding new fields there, and opportunistically removing
> > > >> fields
> > > >> >> when
> > > >> >>>> we can. Further, the documented meaning of the
> > > >> >>>> HBaseInterfaceAudience.CONFIG field is "Denotes class names
> that
> > > >> appear
> > > >> >> in
> > > >> >>>> user facing configuration files", so this isn't really
> > appropriate
> > > >> for
> > > >> >>>> marking a field that exposes a configuration key to user
> > > >> applications. I
> > > >> >>>> will also note that there appears to be two categories of
> tunable
> > > >> >>>> parameters -- configuration points that we expect users to
> tweak
> > > are
> > > >> >>>> catalogued and documented in the book [0] and everything else
> is
> > > >> left to
> > > >> >>>> the obscurity of code-grep.
> > > >> >>>>
> > > >> >>>> While we are actively squashing use of fields in HConstants, I
> > > don't
> > > >> >> know
> > > >> >>>> that we have proposed some alternative to the user community.
> For
> > > my
> > > >> >> part,
> > > >> >>>> when I write and review code that involves configuration keys,
> I
> > > >> >> generally
> > > >> >>>> implement the key constant string as a private field in an
> > > >> appropriate
> > > >> >>>> class, and the unit test coverage for that configuration key
> > > >> replicates
> > > >> >> the
> > > >> >>>> string in the test. My reasoning being that the string is a
> part
> > of
> > > >> our
> > > >> >>>> public API and making a change to the public API should be
> > detected
> > > >> from
> > > >> >>>> the unit test. I also have (on occasion) gone out of my way to
> > > write
> > > >> >> about
> > > >> >>>> the configuration keys in the package or class-level javadoc.
> > > >> >>>>
> > > >> >>>> I think that none of my comments address your intended topic:
> how
> > > do
> > > >> we
> > > >> >>>> publish our configuration points as an API that can be consumed
> > by
> > > >> user
> > > >> >>>> applications? (Do I have that correct?)
> > > >> >>>>
> > > >> >>>> I am of the mind that we don't need/want an API of
> > configurations ;
> > > >> we
> > > >> >> want
> > > >> >>>> a catalogue, i.e., what has been started in our book. Perhaps
> > > >> >> accompanied
> > > >> >>>> by/generated from an authoritative hbase-defaults.xml file. In
> > > fact,
> > > >> we
> > > >> >>>> already do generate from hbase-default.xml, the result is [1]
> ; I
> > > >> don't
> > > >> >>>> believe it is authoritative.
> > > >> >>>>
> > > >> >>>> If we did have an AP thoughI, what would be better than the
> > > >> HConstants
> > > >> >>>> approach of key-strings as public fields ? What if we had a
> > > >> >>>> ConfigurationBuilder type of class, which had methods tied to
> > > >> >> configuration
> > > >> >>>> keys? I would think that such a globally applicable class would
> > > have
> > > >> the
> > > >> >>>> same maintenance issues as HConstants. But what if we had some
> > kind
> > > >> of
> > > >> >>>> ConfigurationSetter class, perhaps per package, that performed
> > this
> > > >> >>>> function? That might be maintainable for us and useful for
> users.
> > > >> >>>>
> > > >> >>>> I'm keen to hear what other ideas are out there, or better,
> > > examples
> > > >> and
> > > >> >>>> counter-examples from other projects.
> > > >> >>>>
> > > >> >>>> Thanks,
> > > >> >>>> Nick
> > > >> >>>>
> > > >> >>>> [0]:
> https://hbase.apache.org/book.html#important_configurations
> > <https://hbase.apache.org/book.html#important_configurations>
> > > >> <https://hbase.apache.org/book.html#important_configurations
> > <https://hbase.apache.org/book.html#important_configurations>
> > >
> > > >> >>>> <https://hbase.apache.org/book.html#important_configurations
> > <https://hbase.apache.org/book.html#important_configurations>
> > > >> <https://hbase.apache.org/book.html#important_configurations
> > <https://hbase.apache.org/book.html#important_configurations>
> > >
> > > >> >
> > > >> >>>> [1]:
> > > https://hbase.apache.org/book.html#hbase_default_configurations
> > <https://hbase.apache.org/book.html#hbase_default_configurations>
> > > >> <https://hbase.apache.org/book.html#hbase_default_configurations
> > <https://hbase.apache.org/book.html#hbase_default_configurations>
> > >
> > > >> >>>> <
> https://hbase.apache.org/book.html#hbase_default_configurations
> > <https://hbase.apache.org/book.html#hbase_default_configurations>
> > > >> <https://hbase.apache.org/book.html#hbase_default_configurations
> > <https://hbase.apache.org/book.html#hbase_default_configurations>
> > >
> > > >> >
> > > >> >>>>
> > > >> >>>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
> > > >> >>>> <bb...@hubspot.com.invalid> wrote:
> > > >> >>>>
> > > >> >>>>> Hi devs,
> > > >> >>>>>
> > > >> >>>>> As a major user of hbase, my company has thousands of clients
> > > >> deployed
> > > >> >>>>> which use the hbase client to connect to a variety of hbase
> > > >> clusters.
> > > >> >> We
> > > >> >>>>> have a common library which handles configuring all clients by
> > > >> setting
> > > >> >> up
> > > >> >>>>> the Configuration object prior to creating a Connection. Our
> > > library
> > > >> >> sets
> > > >> >>>>> configurations using the various configs in HConstants, but
> > there
> > > >> are
> > > >> >>>> also
> > > >> >>>>> a bunch of configs which don't exist in HConstants. For these
> we
> > > >> have
> > > >> >>>>> hardcoded config strings in our client.
> > > >> >>>>>
> > > >> >>>>> We're now working on an hbase upgrade and need to go through
> our
> > > >> client
> > > >> >>>>> library and check how the configs may have changed in the new
> > > >> version.
> > > >> >>>> This
> > > >> >>>>> is relatively easy to do for those HConstants cases -- configs
> > may
> > > >> be
> > > >> >>>>> marked @Deprecated which will show up in one's editor, they
> may
> > be
> > > >> >>>> removed
> > > >> >>>>> entirely which would show up is a compile error, and otherwise
> > one
> > > >> can
> > > >> >>>>> easily click through or bring up the javadoc. For the others
> > that
> > > >> don't
> > > >> >>>>> exist in HConstants, we need to go manually search the hbase
> > > >> codebase
> > > >> >> for
> > > >> >>>>> those strings.
> > > >> >>>>>
> > > >> >>>>> Without doing this painstaking manual process, we would
> > > potentially
> > > >> >>>> deploy
> > > >> >>>>> the upgraded client with configs which are no longer used or
> > > >> deprecated
> > > >> >>>> by
> > > >> >>>>> the hbase client. For those using HConstants, this is
> > immediately
> > > >> >> obvious
> > > >> >>>>> because the HConstant field may have been removed. This is a
> > clear
> > > >> >>>>> indication of needing to investigate the config. In this case
> > it's
> > > >> >>>>> preferred to face the compile failure because it's clearer
> than
> > > >> having
> > > >> >>>>> something silently disappear or change.
> > > >> >>>>>
> > > >> >>>>> I opened 3 jiras to move some fields to HConstants, but got
> some
> > > >> >>>> reasonable
> > > >> >>>>> pushback from Duo:
> > > >> >>>>>
> > > >> >>>>> https://issues.apache.org/jira/browse/HBASE-26845
> > <https://issues.apache.org/jira/browse/HBASE-26845>
> > > >> <https://issues.apache.org/jira/browse/HBASE-26845
> > <https://issues.apache.org/jira/browse/HBASE-26845>
> > >
> > > >> >>>> <https://issues.apache.org/jira/browse/HBASE-26845
> > <https://issues.apache.org/jira/browse/HBASE-26845>
> > > >> <https://issues.apache.org/jira/browse/HBASE-26845
> > <https://issues.apache.org/jira/browse/HBASE-26845>
> > >
> > > >> >
> > > >> >>>>> https://issues.apache.org/jira/browse/HBASE-26846
> > <https://issues.apache.org/jira/browse/HBASE-26846>
> > > >> <https://issues.apache.org/jira/browse/HBASE-26846
> > <https://issues.apache.org/jira/browse/HBASE-26846>
> > >
> > > >> >>>> <https://issues.apache.org/jira/browse/HBASE-26846
> > <https://issues.apache.org/jira/browse/HBASE-26846>
> > > >> <https://issues.apache.org/jira/browse/HBASE-26846
> > <https://issues.apache.org/jira/browse/HBASE-26846>
> > >
> > > >> >
> > > >> >>>>> https://issues.apache.org/jira/browse/HBASE-26847
> > <https://issues.apache.org/jira/browse/HBASE-26847>
> > > >> <https://issues.apache.org/jira/browse/HBASE-26847
> > <https://issues.apache.org/jira/browse/HBASE-26847>
> > >
> > > >> >>>> <https://issues.apache.org/jira/browse/HBASE-26847
> > <https://issues.apache.org/jira/browse/HBASE-26847>
> > > >> <https://issues.apache.org/jira/browse/HBASE-26847
> > <https://issues.apache.org/jira/browse/HBASE-26847>
> > >
> > > >> >
> > > >> >>>>>
> > > >> >>>>> Duo's pushback is that HConstants is an anti-pattern and these
> > > >> configs
> > > >> >>>> are
> > > >> >>>>> not part of our public API. I can agree that a catch-all
> > constants
> > > >> >> class
> > > >> >>>>> might be an anti-pattern, but would argue that consolidating
> > > configs
> > > >> >>>> there
> > > >> >>>>> is very useful for end-users. I can also potentially agree
> that
> > > >> >> exposing
> > > >> >>>>> these as part of our public API might limit the flexibility of
> > > >> >>>> development
> > > >> >>>>> due to compatibility constraints about IA.Public.
> > > >> >>>>>
> > > >> >>>>> To me it seems odd to add a configuration, whose whole point
> is
> > to
> > > >> make
> > > >> >>>>> something tuneable, but then bury it in a private class
> despite
> > > >> having
> > > >> >>>> real
> > > >> >>>>> implications for how the application runs. If a configuration
> is
> > > not
> > > >> >>>> meant
> > > >> >>>>> to be tuned, it shouldn't be a configuration at all. Otherwise
> > it
> > > >> >> should
> > > >> >>>> be
> > > >> >>>>> exposed for reference.
> > > >> >>>>>
> > > >> >>>>> I'm wondering if there is some compromise we can achieve which
> > > >> makes it
> > > >> >>>>> easier for end-users to integrate with tunable configs.
> > > >> >>>>>
> > > >> >>>>> One can imagine a large project to clean up all of our configs
> > > under
> > > >> >> some
> > > >> >>>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear
> > making
> > > >> >> perfect
> > > >> >>>>> (needing to migrate all configs) the enemy of good.
> > > >> >>>>>
> > > >> >>>>> A better option might be to make those classes which expose
> > > configs
> > > >> >>>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
> > > >> >>>>> ConnectionImplementation. That might be the most incremental
> > > change
> > > >> we
> > > >> >>>>> could make. We could handle this on a case-by-case basis.
> > > >> >>>>>
> > > >> >>>>> Does anyone have any thoughts?
> > > >> >>>>>
> > > >> >>>>
> > > >> >>
> > > >>
> > > >
> > >
> >
>

Re: [DISCUSS] How to expose configs for users to tune

Posted by Bryan Beaudreault <bb...@hubspot.com.INVALID>.
Yes but I don't see a major issue with promoting to LP(Config)? I thought
the agreement here was that LP(Config) was basically "exposed for config
constants", but maybe I misunderstood.

Also ConnectionConfiguration is a relatively basic and lightweight wrapper
around configs. I think it's a perfect example of what this LP(Config) API
could look like.

On Thu, Mar 24, 2022 at 3:47 PM Nick Dimiduk <nd...@apache.org> wrote:

> One sticking point: ConnectionConfiguration and
> AsyncConnectionConfiguration are both IA.Private.
>
> On Thu, Mar 24, 2022 at 17:21 Bryan Beaudreault
> <bb...@hubspot.com.invalid> wrote:
>
> > I realized there already exists a good candidate for this in hbase-client
> > -- ConnectionConfiguration. My latest commit in
> > https://github.com/apache/hbase/pull/4180
> <https://github.com/apache/hbase/pull/4180>
> adds a new config constant there
> > and marks it as LP(Config), but I'd also be happy to revert that part of
> > the PR and instead handle that in a dedicated jira for this topic if
> > desired.
> >
> > On Thu, Mar 24, 2022 at 11:08 AM Bryan Beaudreault <
> > bbeaudreault@hubspot.com>
> > wrote:
> >
> > > Thank you both for the input here!
> > >
> > > It seems like we've come to the conclusion that it would be ok to:
> > >
> > > - Update docs around LP(CONFIG) to say that it also encompasses some
> > > module-aggregated classes which hold config constants
> > > - Maybe document this convention elsewhere for contributors as well,
> > > though need to look into where (guidance welcome, but will look around)
> > > - I can create the first constant class in hbase-client for the
> use-cases
> > > in my jira. I also have another config constant I've added in
> > > https://github.com/apache/hbase/pull/4180
> <https://github.com/apache/hbase/pull/4180>
> prior to this discussion, so
> > > may want to use that class there.
> > > - Look into updates to tooling to audit the LP(CONFIG) classes
> > >
> > > If this sounds good, I'll create a JIRA to track the work. The first 3
> > > todos will be relatively easy, and I'll look into options for the audit
> > tool
> > >
> > >
> > > On Mon, Mar 21, 2022 at 12:14 PM Andrew Purtell <
> > andrew.purtell@gmail.com>
> > > wrote:
> > >
> > >> Agreed the definition of LP(CONFIG) would need to be tweaked.
> > >>
> > >> We do not quite have enough support for analyzing configuration key
> set
> > >> changes in the current API audit tool. Removal of a public constant
> > field
> > >> from an LP annotated class would be flagged but a modification of the
> > >> constant would not. However it is a OSS project consisting of Perl
> > scripts
> > >> that might accept a contribution or at least could be patched or
> > extended.
> > >> Or we can build a new audit tool for the purpose, which is what I
> would
> > >> recommend. The Perl based tool shells out to javah and is quite
> > expensive,
> > >> and on some platforms, depending on comparison, can fail due to
> command
> > >> line length limits. A Java tool would likely be more efficient at
> > >> processing Java class annotations and introspecting string constants.
> > >>
> > >> >
> > >> > On Mar 21, 2022, at 8:52 AM, Nick Dimiduk <nd...@apache.org>
> > wrote:
> > >> >
> > >> > On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell <
> > >> andrew.purtell@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> Although collecting all configuration keys into a single file is
> > >> >> definitely an anti-pattern I’m not sure the same is true of package
> > or
> > >> >> Maven module level aggregation classes marked LP(CONFIG).Somewhat
> > like
> > >> >> DFSConfigKeys but geared toward our API/release auditing.
> > >> >>
> > >> >> This would seem virtuous for a couple of reasons. Relevant
> > >> configuration
> > >> >> key constants for the package or module would be grouped in a well
> > >> known
> > >> >> place for users and developers alike. The LP(CONFIG) designation
> > would
> > >> >> require developers to think about deprecation cycle if
> contemplating
> > a
> > >> >> change, thus providing some back pressure against snap decisions.
> Or,
> > >> if
> > >> >> not then, then at release candidate evaluation time, user
> > configuration
> > >> >> breaking changes could be caught be a release automation tool that
> > >> diffs
> > >> >> LP(CONFIG) annotated classes. Something like this would improve the
> > >> state
> > >> >> of configuration key management quite dramatically, because
> currently
> > >> it’s
> > >> >> ad hoc.
> > >> >>
> > >> >
> > >> > I am supportive of trying such an effort. We'll need to tweak the
> > >> meaning
> > >> > of LP(CONFIG) as we define it currently, but that can be done. I
> don't
> > >> know
> > >> > what our current tools do or assume regarding this annotation. I
> think
> > >> > there is something custom happening in the compatibility reports
> that
> > we
> > >> > generate as part of each RC.
> > >> >
> > >> >>> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault
> > >> <bb...@hubspot.com.invalid>
> > >> >>> wrote:
> > >> >>>
> > >> >>> Thanks for your detailed response, Nick!
> > >> >>>
> > >> >>>> I think that none of my comments address your intended topic: how
> > do
> > >> we
> > >> >>> publish our configuration points as an API that can be consumed by
> > >> user
> > >> >>> applications? (Do I have that correct?)
> > >> >>>
> > >> >>> This is a good summary, and I appreciate the other
> > >> >> thoughts/clarifications
> > >> >>> as well. I also realize this is probably hard to get perfect and
> any
> > >> >> choice
> > >> >>> must be weighed against the effort necessary to change/maintain.
> > >> >>>
> > >> >>> One example I know is Hadoop/HDFS, and I bet some on this list
> have
> > >> much
> > >> >>> more knowledge of that project's history than I do. For HDFS they
> > have
> > >> >>> DFSConfigKeys which in my experience does seem to include most
> > >> configs. I
> > >> >>> believe they even have unit tests which verify that all configs in
> > the
> > >> >>> various site.xml files are represented in code. In more recent
> > >> versions
> > >> >>> they have split that class up into smaller groupings, for example
> > >> >>> DfsClientConf and the various inner classes there.
> > >> >>>
> > >> >>> In a vacuum, from a code design perspective, I'm not commenting on
> > >> >> whether
> > >> >>> that's a good or bad pattern. I also don't know of the politics of
> > the
> > >> >>> project or what sorts of pain points they've discovered in that
> > >> pattern
> > >> >>> over the years. But from *user's perspective*, this is a handy way
> > to
> > >> >>> handle things in my opinion.
> > >> >>>
> > >> >>> At my company, in general we try to avoid "magic strings" [1] and
> > >> instead
> > >> >>> always try to use constants. We can and do define our own
> constants
> > to
> > >> >> try
> > >> >>> to mirror some of the "private" magic strings in the hbase client.
> > >> This
> > >> >> is
> > >> >>> better than nothing but even better would be to use hbase-provided
> > >> >>> constants so that we can build more defensive applications, using
> > the
> > >> >>> compiler to verify that the configs we reference still do
> anything.
> > >> >>>
> > >> >>> I unfortunately can't speak to the original issues with HConstants
> > >> that
> > >> >>> turned it into an anti-pattern. What I do notice is there are
> > >> definitely
> > >> >>> examples in the hbase codebase of duplicated config strings, one
> of
> > >> which
> > >> >>> is called out in one of the jiras I linked in my original email.
> > These
> > >> >> are
> > >> >>> just bugs waiting to happen in my opinion, either for hbase itself
> > or
> > >> for
> > >> >>> users which may reference them.
> > >> >>>
> > >> >>> [1] https://deviq.com/antipatterns/magic-strings
> <https://deviq.com/antipatterns/magic-strings>
> > >> <https://deviq.com/antipatterns/magic-strings
> <https://deviq.com/antipatterns/magic-strings>
> >
> > >> >>>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <
> ndimiduk@apache.org
> > >
> > >> >> wrote:
> > >> >>>>
> > >> >>>> Hi Bryan,
> > >> >>>>
> > >> >>>> Thanks for bringing this up.
> > >> >>>>
> > >> >>>> I agree with Duo (and I think we have this settled as
> project-wide
> > >> >>>> consensus) that HConstants is/was an anti-pattern, that we are
> > >> actively
> > >> >>>> against adding new fields there, and opportunistically removing
> > >> fields
> > >> >> when
> > >> >>>> we can. Further, the documented meaning of the
> > >> >>>> HBaseInterfaceAudience.CONFIG field is "Denotes class names that
> > >> appear
> > >> >> in
> > >> >>>> user facing configuration files", so this isn't really
> appropriate
> > >> for
> > >> >>>> marking a field that exposes a configuration key to user
> > >> applications. I
> > >> >>>> will also note that there appears to be two categories of tunable
> > >> >>>> parameters -- configuration points that we expect users to tweak
> > are
> > >> >>>> catalogued and documented in the book [0] and everything else is
> > >> left to
> > >> >>>> the obscurity of code-grep.
> > >> >>>>
> > >> >>>> While we are actively squashing use of fields in HConstants, I
> > don't
> > >> >> know
> > >> >>>> that we have proposed some alternative to the user community. For
> > my
> > >> >> part,
> > >> >>>> when I write and review code that involves configuration keys, I
> > >> >> generally
> > >> >>>> implement the key constant string as a private field in an
> > >> appropriate
> > >> >>>> class, and the unit test coverage for that configuration key
> > >> replicates
> > >> >> the
> > >> >>>> string in the test. My reasoning being that the string is a part
> of
> > >> our
> > >> >>>> public API and making a change to the public API should be
> detected
> > >> from
> > >> >>>> the unit test. I also have (on occasion) gone out of my way to
> > write
> > >> >> about
> > >> >>>> the configuration keys in the package or class-level javadoc.
> > >> >>>>
> > >> >>>> I think that none of my comments address your intended topic: how
> > do
> > >> we
> > >> >>>> publish our configuration points as an API that can be consumed
> by
> > >> user
> > >> >>>> applications? (Do I have that correct?)
> > >> >>>>
> > >> >>>> I am of the mind that we don't need/want an API of
> configurations ;
> > >> we
> > >> >> want
> > >> >>>> a catalogue, i.e., what has been started in our book. Perhaps
> > >> >> accompanied
> > >> >>>> by/generated from an authoritative hbase-defaults.xml file. In
> > fact,
> > >> we
> > >> >>>> already do generate from hbase-default.xml, the result is [1] ; I
> > >> don't
> > >> >>>> believe it is authoritative.
> > >> >>>>
> > >> >>>> If we did have an AP thoughI, what would be better than the
> > >> HConstants
> > >> >>>> approach of key-strings as public fields ? What if we had a
> > >> >>>> ConfigurationBuilder type of class, which had methods tied to
> > >> >> configuration
> > >> >>>> keys? I would think that such a globally applicable class would
> > have
> > >> the
> > >> >>>> same maintenance issues as HConstants. But what if we had some
> kind
> > >> of
> > >> >>>> ConfigurationSetter class, perhaps per package, that performed
> this
> > >> >>>> function? That might be maintainable for us and useful for users.
> > >> >>>>
> > >> >>>> I'm keen to hear what other ideas are out there, or better,
> > examples
> > >> and
> > >> >>>> counter-examples from other projects.
> > >> >>>>
> > >> >>>> Thanks,
> > >> >>>> Nick
> > >> >>>>
> > >> >>>> [0]: https://hbase.apache.org/book.html#important_configurations
> <https://hbase.apache.org/book.html#important_configurations>
> > >> <https://hbase.apache.org/book.html#important_configurations
> <https://hbase.apache.org/book.html#important_configurations>
> >
> > >> >>>> <https://hbase.apache.org/book.html#important_configurations
> <https://hbase.apache.org/book.html#important_configurations>
> > >> <https://hbase.apache.org/book.html#important_configurations
> <https://hbase.apache.org/book.html#important_configurations>
> >
> > >> >
> > >> >>>> [1]:
> > https://hbase.apache.org/book.html#hbase_default_configurations
> <https://hbase.apache.org/book.html#hbase_default_configurations>
> > >> <https://hbase.apache.org/book.html#hbase_default_configurations
> <https://hbase.apache.org/book.html#hbase_default_configurations>
> >
> > >> >>>> <https://hbase.apache.org/book.html#hbase_default_configurations
> <https://hbase.apache.org/book.html#hbase_default_configurations>
> > >> <https://hbase.apache.org/book.html#hbase_default_configurations
> <https://hbase.apache.org/book.html#hbase_default_configurations>
> >
> > >> >
> > >> >>>>
> > >> >>>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
> > >> >>>> <bb...@hubspot.com.invalid> wrote:
> > >> >>>>
> > >> >>>>> Hi devs,
> > >> >>>>>
> > >> >>>>> As a major user of hbase, my company has thousands of clients
> > >> deployed
> > >> >>>>> which use the hbase client to connect to a variety of hbase
> > >> clusters.
> > >> >> We
> > >> >>>>> have a common library which handles configuring all clients by
> > >> setting
> > >> >> up
> > >> >>>>> the Configuration object prior to creating a Connection. Our
> > library
> > >> >> sets
> > >> >>>>> configurations using the various configs in HConstants, but
> there
> > >> are
> > >> >>>> also
> > >> >>>>> a bunch of configs which don't exist in HConstants. For these we
> > >> have
> > >> >>>>> hardcoded config strings in our client.
> > >> >>>>>
> > >> >>>>> We're now working on an hbase upgrade and need to go through our
> > >> client
> > >> >>>>> library and check how the configs may have changed in the new
> > >> version.
> > >> >>>> This
> > >> >>>>> is relatively easy to do for those HConstants cases -- configs
> may
> > >> be
> > >> >>>>> marked @Deprecated which will show up in one's editor, they may
> be
> > >> >>>> removed
> > >> >>>>> entirely which would show up is a compile error, and otherwise
> one
> > >> can
> > >> >>>>> easily click through or bring up the javadoc. For the others
> that
> > >> don't
> > >> >>>>> exist in HConstants, we need to go manually search the hbase
> > >> codebase
> > >> >> for
> > >> >>>>> those strings.
> > >> >>>>>
> > >> >>>>> Without doing this painstaking manual process, we would
> > potentially
> > >> >>>> deploy
> > >> >>>>> the upgraded client with configs which are no longer used or
> > >> deprecated
> > >> >>>> by
> > >> >>>>> the hbase client. For those using HConstants, this is
> immediately
> > >> >> obvious
> > >> >>>>> because the HConstant field may have been removed. This is a
> clear
> > >> >>>>> indication of needing to investigate the config. In this case
> it's
> > >> >>>>> preferred to face the compile failure because it's clearer than
> > >> having
> > >> >>>>> something silently disappear or change.
> > >> >>>>>
> > >> >>>>> I opened 3 jiras to move some fields to HConstants, but got some
> > >> >>>> reasonable
> > >> >>>>> pushback from Duo:
> > >> >>>>>
> > >> >>>>> https://issues.apache.org/jira/browse/HBASE-26845
> <https://issues.apache.org/jira/browse/HBASE-26845>
> > >> <https://issues.apache.org/jira/browse/HBASE-26845
> <https://issues.apache.org/jira/browse/HBASE-26845>
> >
> > >> >>>> <https://issues.apache.org/jira/browse/HBASE-26845
> <https://issues.apache.org/jira/browse/HBASE-26845>
> > >> <https://issues.apache.org/jira/browse/HBASE-26845
> <https://issues.apache.org/jira/browse/HBASE-26845>
> >
> > >> >
> > >> >>>>> https://issues.apache.org/jira/browse/HBASE-26846
> <https://issues.apache.org/jira/browse/HBASE-26846>
> > >> <https://issues.apache.org/jira/browse/HBASE-26846
> <https://issues.apache.org/jira/browse/HBASE-26846>
> >
> > >> >>>> <https://issues.apache.org/jira/browse/HBASE-26846
> <https://issues.apache.org/jira/browse/HBASE-26846>
> > >> <https://issues.apache.org/jira/browse/HBASE-26846
> <https://issues.apache.org/jira/browse/HBASE-26846>
> >
> > >> >
> > >> >>>>> https://issues.apache.org/jira/browse/HBASE-26847
> <https://issues.apache.org/jira/browse/HBASE-26847>
> > >> <https://issues.apache.org/jira/browse/HBASE-26847
> <https://issues.apache.org/jira/browse/HBASE-26847>
> >
> > >> >>>> <https://issues.apache.org/jira/browse/HBASE-26847
> <https://issues.apache.org/jira/browse/HBASE-26847>
> > >> <https://issues.apache.org/jira/browse/HBASE-26847
> <https://issues.apache.org/jira/browse/HBASE-26847>
> >
> > >> >
> > >> >>>>>
> > >> >>>>> Duo's pushback is that HConstants is an anti-pattern and these
> > >> configs
> > >> >>>> are
> > >> >>>>> not part of our public API. I can agree that a catch-all
> constants
> > >> >> class
> > >> >>>>> might be an anti-pattern, but would argue that consolidating
> > configs
> > >> >>>> there
> > >> >>>>> is very useful for end-users. I can also potentially agree that
> > >> >> exposing
> > >> >>>>> these as part of our public API might limit the flexibility of
> > >> >>>> development
> > >> >>>>> due to compatibility constraints about IA.Public.
> > >> >>>>>
> > >> >>>>> To me it seems odd to add a configuration, whose whole point is
> to
> > >> make
> > >> >>>>> something tuneable, but then bury it in a private class despite
> > >> having
> > >> >>>> real
> > >> >>>>> implications for how the application runs. If a configuration is
> > not
> > >> >>>> meant
> > >> >>>>> to be tuned, it shouldn't be a configuration at all. Otherwise
> it
> > >> >> should
> > >> >>>> be
> > >> >>>>> exposed for reference.
> > >> >>>>>
> > >> >>>>> I'm wondering if there is some compromise we can achieve which
> > >> makes it
> > >> >>>>> easier for end-users to integrate with tunable configs.
> > >> >>>>>
> > >> >>>>> One can imagine a large project to clean up all of our configs
> > under
> > >> >> some
> > >> >>>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear
> making
> > >> >> perfect
> > >> >>>>> (needing to migrate all configs) the enemy of good.
> > >> >>>>>
> > >> >>>>> A better option might be to make those classes which expose
> > configs
> > >> >>>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
> > >> >>>>> ConnectionImplementation. That might be the most incremental
> > change
> > >> we
> > >> >>>>> could make. We could handle this on a case-by-case basis.
> > >> >>>>>
> > >> >>>>> Does anyone have any thoughts?
> > >> >>>>>
> > >> >>>>
> > >> >>
> > >>
> > >
> >
>

Re: [DISCUSS] How to expose configs for users to tune

Posted by Nick Dimiduk <nd...@apache.org>.
One sticking point: ConnectionConfiguration and
AsyncConnectionConfiguration are both IA.Private.

On Thu, Mar 24, 2022 at 17:21 Bryan Beaudreault
<bb...@hubspot.com.invalid> wrote:

> I realized there already exists a good candidate for this in hbase-client
> -- ConnectionConfiguration. My latest commit in
> https://github.com/apache/hbase/pull/4180 adds a new config constant there
> and marks it as LP(Config), but I'd also be happy to revert that part of
> the PR and instead handle that in a dedicated jira for this topic if
> desired.
>
> On Thu, Mar 24, 2022 at 11:08 AM Bryan Beaudreault <
> bbeaudreault@hubspot.com>
> wrote:
>
> > Thank you both for the input here!
> >
> > It seems like we've come to the conclusion that it would be ok to:
> >
> > - Update docs around LP(CONFIG) to say that it also encompasses some
> > module-aggregated classes which hold config constants
> > - Maybe document this convention elsewhere for contributors as well,
> > though need to look into where (guidance welcome, but will look around)
> > - I can create the first constant class in hbase-client for the use-cases
> > in my jira. I also have another config constant I've added in
> > https://github.com/apache/hbase/pull/4180 prior to this discussion, so
> > may want to use that class there.
> > - Look into updates to tooling to audit the LP(CONFIG) classes
> >
> > If this sounds good, I'll create a JIRA to track the work. The first 3
> > todos will be relatively easy, and I'll look into options for the audit
> tool
> >
> >
> > On Mon, Mar 21, 2022 at 12:14 PM Andrew Purtell <
> andrew.purtell@gmail.com>
> > wrote:
> >
> >> Agreed the definition of LP(CONFIG) would need to be tweaked.
> >>
> >> We do not quite have enough support for analyzing configuration key set
> >> changes in the current API audit tool. Removal of a public constant
> field
> >> from an LP annotated class would be flagged but a modification of the
> >> constant would not. However it is a OSS project consisting of Perl
> scripts
> >> that might accept a contribution or at least could be patched or
> extended.
> >> Or we can build a new audit tool for the purpose, which is what I would
> >> recommend. The Perl based tool shells out to javah and is quite
> expensive,
> >> and on some platforms, depending on comparison, can fail due to command
> >> line length limits. A Java tool would likely be more efficient at
> >> processing Java class annotations and introspecting string constants.
> >>
> >> >
> >> > On Mar 21, 2022, at 8:52 AM, Nick Dimiduk <nd...@apache.org>
> wrote:
> >> >
> >> > On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell <
> >> andrew.purtell@gmail.com>
> >> > wrote:
> >> >
> >> >> Although collecting all configuration keys into a single file is
> >> >> definitely an anti-pattern I’m not sure the same is true of package
> or
> >> >> Maven module level aggregation classes marked LP(CONFIG).Somewhat
> like
> >> >> DFSConfigKeys but geared toward our API/release auditing.
> >> >>
> >> >> This would seem virtuous for a couple of reasons. Relevant
> >> configuration
> >> >> key constants for the package or module would be grouped in a well
> >> known
> >> >> place for users and developers alike. The LP(CONFIG) designation
> would
> >> >> require developers to think about deprecation cycle if contemplating
> a
> >> >> change, thus providing some back pressure against snap decisions. Or,
> >> if
> >> >> not then, then at release candidate evaluation time, user
> configuration
> >> >> breaking changes could be caught be a release automation tool that
> >> diffs
> >> >> LP(CONFIG) annotated classes. Something like this would improve the
> >> state
> >> >> of configuration key management quite dramatically, because currently
> >> it’s
> >> >> ad hoc.
> >> >>
> >> >
> >> > I am supportive of trying such an effort. We'll need to tweak the
> >> meaning
> >> > of LP(CONFIG) as we define it currently, but that can be done. I don't
> >> know
> >> > what our current tools do or assume regarding this annotation. I think
> >> > there is something custom happening in the compatibility reports that
> we
> >> > generate as part of each RC.
> >> >
> >> >>> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault
> >> <bb...@hubspot.com.invalid>
> >> >>> wrote:
> >> >>>
> >> >>> Thanks for your detailed response, Nick!
> >> >>>
> >> >>>> I think that none of my comments address your intended topic: how
> do
> >> we
> >> >>> publish our configuration points as an API that can be consumed by
> >> user
> >> >>> applications? (Do I have that correct?)
> >> >>>
> >> >>> This is a good summary, and I appreciate the other
> >> >> thoughts/clarifications
> >> >>> as well. I also realize this is probably hard to get perfect and any
> >> >> choice
> >> >>> must be weighed against the effort necessary to change/maintain.
> >> >>>
> >> >>> One example I know is Hadoop/HDFS, and I bet some on this list have
> >> much
> >> >>> more knowledge of that project's history than I do. For HDFS they
> have
> >> >>> DFSConfigKeys which in my experience does seem to include most
> >> configs. I
> >> >>> believe they even have unit tests which verify that all configs in
> the
> >> >>> various site.xml files are represented in code. In more recent
> >> versions
> >> >>> they have split that class up into smaller groupings, for example
> >> >>> DfsClientConf and the various inner classes there.
> >> >>>
> >> >>> In a vacuum, from a code design perspective, I'm not commenting on
> >> >> whether
> >> >>> that's a good or bad pattern. I also don't know of the politics of
> the
> >> >>> project or what sorts of pain points they've discovered in that
> >> pattern
> >> >>> over the years. But from *user's perspective*, this is a handy way
> to
> >> >>> handle things in my opinion.
> >> >>>
> >> >>> At my company, in general we try to avoid "magic strings" [1] and
> >> instead
> >> >>> always try to use constants. We can and do define our own constants
> to
> >> >> try
> >> >>> to mirror some of the "private" magic strings in the hbase client.
> >> This
> >> >> is
> >> >>> better than nothing but even better would be to use hbase-provided
> >> >>> constants so that we can build more defensive applications, using
> the
> >> >>> compiler to verify that the configs we reference still do anything.
> >> >>>
> >> >>> I unfortunately can't speak to the original issues with HConstants
> >> that
> >> >>> turned it into an anti-pattern. What I do notice is there are
> >> definitely
> >> >>> examples in the hbase codebase of duplicated config strings, one of
> >> which
> >> >>> is called out in one of the jiras I linked in my original email.
> These
> >> >> are
> >> >>> just bugs waiting to happen in my opinion, either for hbase itself
> or
> >> for
> >> >>> users which may reference them.
> >> >>>
> >> >>> [1] https://deviq.com/antipatterns/magic-strings
> >> <https://deviq.com/antipatterns/magic-strings>
> >> >>>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <ndimiduk@apache.org
> >
> >> >> wrote:
> >> >>>>
> >> >>>> Hi Bryan,
> >> >>>>
> >> >>>> Thanks for bringing this up.
> >> >>>>
> >> >>>> I agree with Duo (and I think we have this settled as project-wide
> >> >>>> consensus) that HConstants is/was an anti-pattern, that we are
> >> actively
> >> >>>> against adding new fields there, and opportunistically removing
> >> fields
> >> >> when
> >> >>>> we can. Further, the documented meaning of the
> >> >>>> HBaseInterfaceAudience.CONFIG field is "Denotes class names that
> >> appear
> >> >> in
> >> >>>> user facing configuration files", so this isn't really appropriate
> >> for
> >> >>>> marking a field that exposes a configuration key to user
> >> applications. I
> >> >>>> will also note that there appears to be two categories of tunable
> >> >>>> parameters -- configuration points that we expect users to tweak
> are
> >> >>>> catalogued and documented in the book [0] and everything else is
> >> left to
> >> >>>> the obscurity of code-grep.
> >> >>>>
> >> >>>> While we are actively squashing use of fields in HConstants, I
> don't
> >> >> know
> >> >>>> that we have proposed some alternative to the user community. For
> my
> >> >> part,
> >> >>>> when I write and review code that involves configuration keys, I
> >> >> generally
> >> >>>> implement the key constant string as a private field in an
> >> appropriate
> >> >>>> class, and the unit test coverage for that configuration key
> >> replicates
> >> >> the
> >> >>>> string in the test. My reasoning being that the string is a part of
> >> our
> >> >>>> public API and making a change to the public API should be detected
> >> from
> >> >>>> the unit test. I also have (on occasion) gone out of my way to
> write
> >> >> about
> >> >>>> the configuration keys in the package or class-level javadoc.
> >> >>>>
> >> >>>> I think that none of my comments address your intended topic: how
> do
> >> we
> >> >>>> publish our configuration points as an API that can be consumed by
> >> user
> >> >>>> applications? (Do I have that correct?)
> >> >>>>
> >> >>>> I am of the mind that we don't need/want an API of configurations ;
> >> we
> >> >> want
> >> >>>> a catalogue, i.e., what has been started in our book. Perhaps
> >> >> accompanied
> >> >>>> by/generated from an authoritative hbase-defaults.xml file. In
> fact,
> >> we
> >> >>>> already do generate from hbase-default.xml, the result is [1] ; I
> >> don't
> >> >>>> believe it is authoritative.
> >> >>>>
> >> >>>> If we did have an AP thoughI, what would be better than the
> >> HConstants
> >> >>>> approach of key-strings as public fields ? What if we had a
> >> >>>> ConfigurationBuilder type of class, which had methods tied to
> >> >> configuration
> >> >>>> keys? I would think that such a globally applicable class would
> have
> >> the
> >> >>>> same maintenance issues as HConstants. But what if we had some kind
> >> of
> >> >>>> ConfigurationSetter class, perhaps per package, that performed this
> >> >>>> function? That might be maintainable for us and useful for users.
> >> >>>>
> >> >>>> I'm keen to hear what other ideas are out there, or better,
> examples
> >> and
> >> >>>> counter-examples from other projects.
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Nick
> >> >>>>
> >> >>>> [0]: https://hbase.apache.org/book.html#important_configurations
> >> <https://hbase.apache.org/book.html#important_configurations>
> >> >>>> <https://hbase.apache.org/book.html#important_configurations
> >> <https://hbase.apache.org/book.html#important_configurations>
> >> >
> >> >>>> [1]:
> https://hbase.apache.org/book.html#hbase_default_configurations
> >> <https://hbase.apache.org/book.html#hbase_default_configurations>
> >> >>>> <https://hbase.apache.org/book.html#hbase_default_configurations
> >> <https://hbase.apache.org/book.html#hbase_default_configurations>
> >> >
> >> >>>>
> >> >>>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
> >> >>>> <bb...@hubspot.com.invalid> wrote:
> >> >>>>
> >> >>>>> Hi devs,
> >> >>>>>
> >> >>>>> As a major user of hbase, my company has thousands of clients
> >> deployed
> >> >>>>> which use the hbase client to connect to a variety of hbase
> >> clusters.
> >> >> We
> >> >>>>> have a common library which handles configuring all clients by
> >> setting
> >> >> up
> >> >>>>> the Configuration object prior to creating a Connection. Our
> library
> >> >> sets
> >> >>>>> configurations using the various configs in HConstants, but there
> >> are
> >> >>>> also
> >> >>>>> a bunch of configs which don't exist in HConstants. For these we
> >> have
> >> >>>>> hardcoded config strings in our client.
> >> >>>>>
> >> >>>>> We're now working on an hbase upgrade and need to go through our
> >> client
> >> >>>>> library and check how the configs may have changed in the new
> >> version.
> >> >>>> This
> >> >>>>> is relatively easy to do for those HConstants cases -- configs may
> >> be
> >> >>>>> marked @Deprecated which will show up in one's editor, they may be
> >> >>>> removed
> >> >>>>> entirely which would show up is a compile error, and otherwise one
> >> can
> >> >>>>> easily click through or bring up the javadoc. For the others that
> >> don't
> >> >>>>> exist in HConstants, we need to go manually search the hbase
> >> codebase
> >> >> for
> >> >>>>> those strings.
> >> >>>>>
> >> >>>>> Without doing this painstaking manual process, we would
> potentially
> >> >>>> deploy
> >> >>>>> the upgraded client with configs which are no longer used or
> >> deprecated
> >> >>>> by
> >> >>>>> the hbase client. For those using HConstants, this is immediately
> >> >> obvious
> >> >>>>> because the HConstant field may have been removed. This is a clear
> >> >>>>> indication of needing to investigate the config. In this case it's
> >> >>>>> preferred to face the compile failure because it's clearer than
> >> having
> >> >>>>> something silently disappear or change.
> >> >>>>>
> >> >>>>> I opened 3 jiras to move some fields to HConstants, but got some
> >> >>>> reasonable
> >> >>>>> pushback from Duo:
> >> >>>>>
> >> >>>>> https://issues.apache.org/jira/browse/HBASE-26845
> >> <https://issues.apache.org/jira/browse/HBASE-26845>
> >> >>>> <https://issues.apache.org/jira/browse/HBASE-26845
> >> <https://issues.apache.org/jira/browse/HBASE-26845>
> >> >
> >> >>>>> https://issues.apache.org/jira/browse/HBASE-26846
> >> <https://issues.apache.org/jira/browse/HBASE-26846>
> >> >>>> <https://issues.apache.org/jira/browse/HBASE-26846
> >> <https://issues.apache.org/jira/browse/HBASE-26846>
> >> >
> >> >>>>> https://issues.apache.org/jira/browse/HBASE-26847
> >> <https://issues.apache.org/jira/browse/HBASE-26847>
> >> >>>> <https://issues.apache.org/jira/browse/HBASE-26847
> >> <https://issues.apache.org/jira/browse/HBASE-26847>
> >> >
> >> >>>>>
> >> >>>>> Duo's pushback is that HConstants is an anti-pattern and these
> >> configs
> >> >>>> are
> >> >>>>> not part of our public API. I can agree that a catch-all constants
> >> >> class
> >> >>>>> might be an anti-pattern, but would argue that consolidating
> configs
> >> >>>> there
> >> >>>>> is very useful for end-users. I can also potentially agree that
> >> >> exposing
> >> >>>>> these as part of our public API might limit the flexibility of
> >> >>>> development
> >> >>>>> due to compatibility constraints about IA.Public.
> >> >>>>>
> >> >>>>> To me it seems odd to add a configuration, whose whole point is to
> >> make
> >> >>>>> something tuneable, but then bury it in a private class despite
> >> having
> >> >>>> real
> >> >>>>> implications for how the application runs. If a configuration is
> not
> >> >>>> meant
> >> >>>>> to be tuned, it shouldn't be a configuration at all. Otherwise it
> >> >> should
> >> >>>> be
> >> >>>>> exposed for reference.
> >> >>>>>
> >> >>>>> I'm wondering if there is some compromise we can achieve which
> >> makes it
> >> >>>>> easier for end-users to integrate with tunable configs.
> >> >>>>>
> >> >>>>> One can imagine a large project to clean up all of our configs
> under
> >> >> some
> >> >>>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making
> >> >> perfect
> >> >>>>> (needing to migrate all configs) the enemy of good.
> >> >>>>>
> >> >>>>> A better option might be to make those classes which expose
> configs
> >> >>>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
> >> >>>>> ConnectionImplementation. That might be the most incremental
> change
> >> we
> >> >>>>> could make. We could handle this on a case-by-case basis.
> >> >>>>>
> >> >>>>> Does anyone have any thoughts?
> >> >>>>>
> >> >>>>
> >> >>
> >>
> >
>

Re: [DISCUSS] How to expose configs for users to tune

Posted by Bryan Beaudreault <bb...@hubspot.com.INVALID>.
I realized there already exists a good candidate for this in hbase-client
-- ConnectionConfiguration. My latest commit in
https://github.com/apache/hbase/pull/4180 adds a new config constant there
and marks it as LP(Config), but I'd also be happy to revert that part of
the PR and instead handle that in a dedicated jira for this topic if
desired.

On Thu, Mar 24, 2022 at 11:08 AM Bryan Beaudreault <bb...@hubspot.com>
wrote:

> Thank you both for the input here!
>
> It seems like we've come to the conclusion that it would be ok to:
>
> - Update docs around LP(CONFIG) to say that it also encompasses some
> module-aggregated classes which hold config constants
> - Maybe document this convention elsewhere for contributors as well,
> though need to look into where (guidance welcome, but will look around)
> - I can create the first constant class in hbase-client for the use-cases
> in my jira. I also have another config constant I've added in
> https://github.com/apache/hbase/pull/4180 prior to this discussion, so
> may want to use that class there.
> - Look into updates to tooling to audit the LP(CONFIG) classes
>
> If this sounds good, I'll create a JIRA to track the work. The first 3
> todos will be relatively easy, and I'll look into options for the audit tool
>
>
> On Mon, Mar 21, 2022 at 12:14 PM Andrew Purtell <an...@gmail.com>
> wrote:
>
>> Agreed the definition of LP(CONFIG) would need to be tweaked.
>>
>> We do not quite have enough support for analyzing configuration key set
>> changes in the current API audit tool. Removal of a public constant field
>> from an LP annotated class would be flagged but a modification of the
>> constant would not. However it is a OSS project consisting of Perl scripts
>> that might accept a contribution or at least could be patched or extended.
>> Or we can build a new audit tool for the purpose, which is what I would
>> recommend. The Perl based tool shells out to javah and is quite expensive,
>> and on some platforms, depending on comparison, can fail due to command
>> line length limits. A Java tool would likely be more efficient at
>> processing Java class annotations and introspecting string constants.
>>
>> >
>> > On Mar 21, 2022, at 8:52 AM, Nick Dimiduk <nd...@apache.org> wrote:
>> >
>> > On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell <
>> andrew.purtell@gmail.com>
>> > wrote:
>> >
>> >> Although collecting all configuration keys into a single file is
>> >> definitely an anti-pattern I’m not sure the same is true of package or
>> >> Maven module level aggregation classes marked LP(CONFIG).Somewhat like
>> >> DFSConfigKeys but geared toward our API/release auditing.
>> >>
>> >> This would seem virtuous for a couple of reasons. Relevant
>> configuration
>> >> key constants for the package or module would be grouped in a well
>> known
>> >> place for users and developers alike. The LP(CONFIG) designation would
>> >> require developers to think about deprecation cycle if contemplating a
>> >> change, thus providing some back pressure against snap decisions. Or,
>> if
>> >> not then, then at release candidate evaluation time, user configuration
>> >> breaking changes could be caught be a release automation tool that
>> diffs
>> >> LP(CONFIG) annotated classes. Something like this would improve the
>> state
>> >> of configuration key management quite dramatically, because currently
>> it’s
>> >> ad hoc.
>> >>
>> >
>> > I am supportive of trying such an effort. We'll need to tweak the
>> meaning
>> > of LP(CONFIG) as we define it currently, but that can be done. I don't
>> know
>> > what our current tools do or assume regarding this annotation. I think
>> > there is something custom happening in the compatibility reports that we
>> > generate as part of each RC.
>> >
>> >>> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault
>> <bb...@hubspot.com.invalid>
>> >>> wrote:
>> >>>
>> >>> Thanks for your detailed response, Nick!
>> >>>
>> >>>> I think that none of my comments address your intended topic: how do
>> we
>> >>> publish our configuration points as an API that can be consumed by
>> user
>> >>> applications? (Do I have that correct?)
>> >>>
>> >>> This is a good summary, and I appreciate the other
>> >> thoughts/clarifications
>> >>> as well. I also realize this is probably hard to get perfect and any
>> >> choice
>> >>> must be weighed against the effort necessary to change/maintain.
>> >>>
>> >>> One example I know is Hadoop/HDFS, and I bet some on this list have
>> much
>> >>> more knowledge of that project's history than I do. For HDFS they have
>> >>> DFSConfigKeys which in my experience does seem to include most
>> configs. I
>> >>> believe they even have unit tests which verify that all configs in the
>> >>> various site.xml files are represented in code. In more recent
>> versions
>> >>> they have split that class up into smaller groupings, for example
>> >>> DfsClientConf and the various inner classes there.
>> >>>
>> >>> In a vacuum, from a code design perspective, I'm not commenting on
>> >> whether
>> >>> that's a good or bad pattern. I also don't know of the politics of the
>> >>> project or what sorts of pain points they've discovered in that
>> pattern
>> >>> over the years. But from *user's perspective*, this is a handy way to
>> >>> handle things in my opinion.
>> >>>
>> >>> At my company, in general we try to avoid "magic strings" [1] and
>> instead
>> >>> always try to use constants. We can and do define our own constants to
>> >> try
>> >>> to mirror some of the "private" magic strings in the hbase client.
>> This
>> >> is
>> >>> better than nothing but even better would be to use hbase-provided
>> >>> constants so that we can build more defensive applications, using the
>> >>> compiler to verify that the configs we reference still do anything.
>> >>>
>> >>> I unfortunately can't speak to the original issues with HConstants
>> that
>> >>> turned it into an anti-pattern. What I do notice is there are
>> definitely
>> >>> examples in the hbase codebase of duplicated config strings, one of
>> which
>> >>> is called out in one of the jiras I linked in my original email. These
>> >> are
>> >>> just bugs waiting to happen in my opinion, either for hbase itself or
>> for
>> >>> users which may reference them.
>> >>>
>> >>> [1] https://deviq.com/antipatterns/magic-strings
>> <https://deviq.com/antipatterns/magic-strings>
>> >>>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <nd...@apache.org>
>> >> wrote:
>> >>>>
>> >>>> Hi Bryan,
>> >>>>
>> >>>> Thanks for bringing this up.
>> >>>>
>> >>>> I agree with Duo (and I think we have this settled as project-wide
>> >>>> consensus) that HConstants is/was an anti-pattern, that we are
>> actively
>> >>>> against adding new fields there, and opportunistically removing
>> fields
>> >> when
>> >>>> we can. Further, the documented meaning of the
>> >>>> HBaseInterfaceAudience.CONFIG field is "Denotes class names that
>> appear
>> >> in
>> >>>> user facing configuration files", so this isn't really appropriate
>> for
>> >>>> marking a field that exposes a configuration key to user
>> applications. I
>> >>>> will also note that there appears to be two categories of tunable
>> >>>> parameters -- configuration points that we expect users to tweak are
>> >>>> catalogued and documented in the book [0] and everything else is
>> left to
>> >>>> the obscurity of code-grep.
>> >>>>
>> >>>> While we are actively squashing use of fields in HConstants, I don't
>> >> know
>> >>>> that we have proposed some alternative to the user community. For my
>> >> part,
>> >>>> when I write and review code that involves configuration keys, I
>> >> generally
>> >>>> implement the key constant string as a private field in an
>> appropriate
>> >>>> class, and the unit test coverage for that configuration key
>> replicates
>> >> the
>> >>>> string in the test. My reasoning being that the string is a part of
>> our
>> >>>> public API and making a change to the public API should be detected
>> from
>> >>>> the unit test. I also have (on occasion) gone out of my way to write
>> >> about
>> >>>> the configuration keys in the package or class-level javadoc.
>> >>>>
>> >>>> I think that none of my comments address your intended topic: how do
>> we
>> >>>> publish our configuration points as an API that can be consumed by
>> user
>> >>>> applications? (Do I have that correct?)
>> >>>>
>> >>>> I am of the mind that we don't need/want an API of configurations ;
>> we
>> >> want
>> >>>> a catalogue, i.e., what has been started in our book. Perhaps
>> >> accompanied
>> >>>> by/generated from an authoritative hbase-defaults.xml file. In fact,
>> we
>> >>>> already do generate from hbase-default.xml, the result is [1] ; I
>> don't
>> >>>> believe it is authoritative.
>> >>>>
>> >>>> If we did have an AP thoughI, what would be better than the
>> HConstants
>> >>>> approach of key-strings as public fields ? What if we had a
>> >>>> ConfigurationBuilder type of class, which had methods tied to
>> >> configuration
>> >>>> keys? I would think that such a globally applicable class would have
>> the
>> >>>> same maintenance issues as HConstants. But what if we had some kind
>> of
>> >>>> ConfigurationSetter class, perhaps per package, that performed this
>> >>>> function? That might be maintainable for us and useful for users.
>> >>>>
>> >>>> I'm keen to hear what other ideas are out there, or better, examples
>> and
>> >>>> counter-examples from other projects.
>> >>>>
>> >>>> Thanks,
>> >>>> Nick
>> >>>>
>> >>>> [0]: https://hbase.apache.org/book.html#important_configurations
>> <https://hbase.apache.org/book.html#important_configurations>
>> >>>> <https://hbase.apache.org/book.html#important_configurations
>> <https://hbase.apache.org/book.html#important_configurations>
>> >
>> >>>> [1]: https://hbase.apache.org/book.html#hbase_default_configurations
>> <https://hbase.apache.org/book.html#hbase_default_configurations>
>> >>>> <https://hbase.apache.org/book.html#hbase_default_configurations
>> <https://hbase.apache.org/book.html#hbase_default_configurations>
>> >
>> >>>>
>> >>>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
>> >>>> <bb...@hubspot.com.invalid> wrote:
>> >>>>
>> >>>>> Hi devs,
>> >>>>>
>> >>>>> As a major user of hbase, my company has thousands of clients
>> deployed
>> >>>>> which use the hbase client to connect to a variety of hbase
>> clusters.
>> >> We
>> >>>>> have a common library which handles configuring all clients by
>> setting
>> >> up
>> >>>>> the Configuration object prior to creating a Connection. Our library
>> >> sets
>> >>>>> configurations using the various configs in HConstants, but there
>> are
>> >>>> also
>> >>>>> a bunch of configs which don't exist in HConstants. For these we
>> have
>> >>>>> hardcoded config strings in our client.
>> >>>>>
>> >>>>> We're now working on an hbase upgrade and need to go through our
>> client
>> >>>>> library and check how the configs may have changed in the new
>> version.
>> >>>> This
>> >>>>> is relatively easy to do for those HConstants cases -- configs may
>> be
>> >>>>> marked @Deprecated which will show up in one's editor, they may be
>> >>>> removed
>> >>>>> entirely which would show up is a compile error, and otherwise one
>> can
>> >>>>> easily click through or bring up the javadoc. For the others that
>> don't
>> >>>>> exist in HConstants, we need to go manually search the hbase
>> codebase
>> >> for
>> >>>>> those strings.
>> >>>>>
>> >>>>> Without doing this painstaking manual process, we would potentially
>> >>>> deploy
>> >>>>> the upgraded client with configs which are no longer used or
>> deprecated
>> >>>> by
>> >>>>> the hbase client. For those using HConstants, this is immediately
>> >> obvious
>> >>>>> because the HConstant field may have been removed. This is a clear
>> >>>>> indication of needing to investigate the config. In this case it's
>> >>>>> preferred to face the compile failure because it's clearer than
>> having
>> >>>>> something silently disappear or change.
>> >>>>>
>> >>>>> I opened 3 jiras to move some fields to HConstants, but got some
>> >>>> reasonable
>> >>>>> pushback from Duo:
>> >>>>>
>> >>>>> https://issues.apache.org/jira/browse/HBASE-26845
>> <https://issues.apache.org/jira/browse/HBASE-26845>
>> >>>> <https://issues.apache.org/jira/browse/HBASE-26845
>> <https://issues.apache.org/jira/browse/HBASE-26845>
>> >
>> >>>>> https://issues.apache.org/jira/browse/HBASE-26846
>> <https://issues.apache.org/jira/browse/HBASE-26846>
>> >>>> <https://issues.apache.org/jira/browse/HBASE-26846
>> <https://issues.apache.org/jira/browse/HBASE-26846>
>> >
>> >>>>> https://issues.apache.org/jira/browse/HBASE-26847
>> <https://issues.apache.org/jira/browse/HBASE-26847>
>> >>>> <https://issues.apache.org/jira/browse/HBASE-26847
>> <https://issues.apache.org/jira/browse/HBASE-26847>
>> >
>> >>>>>
>> >>>>> Duo's pushback is that HConstants is an anti-pattern and these
>> configs
>> >>>> are
>> >>>>> not part of our public API. I can agree that a catch-all constants
>> >> class
>> >>>>> might be an anti-pattern, but would argue that consolidating configs
>> >>>> there
>> >>>>> is very useful for end-users. I can also potentially agree that
>> >> exposing
>> >>>>> these as part of our public API might limit the flexibility of
>> >>>> development
>> >>>>> due to compatibility constraints about IA.Public.
>> >>>>>
>> >>>>> To me it seems odd to add a configuration, whose whole point is to
>> make
>> >>>>> something tuneable, but then bury it in a private class despite
>> having
>> >>>> real
>> >>>>> implications for how the application runs. If a configuration is not
>> >>>> meant
>> >>>>> to be tuned, it shouldn't be a configuration at all. Otherwise it
>> >> should
>> >>>> be
>> >>>>> exposed for reference.
>> >>>>>
>> >>>>> I'm wondering if there is some compromise we can achieve which
>> makes it
>> >>>>> easier for end-users to integrate with tunable configs.
>> >>>>>
>> >>>>> One can imagine a large project to clean up all of our configs under
>> >> some
>> >>>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making
>> >> perfect
>> >>>>> (needing to migrate all configs) the enemy of good.
>> >>>>>
>> >>>>> A better option might be to make those classes which expose configs
>> >>>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
>> >>>>> ConnectionImplementation. That might be the most incremental change
>> we
>> >>>>> could make. We could handle this on a case-by-case basis.
>> >>>>>
>> >>>>> Does anyone have any thoughts?
>> >>>>>
>> >>>>
>> >>
>>
>

Re: [DISCUSS] How to expose configs for users to tune

Posted by Bryan Beaudreault <bb...@hubspot.com.INVALID>.
Thank you both for the input here!

It seems like we've come to the conclusion that it would be ok to:

- Update docs around LP(CONFIG) to say that it also encompasses some
module-aggregated classes which hold config constants
- Maybe document this convention elsewhere for contributors as well, though
need to look into where (guidance welcome, but will look around)
- I can create the first constant class in hbase-client for the use-cases
in my jira. I also have another config constant I've added in
https://github.com/apache/hbase/pull/4180 prior to this discussion, so may
want to use that class there.
- Look into updates to tooling to audit the LP(CONFIG) classes

If this sounds good, I'll create a JIRA to track the work. The first 3
todos will be relatively easy, and I'll look into options for the audit tool


On Mon, Mar 21, 2022 at 12:14 PM Andrew Purtell <an...@gmail.com>
wrote:

> Agreed the definition of LP(CONFIG) would need to be tweaked.
>
> We do not quite have enough support for analyzing configuration key set
> changes in the current API audit tool. Removal of a public constant field
> from an LP annotated class would be flagged but a modification of the
> constant would not. However it is a OSS project consisting of Perl scripts
> that might accept a contribution or at least could be patched or extended.
> Or we can build a new audit tool for the purpose, which is what I would
> recommend. The Perl based tool shells out to javah and is quite expensive,
> and on some platforms, depending on comparison, can fail due to command
> line length limits. A Java tool would likely be more efficient at
> processing Java class annotations and introspecting string constants.
>
> >
> > On Mar 21, 2022, at 8:52 AM, Nick Dimiduk <nd...@apache.org> wrote:
> >
> > On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell <
> andrew.purtell@gmail.com>
> > wrote:
> >
> >> Although collecting all configuration keys into a single file is
> >> definitely an anti-pattern I’m not sure the same is true of package or
> >> Maven module level aggregation classes marked LP(CONFIG).Somewhat like
> >> DFSConfigKeys but geared toward our API/release auditing.
> >>
> >> This would seem virtuous for a couple of reasons. Relevant configuration
> >> key constants for the package or module would be grouped in a well known
> >> place for users and developers alike. The LP(CONFIG) designation would
> >> require developers to think about deprecation cycle if contemplating a
> >> change, thus providing some back pressure against snap decisions. Or, if
> >> not then, then at release candidate evaluation time, user configuration
> >> breaking changes could be caught be a release automation tool that diffs
> >> LP(CONFIG) annotated classes. Something like this would improve the
> state
> >> of configuration key management quite dramatically, because currently
> it’s
> >> ad hoc.
> >>
> >
> > I am supportive of trying such an effort. We'll need to tweak the meaning
> > of LP(CONFIG) as we define it currently, but that can be done. I don't
> know
> > what our current tools do or assume regarding this annotation. I think
> > there is something custom happening in the compatibility reports that we
> > generate as part of each RC.
> >
> >>> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault
> <bb...@hubspot.com.invalid>
> >>> wrote:
> >>>
> >>> Thanks for your detailed response, Nick!
> >>>
> >>>> I think that none of my comments address your intended topic: how do
> we
> >>> publish our configuration points as an API that can be consumed by user
> >>> applications? (Do I have that correct?)
> >>>
> >>> This is a good summary, and I appreciate the other
> >> thoughts/clarifications
> >>> as well. I also realize this is probably hard to get perfect and any
> >> choice
> >>> must be weighed against the effort necessary to change/maintain.
> >>>
> >>> One example I know is Hadoop/HDFS, and I bet some on this list have
> much
> >>> more knowledge of that project's history than I do. For HDFS they have
> >>> DFSConfigKeys which in my experience does seem to include most
> configs. I
> >>> believe they even have unit tests which verify that all configs in the
> >>> various site.xml files are represented in code. In more recent versions
> >>> they have split that class up into smaller groupings, for example
> >>> DfsClientConf and the various inner classes there.
> >>>
> >>> In a vacuum, from a code design perspective, I'm not commenting on
> >> whether
> >>> that's a good or bad pattern. I also don't know of the politics of the
> >>> project or what sorts of pain points they've discovered in that pattern
> >>> over the years. But from *user's perspective*, this is a handy way to
> >>> handle things in my opinion.
> >>>
> >>> At my company, in general we try to avoid "magic strings" [1] and
> instead
> >>> always try to use constants. We can and do define our own constants to
> >> try
> >>> to mirror some of the "private" magic strings in the hbase client. This
> >> is
> >>> better than nothing but even better would be to use hbase-provided
> >>> constants so that we can build more defensive applications, using the
> >>> compiler to verify that the configs we reference still do anything.
> >>>
> >>> I unfortunately can't speak to the original issues with HConstants that
> >>> turned it into an anti-pattern. What I do notice is there are
> definitely
> >>> examples in the hbase codebase of duplicated config strings, one of
> which
> >>> is called out in one of the jiras I linked in my original email. These
> >> are
> >>> just bugs waiting to happen in my opinion, either for hbase itself or
> for
> >>> users which may reference them.
> >>>
> >>> [1] https://deviq.com/antipatterns/magic-strings
> <https://deviq.com/antipatterns/magic-strings>
> >>>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <nd...@apache.org>
> >> wrote:
> >>>>
> >>>> Hi Bryan,
> >>>>
> >>>> Thanks for bringing this up.
> >>>>
> >>>> I agree with Duo (and I think we have this settled as project-wide
> >>>> consensus) that HConstants is/was an anti-pattern, that we are
> actively
> >>>> against adding new fields there, and opportunistically removing fields
> >> when
> >>>> we can. Further, the documented meaning of the
> >>>> HBaseInterfaceAudience.CONFIG field is "Denotes class names that
> appear
> >> in
> >>>> user facing configuration files", so this isn't really appropriate for
> >>>> marking a field that exposes a configuration key to user
> applications. I
> >>>> will also note that there appears to be two categories of tunable
> >>>> parameters -- configuration points that we expect users to tweak are
> >>>> catalogued and documented in the book [0] and everything else is left
> to
> >>>> the obscurity of code-grep.
> >>>>
> >>>> While we are actively squashing use of fields in HConstants, I don't
> >> know
> >>>> that we have proposed some alternative to the user community. For my
> >> part,
> >>>> when I write and review code that involves configuration keys, I
> >> generally
> >>>> implement the key constant string as a private field in an appropriate
> >>>> class, and the unit test coverage for that configuration key
> replicates
> >> the
> >>>> string in the test. My reasoning being that the string is a part of
> our
> >>>> public API and making a change to the public API should be detected
> from
> >>>> the unit test. I also have (on occasion) gone out of my way to write
> >> about
> >>>> the configuration keys in the package or class-level javadoc.
> >>>>
> >>>> I think that none of my comments address your intended topic: how do
> we
> >>>> publish our configuration points as an API that can be consumed by
> user
> >>>> applications? (Do I have that correct?)
> >>>>
> >>>> I am of the mind that we don't need/want an API of configurations ; we
> >> want
> >>>> a catalogue, i.e., what has been started in our book. Perhaps
> >> accompanied
> >>>> by/generated from an authoritative hbase-defaults.xml file. In fact,
> we
> >>>> already do generate from hbase-default.xml, the result is [1] ; I
> don't
> >>>> believe it is authoritative.
> >>>>
> >>>> If we did have an AP thoughI, what would be better than the HConstants
> >>>> approach of key-strings as public fields ? What if we had a
> >>>> ConfigurationBuilder type of class, which had methods tied to
> >> configuration
> >>>> keys? I would think that such a globally applicable class would have
> the
> >>>> same maintenance issues as HConstants. But what if we had some kind of
> >>>> ConfigurationSetter class, perhaps per package, that performed this
> >>>> function? That might be maintainable for us and useful for users.
> >>>>
> >>>> I'm keen to hear what other ideas are out there, or better, examples
> and
> >>>> counter-examples from other projects.
> >>>>
> >>>> Thanks,
> >>>> Nick
> >>>>
> >>>> [0]: https://hbase.apache.org/book.html#important_configurations
> <https://hbase.apache.org/book.html#important_configurations>
> >>>> <https://hbase.apache.org/book.html#important_configurations
> <https://hbase.apache.org/book.html#important_configurations>
> >
> >>>> [1]: https://hbase.apache.org/book.html#hbase_default_configurations
> <https://hbase.apache.org/book.html#hbase_default_configurations>
> >>>> <https://hbase.apache.org/book.html#hbase_default_configurations
> <https://hbase.apache.org/book.html#hbase_default_configurations>
> >
> >>>>
> >>>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
> >>>> <bb...@hubspot.com.invalid> wrote:
> >>>>
> >>>>> Hi devs,
> >>>>>
> >>>>> As a major user of hbase, my company has thousands of clients
> deployed
> >>>>> which use the hbase client to connect to a variety of hbase clusters.
> >> We
> >>>>> have a common library which handles configuring all clients by
> setting
> >> up
> >>>>> the Configuration object prior to creating a Connection. Our library
> >> sets
> >>>>> configurations using the various configs in HConstants, but there are
> >>>> also
> >>>>> a bunch of configs which don't exist in HConstants. For these we have
> >>>>> hardcoded config strings in our client.
> >>>>>
> >>>>> We're now working on an hbase upgrade and need to go through our
> client
> >>>>> library and check how the configs may have changed in the new
> version.
> >>>> This
> >>>>> is relatively easy to do for those HConstants cases -- configs may be
> >>>>> marked @Deprecated which will show up in one's editor, they may be
> >>>> removed
> >>>>> entirely which would show up is a compile error, and otherwise one
> can
> >>>>> easily click through or bring up the javadoc. For the others that
> don't
> >>>>> exist in HConstants, we need to go manually search the hbase codebase
> >> for
> >>>>> those strings.
> >>>>>
> >>>>> Without doing this painstaking manual process, we would potentially
> >>>> deploy
> >>>>> the upgraded client with configs which are no longer used or
> deprecated
> >>>> by
> >>>>> the hbase client. For those using HConstants, this is immediately
> >> obvious
> >>>>> because the HConstant field may have been removed. This is a clear
> >>>>> indication of needing to investigate the config. In this case it's
> >>>>> preferred to face the compile failure because it's clearer than
> having
> >>>>> something silently disappear or change.
> >>>>>
> >>>>> I opened 3 jiras to move some fields to HConstants, but got some
> >>>> reasonable
> >>>>> pushback from Duo:
> >>>>>
> >>>>> https://issues.apache.org/jira/browse/HBASE-26845
> <https://issues.apache.org/jira/browse/HBASE-26845>
> >>>> <https://issues.apache.org/jira/browse/HBASE-26845
> <https://issues.apache.org/jira/browse/HBASE-26845>
> >
> >>>>> https://issues.apache.org/jira/browse/HBASE-26846
> <https://issues.apache.org/jira/browse/HBASE-26846>
> >>>> <https://issues.apache.org/jira/browse/HBASE-26846
> <https://issues.apache.org/jira/browse/HBASE-26846>
> >
> >>>>> https://issues.apache.org/jira/browse/HBASE-26847
> <https://issues.apache.org/jira/browse/HBASE-26847>
> >>>> <https://issues.apache.org/jira/browse/HBASE-26847
> <https://issues.apache.org/jira/browse/HBASE-26847>
> >
> >>>>>
> >>>>> Duo's pushback is that HConstants is an anti-pattern and these
> configs
> >>>> are
> >>>>> not part of our public API. I can agree that a catch-all constants
> >> class
> >>>>> might be an anti-pattern, but would argue that consolidating configs
> >>>> there
> >>>>> is very useful for end-users. I can also potentially agree that
> >> exposing
> >>>>> these as part of our public API might limit the flexibility of
> >>>> development
> >>>>> due to compatibility constraints about IA.Public.
> >>>>>
> >>>>> To me it seems odd to add a configuration, whose whole point is to
> make
> >>>>> something tuneable, but then bury it in a private class despite
> having
> >>>> real
> >>>>> implications for how the application runs. If a configuration is not
> >>>> meant
> >>>>> to be tuned, it shouldn't be a configuration at all. Otherwise it
> >> should
> >>>> be
> >>>>> exposed for reference.
> >>>>>
> >>>>> I'm wondering if there is some compromise we can achieve which makes
> it
> >>>>> easier for end-users to integrate with tunable configs.
> >>>>>
> >>>>> One can imagine a large project to clean up all of our configs under
> >> some
> >>>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making
> >> perfect
> >>>>> (needing to migrate all configs) the enemy of good.
> >>>>>
> >>>>> A better option might be to make those classes which expose configs
> >>>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
> >>>>> ConnectionImplementation. That might be the most incremental change
> we
> >>>>> could make. We could handle this on a case-by-case basis.
> >>>>>
> >>>>> Does anyone have any thoughts?
> >>>>>
> >>>>
> >>
>

Re: [DISCUSS] How to expose configs for users to tune

Posted by Andrew Purtell <an...@gmail.com>.
Agreed the definition of LP(CONFIG) would need to be tweaked.

We do not quite have enough support for analyzing configuration key set changes in the current API audit tool. Removal of a public constant field from an LP annotated class would be flagged but a modification of the constant would not. However it is a OSS project consisting of Perl scripts that might accept a contribution or at least could be patched or extended. Or we can build a new audit tool for the purpose, which is what I would recommend. The Perl based tool shells out to javah and is quite expensive, and on some platforms, depending on comparison, can fail due to command line length limits. A Java tool would likely be more efficient at processing Java class annotations and introspecting string constants. 

> 
> On Mar 21, 2022, at 8:52 AM, Nick Dimiduk <nd...@apache.org> wrote:
> 
> On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell <an...@gmail.com>
> wrote:
> 
>> Although collecting all configuration keys into a single file is
>> definitely an anti-pattern I’m not sure the same is true of package or
>> Maven module level aggregation classes marked LP(CONFIG).Somewhat like
>> DFSConfigKeys but geared toward our API/release auditing.
>> 
>> This would seem virtuous for a couple of reasons. Relevant configuration
>> key constants for the package or module would be grouped in a well known
>> place for users and developers alike. The LP(CONFIG) designation would
>> require developers to think about deprecation cycle if contemplating a
>> change, thus providing some back pressure against snap decisions. Or, if
>> not then, then at release candidate evaluation time, user configuration
>> breaking changes could be caught be a release automation tool that diffs
>> LP(CONFIG) annotated classes. Something like this would improve the state
>> of configuration key management quite dramatically, because currently it’s
>> ad hoc.
>> 
> 
> I am supportive of trying such an effort. We'll need to tweak the meaning
> of LP(CONFIG) as we define it currently, but that can be done. I don't know
> what our current tools do or assume regarding this annotation. I think
> there is something custom happening in the compatibility reports that we
> generate as part of each RC.
> 
>>> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault <bb...@hubspot.com.invalid>
>>> wrote:
>>> 
>>> Thanks for your detailed response, Nick!
>>> 
>>>> I think that none of my comments address your intended topic: how do we
>>> publish our configuration points as an API that can be consumed by user
>>> applications? (Do I have that correct?)
>>> 
>>> This is a good summary, and I appreciate the other
>> thoughts/clarifications
>>> as well. I also realize this is probably hard to get perfect and any
>> choice
>>> must be weighed against the effort necessary to change/maintain.
>>> 
>>> One example I know is Hadoop/HDFS, and I bet some on this list have much
>>> more knowledge of that project's history than I do. For HDFS they have
>>> DFSConfigKeys which in my experience does seem to include most configs. I
>>> believe they even have unit tests which verify that all configs in the
>>> various site.xml files are represented in code. In more recent versions
>>> they have split that class up into smaller groupings, for example
>>> DfsClientConf and the various inner classes there.
>>> 
>>> In a vacuum, from a code design perspective, I'm not commenting on
>> whether
>>> that's a good or bad pattern. I also don't know of the politics of the
>>> project or what sorts of pain points they've discovered in that pattern
>>> over the years. But from *user's perspective*, this is a handy way to
>>> handle things in my opinion.
>>> 
>>> At my company, in general we try to avoid "magic strings" [1] and instead
>>> always try to use constants. We can and do define our own constants to
>> try
>>> to mirror some of the "private" magic strings in the hbase client. This
>> is
>>> better than nothing but even better would be to use hbase-provided
>>> constants so that we can build more defensive applications, using the
>>> compiler to verify that the configs we reference still do anything.
>>> 
>>> I unfortunately can't speak to the original issues with HConstants that
>>> turned it into an anti-pattern. What I do notice is there are definitely
>>> examples in the hbase codebase of duplicated config strings, one of which
>>> is called out in one of the jiras I linked in my original email. These
>> are
>>> just bugs waiting to happen in my opinion, either for hbase itself or for
>>> users which may reference them.
>>> 
>>> [1] https://deviq.com/antipatterns/magic-strings
>>>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <nd...@apache.org>
>> wrote:
>>>> 
>>>> Hi Bryan,
>>>> 
>>>> Thanks for bringing this up.
>>>> 
>>>> I agree with Duo (and I think we have this settled as project-wide
>>>> consensus) that HConstants is/was an anti-pattern, that we are actively
>>>> against adding new fields there, and opportunistically removing fields
>> when
>>>> we can. Further, the documented meaning of the
>>>> HBaseInterfaceAudience.CONFIG field is "Denotes class names that appear
>> in
>>>> user facing configuration files", so this isn't really appropriate for
>>>> marking a field that exposes a configuration key to user applications. I
>>>> will also note that there appears to be two categories of tunable
>>>> parameters -- configuration points that we expect users to tweak are
>>>> catalogued and documented in the book [0] and everything else is left to
>>>> the obscurity of code-grep.
>>>> 
>>>> While we are actively squashing use of fields in HConstants, I don't
>> know
>>>> that we have proposed some alternative to the user community. For my
>> part,
>>>> when I write and review code that involves configuration keys, I
>> generally
>>>> implement the key constant string as a private field in an appropriate
>>>> class, and the unit test coverage for that configuration key replicates
>> the
>>>> string in the test. My reasoning being that the string is a part of our
>>>> public API and making a change to the public API should be detected from
>>>> the unit test. I also have (on occasion) gone out of my way to write
>> about
>>>> the configuration keys in the package or class-level javadoc.
>>>> 
>>>> I think that none of my comments address your intended topic: how do we
>>>> publish our configuration points as an API that can be consumed by user
>>>> applications? (Do I have that correct?)
>>>> 
>>>> I am of the mind that we don't need/want an API of configurations ; we
>> want
>>>> a catalogue, i.e., what has been started in our book. Perhaps
>> accompanied
>>>> by/generated from an authoritative hbase-defaults.xml file. In fact, we
>>>> already do generate from hbase-default.xml, the result is [1] ; I don't
>>>> believe it is authoritative.
>>>> 
>>>> If we did have an AP thoughI, what would be better than the HConstants
>>>> approach of key-strings as public fields ? What if we had a
>>>> ConfigurationBuilder type of class, which had methods tied to
>> configuration
>>>> keys? I would think that such a globally applicable class would have the
>>>> same maintenance issues as HConstants. But what if we had some kind of
>>>> ConfigurationSetter class, perhaps per package, that performed this
>>>> function? That might be maintainable for us and useful for users.
>>>> 
>>>> I'm keen to hear what other ideas are out there, or better, examples and
>>>> counter-examples from other projects.
>>>> 
>>>> Thanks,
>>>> Nick
>>>> 
>>>> [0]: https://hbase.apache.org/book.html#important_configurations
>>>> <https://hbase.apache.org/book.html#important_configurations>
>>>> [1]: https://hbase.apache.org/book.html#hbase_default_configurations
>>>> <https://hbase.apache.org/book.html#hbase_default_configurations>
>>>> 
>>>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
>>>> <bb...@hubspot.com.invalid> wrote:
>>>> 
>>>>> Hi devs,
>>>>> 
>>>>> As a major user of hbase, my company has thousands of clients deployed
>>>>> which use the hbase client to connect to a variety of hbase clusters.
>> We
>>>>> have a common library which handles configuring all clients by setting
>> up
>>>>> the Configuration object prior to creating a Connection. Our library
>> sets
>>>>> configurations using the various configs in HConstants, but there are
>>>> also
>>>>> a bunch of configs which don't exist in HConstants. For these we have
>>>>> hardcoded config strings in our client.
>>>>> 
>>>>> We're now working on an hbase upgrade and need to go through our client
>>>>> library and check how the configs may have changed in the new version.
>>>> This
>>>>> is relatively easy to do for those HConstants cases -- configs may be
>>>>> marked @Deprecated which will show up in one's editor, they may be
>>>> removed
>>>>> entirely which would show up is a compile error, and otherwise one can
>>>>> easily click through or bring up the javadoc. For the others that don't
>>>>> exist in HConstants, we need to go manually search the hbase codebase
>> for
>>>>> those strings.
>>>>> 
>>>>> Without doing this painstaking manual process, we would potentially
>>>> deploy
>>>>> the upgraded client with configs which are no longer used or deprecated
>>>> by
>>>>> the hbase client. For those using HConstants, this is immediately
>> obvious
>>>>> because the HConstant field may have been removed. This is a clear
>>>>> indication of needing to investigate the config. In this case it's
>>>>> preferred to face the compile failure because it's clearer than having
>>>>> something silently disappear or change.
>>>>> 
>>>>> I opened 3 jiras to move some fields to HConstants, but got some
>>>> reasonable
>>>>> pushback from Duo:
>>>>> 
>>>>> https://issues.apache.org/jira/browse/HBASE-26845
>>>> <https://issues.apache.org/jira/browse/HBASE-26845>
>>>>> https://issues.apache.org/jira/browse/HBASE-26846
>>>> <https://issues.apache.org/jira/browse/HBASE-26846>
>>>>> https://issues.apache.org/jira/browse/HBASE-26847
>>>> <https://issues.apache.org/jira/browse/HBASE-26847>
>>>>> 
>>>>> Duo's pushback is that HConstants is an anti-pattern and these configs
>>>> are
>>>>> not part of our public API. I can agree that a catch-all constants
>> class
>>>>> might be an anti-pattern, but would argue that consolidating configs
>>>> there
>>>>> is very useful for end-users. I can also potentially agree that
>> exposing
>>>>> these as part of our public API might limit the flexibility of
>>>> development
>>>>> due to compatibility constraints about IA.Public.
>>>>> 
>>>>> To me it seems odd to add a configuration, whose whole point is to make
>>>>> something tuneable, but then bury it in a private class despite having
>>>> real
>>>>> implications for how the application runs. If a configuration is not
>>>> meant
>>>>> to be tuned, it shouldn't be a configuration at all. Otherwise it
>> should
>>>> be
>>>>> exposed for reference.
>>>>> 
>>>>> I'm wondering if there is some compromise we can achieve which makes it
>>>>> easier for end-users to integrate with tunable configs.
>>>>> 
>>>>> One can imagine a large project to clean up all of our configs under
>> some
>>>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making
>> perfect
>>>>> (needing to migrate all configs) the enemy of good.
>>>>> 
>>>>> A better option might be to make those classes which expose configs
>>>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
>>>>> ConnectionImplementation. That might be the most incremental change we
>>>>> could make. We could handle this on a case-by-case basis.
>>>>> 
>>>>> Does anyone have any thoughts?
>>>>> 
>>>> 
>> 

Re: [DISCUSS] How to expose configs for users to tune

Posted by Nick Dimiduk <nd...@apache.org>.
On Mon, Mar 21, 2022 at 4:32 PM Andrew Purtell <an...@gmail.com>
wrote:

> Although collecting all configuration keys into a single file is
> definitely an anti-pattern I’m not sure the same is true of package or
> Maven module level aggregation classes marked LP(CONFIG).Somewhat like
> DFSConfigKeys but geared toward our API/release auditing.
>
> This would seem virtuous for a couple of reasons. Relevant configuration
> key constants for the package or module would be grouped in a well known
> place for users and developers alike. The LP(CONFIG) designation would
> require developers to think about deprecation cycle if contemplating a
> change, thus providing some back pressure against snap decisions. Or, if
> not then, then at release candidate evaluation time, user configuration
> breaking changes could be caught be a release automation tool that diffs
> LP(CONFIG) annotated classes. Something like this would improve the state
> of configuration key management quite dramatically, because currently it’s
> ad hoc.
>

I am supportive of trying such an effort. We'll need to tweak the meaning
of LP(CONFIG) as we define it currently, but that can be done. I don't know
what our current tools do or assume regarding this annotation. I think
there is something custom happening in the compatibility reports that we
generate as part of each RC.

> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault <bb...@hubspot.com.invalid>
> wrote:
> >
> > Thanks for your detailed response, Nick!
> >
> >> I think that none of my comments address your intended topic: how do we
> > publish our configuration points as an API that can be consumed by user
> > applications? (Do I have that correct?)
> >
> > This is a good summary, and I appreciate the other
> thoughts/clarifications
> > as well. I also realize this is probably hard to get perfect and any
> choice
> > must be weighed against the effort necessary to change/maintain.
> >
> > One example I know is Hadoop/HDFS, and I bet some on this list have much
> > more knowledge of that project's history than I do. For HDFS they have
> > DFSConfigKeys which in my experience does seem to include most configs. I
> > believe they even have unit tests which verify that all configs in the
> > various site.xml files are represented in code. In more recent versions
> > they have split that class up into smaller groupings, for example
> > DfsClientConf and the various inner classes there.
> >
> > In a vacuum, from a code design perspective, I'm not commenting on
> whether
> > that's a good or bad pattern. I also don't know of the politics of the
> > project or what sorts of pain points they've discovered in that pattern
> > over the years. But from *user's perspective*, this is a handy way to
> > handle things in my opinion.
> >
> > At my company, in general we try to avoid "magic strings" [1] and instead
> > always try to use constants. We can and do define our own constants to
> try
> > to mirror some of the "private" magic strings in the hbase client. This
> is
> > better than nothing but even better would be to use hbase-provided
> > constants so that we can build more defensive applications, using the
> > compiler to verify that the configs we reference still do anything.
> >
> > I unfortunately can't speak to the original issues with HConstants that
> > turned it into an anti-pattern. What I do notice is there are definitely
> > examples in the hbase codebase of duplicated config strings, one of which
> > is called out in one of the jiras I linked in my original email. These
> are
> > just bugs waiting to happen in my opinion, either for hbase itself or for
> > users which may reference them.
> >
> > [1] https://deviq.com/antipatterns/magic-strings
> >> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <nd...@apache.org>
> wrote:
> >>
> >> Hi Bryan,
> >>
> >> Thanks for bringing this up.
> >>
> >> I agree with Duo (and I think we have this settled as project-wide
> >> consensus) that HConstants is/was an anti-pattern, that we are actively
> >> against adding new fields there, and opportunistically removing fields
> when
> >> we can. Further, the documented meaning of the
> >> HBaseInterfaceAudience.CONFIG field is "Denotes class names that appear
> in
> >> user facing configuration files", so this isn't really appropriate for
> >> marking a field that exposes a configuration key to user applications. I
> >> will also note that there appears to be two categories of tunable
> >> parameters -- configuration points that we expect users to tweak are
> >> catalogued and documented in the book [0] and everything else is left to
> >> the obscurity of code-grep.
> >>
> >> While we are actively squashing use of fields in HConstants, I don't
> know
> >> that we have proposed some alternative to the user community. For my
> part,
> >> when I write and review code that involves configuration keys, I
> generally
> >> implement the key constant string as a private field in an appropriate
> >> class, and the unit test coverage for that configuration key replicates
> the
> >> string in the test. My reasoning being that the string is a part of our
> >> public API and making a change to the public API should be detected from
> >> the unit test. I also have (on occasion) gone out of my way to write
> about
> >> the configuration keys in the package or class-level javadoc.
> >>
> >> I think that none of my comments address your intended topic: how do we
> >> publish our configuration points as an API that can be consumed by user
> >> applications? (Do I have that correct?)
> >>
> >> I am of the mind that we don't need/want an API of configurations ; we
> want
> >> a catalogue, i.e., what has been started in our book. Perhaps
> accompanied
> >> by/generated from an authoritative hbase-defaults.xml file. In fact, we
> >> already do generate from hbase-default.xml, the result is [1] ; I don't
> >> believe it is authoritative.
> >>
> >> If we did have an AP thoughI, what would be better than the HConstants
> >> approach of key-strings as public fields ? What if we had a
> >> ConfigurationBuilder type of class, which had methods tied to
> configuration
> >> keys? I would think that such a globally applicable class would have the
> >> same maintenance issues as HConstants. But what if we had some kind of
> >> ConfigurationSetter class, perhaps per package, that performed this
> >> function? That might be maintainable for us and useful for users.
> >>
> >> I'm keen to hear what other ideas are out there, or better, examples and
> >> counter-examples from other projects.
> >>
> >> Thanks,
> >> Nick
> >>
> >> [0]: https://hbase.apache.org/book.html#important_configurations
> >> <https://hbase.apache.org/book.html#important_configurations>
> >> [1]: https://hbase.apache.org/book.html#hbase_default_configurations
> >> <https://hbase.apache.org/book.html#hbase_default_configurations>
> >>
> >> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
> >> <bb...@hubspot.com.invalid> wrote:
> >>
> >>> Hi devs,
> >>>
> >>> As a major user of hbase, my company has thousands of clients deployed
> >>> which use the hbase client to connect to a variety of hbase clusters.
> We
> >>> have a common library which handles configuring all clients by setting
> up
> >>> the Configuration object prior to creating a Connection. Our library
> sets
> >>> configurations using the various configs in HConstants, but there are
> >> also
> >>> a bunch of configs which don't exist in HConstants. For these we have
> >>> hardcoded config strings in our client.
> >>>
> >>> We're now working on an hbase upgrade and need to go through our client
> >>> library and check how the configs may have changed in the new version.
> >> This
> >>> is relatively easy to do for those HConstants cases -- configs may be
> >>> marked @Deprecated which will show up in one's editor, they may be
> >> removed
> >>> entirely which would show up is a compile error, and otherwise one can
> >>> easily click through or bring up the javadoc. For the others that don't
> >>> exist in HConstants, we need to go manually search the hbase codebase
> for
> >>> those strings.
> >>>
> >>> Without doing this painstaking manual process, we would potentially
> >> deploy
> >>> the upgraded client with configs which are no longer used or deprecated
> >> by
> >>> the hbase client. For those using HConstants, this is immediately
> obvious
> >>> because the HConstant field may have been removed. This is a clear
> >>> indication of needing to investigate the config. In this case it's
> >>> preferred to face the compile failure because it's clearer than having
> >>> something silently disappear or change.
> >>>
> >>> I opened 3 jiras to move some fields to HConstants, but got some
> >> reasonable
> >>> pushback from Duo:
> >>>
> >>> https://issues.apache.org/jira/browse/HBASE-26845
> >> <https://issues.apache.org/jira/browse/HBASE-26845>
> >>> https://issues.apache.org/jira/browse/HBASE-26846
> >> <https://issues.apache.org/jira/browse/HBASE-26846>
> >>> https://issues.apache.org/jira/browse/HBASE-26847
> >> <https://issues.apache.org/jira/browse/HBASE-26847>
> >>>
> >>> Duo's pushback is that HConstants is an anti-pattern and these configs
> >> are
> >>> not part of our public API. I can agree that a catch-all constants
> class
> >>> might be an anti-pattern, but would argue that consolidating configs
> >> there
> >>> is very useful for end-users. I can also potentially agree that
> exposing
> >>> these as part of our public API might limit the flexibility of
> >> development
> >>> due to compatibility constraints about IA.Public.
> >>>
> >>> To me it seems odd to add a configuration, whose whole point is to make
> >>> something tuneable, but then bury it in a private class despite having
> >> real
> >>> implications for how the application runs. If a configuration is not
> >> meant
> >>> to be tuned, it shouldn't be a configuration at all. Otherwise it
> should
> >> be
> >>> exposed for reference.
> >>>
> >>> I'm wondering if there is some compromise we can achieve which makes it
> >>> easier for end-users to integrate with tunable configs.
> >>>
> >>> One can imagine a large project to clean up all of our configs under
> some
> >>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making
> perfect
> >>> (needing to migrate all configs) the enemy of good.
> >>>
> >>> A better option might be to make those classes which expose configs
> >>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
> >>> ConnectionImplementation. That might be the most incremental change we
> >>> could make. We could handle this on a case-by-case basis.
> >>>
> >>> Does anyone have any thoughts?
> >>>
> >>
>

Re: [DISCUSS] How to expose configs for users to tune

Posted by Andrew Purtell <an...@gmail.com>.
Although collecting all configuration keys into a single file is definitely an anti-pattern I’m not sure the same is true of package or Maven module level aggregation classes marked LP(CONFIG).Somewhat like DFSConfigKeys but geared toward our API/release auditing. 

This would seem virtuous for a couple of reasons. Relevant configuration key constants for the package or module would be grouped in a well known place for users and developers alike. The LP(CONFIG) designation would require developers to think about deprecation cycle if contemplating a change, thus providing some back pressure against snap decisions. Or, if not then, then at release candidate evaluation time, user configuration breaking changes could be caught be a release automation tool that diffs LP(CONFIG) annotated classes. Something like this would improve the state of configuration key management quite dramatically, because currently it’s ad hoc. 


> On Mar 16, 2022, at 10:46 AM, Bryan Beaudreault <bb...@hubspot.com.invalid> wrote:
> 
> Thanks for your detailed response, Nick!
> 
>> I think that none of my comments address your intended topic: how do we
> publish our configuration points as an API that can be consumed by user
> applications? (Do I have that correct?)
> 
> This is a good summary, and I appreciate the other thoughts/clarifications
> as well. I also realize this is probably hard to get perfect and any choice
> must be weighed against the effort necessary to change/maintain.
> 
> One example I know is Hadoop/HDFS, and I bet some on this list have much
> more knowledge of that project's history than I do. For HDFS they have
> DFSConfigKeys which in my experience does seem to include most configs. I
> believe they even have unit tests which verify that all configs in the
> various site.xml files are represented in code. In more recent versions
> they have split that class up into smaller groupings, for example
> DfsClientConf and the various inner classes there.
> 
> In a vacuum, from a code design perspective, I'm not commenting on whether
> that's a good or bad pattern. I also don't know of the politics of the
> project or what sorts of pain points they've discovered in that pattern
> over the years. But from *user's perspective*, this is a handy way to
> handle things in my opinion.
> 
> At my company, in general we try to avoid "magic strings" [1] and instead
> always try to use constants. We can and do define our own constants to try
> to mirror some of the "private" magic strings in the hbase client. This is
> better than nothing but even better would be to use hbase-provided
> constants so that we can build more defensive applications, using the
> compiler to verify that the configs we reference still do anything.
> 
> I unfortunately can't speak to the original issues with HConstants that
> turned it into an anti-pattern. What I do notice is there are definitely
> examples in the hbase codebase of duplicated config strings, one of which
> is called out in one of the jiras I linked in my original email. These are
> just bugs waiting to happen in my opinion, either for hbase itself or for
> users which may reference them.
> 
> [1] https://deviq.com/antipatterns/magic-strings
>> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <nd...@apache.org> wrote:
>> 
>> Hi Bryan,
>> 
>> Thanks for bringing this up.
>> 
>> I agree with Duo (and I think we have this settled as project-wide
>> consensus) that HConstants is/was an anti-pattern, that we are actively
>> against adding new fields there, and opportunistically removing fields when
>> we can. Further, the documented meaning of the
>> HBaseInterfaceAudience.CONFIG field is "Denotes class names that appear in
>> user facing configuration files", so this isn't really appropriate for
>> marking a field that exposes a configuration key to user applications. I
>> will also note that there appears to be two categories of tunable
>> parameters -- configuration points that we expect users to tweak are
>> catalogued and documented in the book [0] and everything else is left to
>> the obscurity of code-grep.
>> 
>> While we are actively squashing use of fields in HConstants, I don't know
>> that we have proposed some alternative to the user community. For my part,
>> when I write and review code that involves configuration keys, I generally
>> implement the key constant string as a private field in an appropriate
>> class, and the unit test coverage for that configuration key replicates the
>> string in the test. My reasoning being that the string is a part of our
>> public API and making a change to the public API should be detected from
>> the unit test. I also have (on occasion) gone out of my way to write about
>> the configuration keys in the package or class-level javadoc.
>> 
>> I think that none of my comments address your intended topic: how do we
>> publish our configuration points as an API that can be consumed by user
>> applications? (Do I have that correct?)
>> 
>> I am of the mind that we don't need/want an API of configurations ; we want
>> a catalogue, i.e., what has been started in our book. Perhaps accompanied
>> by/generated from an authoritative hbase-defaults.xml file. In fact, we
>> already do generate from hbase-default.xml, the result is [1] ; I don't
>> believe it is authoritative.
>> 
>> If we did have an AP thoughI, what would be better than the HConstants
>> approach of key-strings as public fields ? What if we had a
>> ConfigurationBuilder type of class, which had methods tied to configuration
>> keys? I would think that such a globally applicable class would have the
>> same maintenance issues as HConstants. But what if we had some kind of
>> ConfigurationSetter class, perhaps per package, that performed this
>> function? That might be maintainable for us and useful for users.
>> 
>> I'm keen to hear what other ideas are out there, or better, examples and
>> counter-examples from other projects.
>> 
>> Thanks,
>> Nick
>> 
>> [0]: https://hbase.apache.org/book.html#important_configurations
>> <https://hbase.apache.org/book.html#important_configurations>
>> [1]: https://hbase.apache.org/book.html#hbase_default_configurations
>> <https://hbase.apache.org/book.html#hbase_default_configurations>
>> 
>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
>> <bb...@hubspot.com.invalid> wrote:
>> 
>>> Hi devs,
>>> 
>>> As a major user of hbase, my company has thousands of clients deployed
>>> which use the hbase client to connect to a variety of hbase clusters. We
>>> have a common library which handles configuring all clients by setting up
>>> the Configuration object prior to creating a Connection. Our library sets
>>> configurations using the various configs in HConstants, but there are
>> also
>>> a bunch of configs which don't exist in HConstants. For these we have
>>> hardcoded config strings in our client.
>>> 
>>> We're now working on an hbase upgrade and need to go through our client
>>> library and check how the configs may have changed in the new version.
>> This
>>> is relatively easy to do for those HConstants cases -- configs may be
>>> marked @Deprecated which will show up in one's editor, they may be
>> removed
>>> entirely which would show up is a compile error, and otherwise one can
>>> easily click through or bring up the javadoc. For the others that don't
>>> exist in HConstants, we need to go manually search the hbase codebase for
>>> those strings.
>>> 
>>> Without doing this painstaking manual process, we would potentially
>> deploy
>>> the upgraded client with configs which are no longer used or deprecated
>> by
>>> the hbase client. For those using HConstants, this is immediately obvious
>>> because the HConstant field may have been removed. This is a clear
>>> indication of needing to investigate the config. In this case it's
>>> preferred to face the compile failure because it's clearer than having
>>> something silently disappear or change.
>>> 
>>> I opened 3 jiras to move some fields to HConstants, but got some
>> reasonable
>>> pushback from Duo:
>>> 
>>> https://issues.apache.org/jira/browse/HBASE-26845
>> <https://issues.apache.org/jira/browse/HBASE-26845>
>>> https://issues.apache.org/jira/browse/HBASE-26846
>> <https://issues.apache.org/jira/browse/HBASE-26846>
>>> https://issues.apache.org/jira/browse/HBASE-26847
>> <https://issues.apache.org/jira/browse/HBASE-26847>
>>> 
>>> Duo's pushback is that HConstants is an anti-pattern and these configs
>> are
>>> not part of our public API. I can agree that a catch-all constants class
>>> might be an anti-pattern, but would argue that consolidating configs
>> there
>>> is very useful for end-users. I can also potentially agree that exposing
>>> these as part of our public API might limit the flexibility of
>> development
>>> due to compatibility constraints about IA.Public.
>>> 
>>> To me it seems odd to add a configuration, whose whole point is to make
>>> something tuneable, but then bury it in a private class despite having
>> real
>>> implications for how the application runs. If a configuration is not
>> meant
>>> to be tuned, it shouldn't be a configuration at all. Otherwise it should
>> be
>>> exposed for reference.
>>> 
>>> I'm wondering if there is some compromise we can achieve which makes it
>>> easier for end-users to integrate with tunable configs.
>>> 
>>> One can imagine a large project to clean up all of our configs under some
>>> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making perfect
>>> (needing to migrate all configs) the enemy of good.
>>> 
>>> A better option might be to make those classes which expose configs
>>> LimitedPrivate(CONFIG) -- for example AsyncProcess and
>>> ConnectionImplementation. That might be the most incremental change we
>>> could make. We could handle this on a case-by-case basis.
>>> 
>>> Does anyone have any thoughts?
>>> 
>> 

Re: [DISCUSS] How to expose configs for users to tune

Posted by Nick Dimiduk <nd...@apache.org>.
I have some thoughts in-line. I don't have strong guidance though.

On Thu, Mar 17, 2022 at 5:21 PM Bryan Beaudreault
<bb...@hubspot.com.invalid> wrote:

> Another thought I had which may be a compromise. In general I think we
> should avoid magic strings [1]. A good example of that is in
> ConnectionImplementation [2]. That doesn't mean we need to go in the full
> opposite direction of putting all constants into one class. I can
> understand the reasonable decision devs reached in the past in that regard.
> Instead it could simply mean that each class should have its magic strings
> in local constants at the top of the class. I hope this is something most
> people could get behind, and would have discoverability/maintenance
> benefits beyond this discussion.
>

Constants in the classes where the configuration strings are consumed is a
good first-order approach. Unfortunately, oftentimes we want those
implementation classes to be IA.Private, which you get to in the next
paragraph.

If we could reach agreement on that, we can move onto the next problem of
> if/how we should expose these constants. I realize explicitly including
> these as part of our public API may be contentious, and I would be ok not
> to tackle that debate at this very moment. Correct me if I'm wrong, I
> believe our current state-of-the-art for defining our API is through
> InterfaceAudience annotations. Beyond that some classes additionally use
> java visibility modifiers. AsyncProcess [3] is a good example here -- it is
> both InterfaceAudience.Private and package-private. But many other
> InterfaceAudience.Private classes are public visibility, such as
> ConnectionImplementation [4]. I imagine this is just based on how these
> classes are accessed within the client, but it has implications for these
> constants.
>

I agree that it would be nice if our IA.Public corresponded only with
visibly public syntactic units and also that IA.Private corresponded only
with package-protected or lower syntactic units. We cannot achieve that
today, given how our client is structured. We could achieve it if we
separated an hbase-client interfaces module from the hbase-client
implementation module. But indeed, not all configuration keys that are
available to the operator are attached to a class that is available to the
developer.

I am personally also very against having a class that is mostly IA.Public
but has a couple parts that are IA.Private.

In general I do think it's useful to use visibility modifiers to protect
> unintended usages of library classes. But we've already decided on
> InterfaceAudience for that. What if we agreed to opportunistically and only
> as necessary convert these classes/constants to public visibility? This
> would put the risk into users hands -- if they want to access these
> constants, they can, but realizing that the class may change at any time in
> any way since it's InterfaceAudience.Private. Personally as a user I would
> take this risk because it's a relatively trivial and useful compatibility
> issue to fix, but others don't have to.
>

We could use visibility modifiers to protect unintended usage of library
classes; see my early comments about separating the client-impl from
client-interfaces modules. In absence of such hygiene, InterfaceAudience
annotations are our best effort to define some barrier between public and
private classes.

If we could reach agreement on these points, I think the action items would
> simply be:
>
> - Document this convention somewhere for devs. Happy to make that change,
> but might need a pointer on where would be best.
> - I can update the jiras from my original email accordingly (don't move
> constants, but add them in the class where necessary and change visibility
> where necessary).
>
> Open to other opinions here and please feel free to correct my assumptions
> if any of the above has been misinterpreted on my part.
>
> Thanks all!
>
> [1] https://deviq.com/antipatterns/magic-strings
> [2]
>
> https://github.com/apache/hbase/blob/branch-2/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L577
> [3]
>
> https://github.com/apache/hbase/blob/branch-2/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java#L81
> [4]
>
> https://github.com/apache/hbase/blob/branch-2/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L173
>
> On Wed, Mar 16, 2022 at 1:46 PM Bryan Beaudreault <
> bbeaudreault@hubspot.com>
> wrote:
>
> > Thanks for your detailed response, Nick!
> >
> > > I think that none of my comments address your intended topic: how do we
> > publish our configuration points as an API that can be consumed by user
> > applications? (Do I have that correct?)
> >
> > This is a good summary, and I appreciate the other
> thoughts/clarifications
> > as well. I also realize this is probably hard to get perfect and any
> choice
> > must be weighed against the effort necessary to change/maintain.
> >
> > One example I know is Hadoop/HDFS, and I bet some on this list have much
> > more knowledge of that project's history than I do. For HDFS they have
> > DFSConfigKeys which in my experience does seem to include most configs. I
> > believe they even have unit tests which verify that all configs in the
> > various site.xml files are represented in code. In more recent versions
> > they have split that class up into smaller groupings, for example
> > DfsClientConf and the various inner classes there.
> >
> > In a vacuum, from a code design perspective, I'm not commenting on
> whether
> > that's a good or bad pattern. I also don't know of the politics of the
> > project or what sorts of pain points they've discovered in that pattern
> > over the years. But from *user's perspective*, this is a handy way to
> > handle things in my opinion.
> >
> > At my company, in general we try to avoid "magic strings" [1] and instead
> > always try to use constants. We can and do define our own constants to
> try
> > to mirror some of the "private" magic strings in the hbase client. This
> is
> > better than nothing but even better would be to use hbase-provided
> > constants so that we can build more defensive applications, using the
> > compiler to verify that the configs we reference still do anything.
> >
> > I unfortunately can't speak to the original issues with HConstants that
> > turned it into an anti-pattern. What I do notice is there are definitely
> > examples in the hbase codebase of duplicated config strings, one of which
> > is called out in one of the jiras I linked in my original email. These
> are
> > just bugs waiting to happen in my opinion, either for hbase itself or for
> > users which may reference them.
> >
> > [1] https://deviq.com/antipatterns/magic-strings
> > On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <nd...@apache.org>
> wrote:
> >
> >> Hi Bryan,
> >>
> >> Thanks for bringing this up.
> >>
> >> I agree with Duo (and I think we have this settled as project-wide
> >> consensus) that HConstants is/was an anti-pattern, that we are actively
> >> against adding new fields there, and opportunistically removing fields
> >> when
> >> we can. Further, the documented meaning of the
> >> HBaseInterfaceAudience.CONFIG field is "Denotes class names that appear
> in
> >> user facing configuration files", so this isn't really appropriate for
> >> marking a field that exposes a configuration key to user applications. I
> >> will also note that there appears to be two categories of tunable
> >> parameters -- configuration points that we expect users to tweak are
> >> catalogued and documented in the book [0] and everything else is left to
> >> the obscurity of code-grep.
> >>
> >> While we are actively squashing use of fields in HConstants, I don't
> know
> >> that we have proposed some alternative to the user community. For my
> part,
> >> when I write and review code that involves configuration keys, I
> generally
> >> implement the key constant string as a private field in an appropriate
> >> class, and the unit test coverage for that configuration key replicates
> >> the
> >> string in the test. My reasoning being that the string is a part of our
> >> public API and making a change to the public API should be detected from
> >> the unit test. I also have (on occasion) gone out of my way to write
> about
> >> the configuration keys in the package or class-level javadoc.
> >>
> >> I think that none of my comments address your intended topic: how do we
> >> publish our configuration points as an API that can be consumed by user
> >> applications? (Do I have that correct?)
> >>
> >> I am of the mind that we don't need/want an API of configurations ; we
> >> want
> >> a catalogue, i.e., what has been started in our book. Perhaps
> accompanied
> >> by/generated from an authoritative hbase-defaults.xml file. In fact, we
> >> already do generate from hbase-default.xml, the result is [1] ; I don't
> >> believe it is authoritative.
> >>
> >> If we did have an AP thoughI, what would be better than the HConstants
> >> approach of key-strings as public fields ? What if we had a
> >> ConfigurationBuilder type of class, which had methods tied to
> >> configuration
> >> keys? I would think that such a globally applicable class would have the
> >> same maintenance issues as HConstants. But what if we had some kind of
> >> ConfigurationSetter class, perhaps per package, that performed this
> >> function? That might be maintainable for us and useful for users.
> >>
> >> I'm keen to hear what other ideas are out there, or better, examples and
> >> counter-examples from other projects.
> >>
> >> Thanks,
> >> Nick
> >>
> >> [0]: https://hbase.apache.org/book.html#important_configurations
> >> <https://hbase.apache.org/book.html#important_configurations>
> >> [1]: https://hbase.apache.org/book.html#hbase_default_configurations
> >> <https://hbase.apache.org/book.html#hbase_default_configurations>
> >>
> >> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
> >> <bb...@hubspot.com.invalid> wrote:
> >>
> >> > Hi devs,
> >> >
> >> > As a major user of hbase, my company has thousands of clients deployed
> >> > which use the hbase client to connect to a variety of hbase clusters.
> We
> >> > have a common library which handles configuring all clients by setting
> >> up
> >> > the Configuration object prior to creating a Connection. Our library
> >> sets
> >> > configurations using the various configs in HConstants, but there are
> >> also
> >> > a bunch of configs which don't exist in HConstants. For these we have
> >> > hardcoded config strings in our client.
> >> >
> >> > We're now working on an hbase upgrade and need to go through our
> client
> >> > library and check how the configs may have changed in the new version.
> >> This
> >> > is relatively easy to do for those HConstants cases -- configs may be
> >> > marked @Deprecated which will show up in one's editor, they may be
> >> removed
> >> > entirely which would show up is a compile error, and otherwise one can
> >> > easily click through or bring up the javadoc. For the others that
> don't
> >> > exist in HConstants, we need to go manually search the hbase codebase
> >> for
> >> > those strings.
> >> >
> >> > Without doing this painstaking manual process, we would potentially
> >> deploy
> >> > the upgraded client with configs which are no longer used or
> deprecated
> >> by
> >> > the hbase client. For those using HConstants, this is immediately
> >> obvious
> >> > because the HConstant field may have been removed. This is a clear
> >> > indication of needing to investigate the config. In this case it's
> >> > preferred to face the compile failure because it's clearer than having
> >> > something silently disappear or change.
> >> >
> >> > I opened 3 jiras to move some fields to HConstants, but got some
> >> reasonable
> >> > pushback from Duo:
> >> >
> >> > https://issues.apache.org/jira/browse/HBASE-26845
> >> <https://issues.apache.org/jira/browse/HBASE-26845>
> >> > https://issues.apache.org/jira/browse/HBASE-26846
> >> <https://issues.apache.org/jira/browse/HBASE-26846>
> >> > https://issues.apache.org/jira/browse/HBASE-26847
> >> <https://issues.apache.org/jira/browse/HBASE-26847>
> >> >
> >> > Duo's pushback is that HConstants is an anti-pattern and these configs
> >> are
> >> > not part of our public API. I can agree that a catch-all constants
> class
> >> > might be an anti-pattern, but would argue that consolidating configs
> >> there
> >> > is very useful for end-users. I can also potentially agree that
> exposing
> >> > these as part of our public API might limit the flexibility of
> >> development
> >> > due to compatibility constraints about IA.Public.
> >> >
> >> > To me it seems odd to add a configuration, whose whole point is to
> make
> >> > something tuneable, but then bury it in a private class despite having
> >> real
> >> > implications for how the application runs. If a configuration is not
> >> meant
> >> > to be tuned, it shouldn't be a configuration at all. Otherwise it
> >> should be
> >> > exposed for reference.
> >> >
> >> > I'm wondering if there is some compromise we can achieve which makes
> it
> >> > easier for end-users to integrate with tunable configs.
> >> >
> >> > One can imagine a large project to clean up all of our configs under
> >> some
> >> > new class with maybe IA.LimitedPrivate(CONFIG), but I fear making
> >> perfect
> >> > (needing to migrate all configs) the enemy of good.
> >> >
> >> > A better option might be to make those classes which expose configs
> >> > LimitedPrivate(CONFIG) -- for example AsyncProcess and
> >> > ConnectionImplementation. That might be the most incremental change we
> >> > could make. We could handle this on a case-by-case basis.
> >> >
> >> > Does anyone have any thoughts?
> >> >
> >>
> >
>

Re: [DISCUSS] How to expose configs for users to tune

Posted by Bryan Beaudreault <bb...@hubspot.com.INVALID>.
Another thought I had which may be a compromise. In general I think we
should avoid magic strings [1]. A good example of that is in
ConnectionImplementation [2]. That doesn't mean we need to go in the full
opposite direction of putting all constants into one class. I can
understand the reasonable decision devs reached in the past in that regard.
Instead it could simply mean that each class should have its magic strings
in local constants at the top of the class. I hope this is something most
people could get behind, and would have discoverability/maintenance
benefits beyond this discussion.

If we could reach agreement on that, we can move onto the next problem of
if/how we should expose these constants. I realize explicitly including
these as part of our public API may be contentious, and I would be ok not
to tackle that debate at this very moment. Correct me if I'm wrong, I
believe our current state-of-the-art for defining our API is through
InterfaceAudience annotations. Beyond that some classes additionally use
java visibility modifiers. AsyncProcess [3] is a good example here -- it is
both InterfaceAudience.Private and package-private. But many other
InterfaceAudience.Private classes are public visibility, such as
ConnectionImplementation [4]. I imagine this is just based on how these
classes are accessed within the client, but it has implications for these
constants.

In general I do think it's useful to use visibility modifiers to protect
unintended usages of library classes. But we've already decided on
InterfaceAudience for that. What if we agreed to opportunistically and only
as necessary convert these classes/constants to public visibility? This
would put the risk into users hands -- if they want to access these
constants, they can, but realizing that the class may change at any time in
any way since it's InterfaceAudience.Private. Personally as a user I would
take this risk because it's a relatively trivial and useful compatibility
issue to fix, but others don't have to.

If we could reach agreement on these points, I think the action items would
simply be:

- Document this convention somewhere for devs. Happy to make that change,
but might need a pointer on where would be best.
- I can update the jiras from my original email accordingly (don't move
constants, but add them in the class where necessary and change visibility
where necessary).

Open to other opinions here and please feel free to correct my assumptions
if any of the above has been misinterpreted on my part.

Thanks all!

[1] https://deviq.com/antipatterns/magic-strings
[2]
https://github.com/apache/hbase/blob/branch-2/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L577
[3]
https://github.com/apache/hbase/blob/branch-2/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java#L81
[4]
https://github.com/apache/hbase/blob/branch-2/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L173

On Wed, Mar 16, 2022 at 1:46 PM Bryan Beaudreault <bb...@hubspot.com>
wrote:

> Thanks for your detailed response, Nick!
>
> > I think that none of my comments address your intended topic: how do we
> publish our configuration points as an API that can be consumed by user
> applications? (Do I have that correct?)
>
> This is a good summary, and I appreciate the other thoughts/clarifications
> as well. I also realize this is probably hard to get perfect and any choice
> must be weighed against the effort necessary to change/maintain.
>
> One example I know is Hadoop/HDFS, and I bet some on this list have much
> more knowledge of that project's history than I do. For HDFS they have
> DFSConfigKeys which in my experience does seem to include most configs. I
> believe they even have unit tests which verify that all configs in the
> various site.xml files are represented in code. In more recent versions
> they have split that class up into smaller groupings, for example
> DfsClientConf and the various inner classes there.
>
> In a vacuum, from a code design perspective, I'm not commenting on whether
> that's a good or bad pattern. I also don't know of the politics of the
> project or what sorts of pain points they've discovered in that pattern
> over the years. But from *user's perspective*, this is a handy way to
> handle things in my opinion.
>
> At my company, in general we try to avoid "magic strings" [1] and instead
> always try to use constants. We can and do define our own constants to try
> to mirror some of the "private" magic strings in the hbase client. This is
> better than nothing but even better would be to use hbase-provided
> constants so that we can build more defensive applications, using the
> compiler to verify that the configs we reference still do anything.
>
> I unfortunately can't speak to the original issues with HConstants that
> turned it into an anti-pattern. What I do notice is there are definitely
> examples in the hbase codebase of duplicated config strings, one of which
> is called out in one of the jiras I linked in my original email. These are
> just bugs waiting to happen in my opinion, either for hbase itself or for
> users which may reference them.
>
> [1] https://deviq.com/antipatterns/magic-strings
> On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <nd...@apache.org> wrote:
>
>> Hi Bryan,
>>
>> Thanks for bringing this up.
>>
>> I agree with Duo (and I think we have this settled as project-wide
>> consensus) that HConstants is/was an anti-pattern, that we are actively
>> against adding new fields there, and opportunistically removing fields
>> when
>> we can. Further, the documented meaning of the
>> HBaseInterfaceAudience.CONFIG field is "Denotes class names that appear in
>> user facing configuration files", so this isn't really appropriate for
>> marking a field that exposes a configuration key to user applications. I
>> will also note that there appears to be two categories of tunable
>> parameters -- configuration points that we expect users to tweak are
>> catalogued and documented in the book [0] and everything else is left to
>> the obscurity of code-grep.
>>
>> While we are actively squashing use of fields in HConstants, I don't know
>> that we have proposed some alternative to the user community. For my part,
>> when I write and review code that involves configuration keys, I generally
>> implement the key constant string as a private field in an appropriate
>> class, and the unit test coverage for that configuration key replicates
>> the
>> string in the test. My reasoning being that the string is a part of our
>> public API and making a change to the public API should be detected from
>> the unit test. I also have (on occasion) gone out of my way to write about
>> the configuration keys in the package or class-level javadoc.
>>
>> I think that none of my comments address your intended topic: how do we
>> publish our configuration points as an API that can be consumed by user
>> applications? (Do I have that correct?)
>>
>> I am of the mind that we don't need/want an API of configurations ; we
>> want
>> a catalogue, i.e., what has been started in our book. Perhaps accompanied
>> by/generated from an authoritative hbase-defaults.xml file. In fact, we
>> already do generate from hbase-default.xml, the result is [1] ; I don't
>> believe it is authoritative.
>>
>> If we did have an AP thoughI, what would be better than the HConstants
>> approach of key-strings as public fields ? What if we had a
>> ConfigurationBuilder type of class, which had methods tied to
>> configuration
>> keys? I would think that such a globally applicable class would have the
>> same maintenance issues as HConstants. But what if we had some kind of
>> ConfigurationSetter class, perhaps per package, that performed this
>> function? That might be maintainable for us and useful for users.
>>
>> I'm keen to hear what other ideas are out there, or better, examples and
>> counter-examples from other projects.
>>
>> Thanks,
>> Nick
>>
>> [0]: https://hbase.apache.org/book.html#important_configurations
>> <https://hbase.apache.org/book.html#important_configurations>
>> [1]: https://hbase.apache.org/book.html#hbase_default_configurations
>> <https://hbase.apache.org/book.html#hbase_default_configurations>
>>
>> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
>> <bb...@hubspot.com.invalid> wrote:
>>
>> > Hi devs,
>> >
>> > As a major user of hbase, my company has thousands of clients deployed
>> > which use the hbase client to connect to a variety of hbase clusters. We
>> > have a common library which handles configuring all clients by setting
>> up
>> > the Configuration object prior to creating a Connection. Our library
>> sets
>> > configurations using the various configs in HConstants, but there are
>> also
>> > a bunch of configs which don't exist in HConstants. For these we have
>> > hardcoded config strings in our client.
>> >
>> > We're now working on an hbase upgrade and need to go through our client
>> > library and check how the configs may have changed in the new version.
>> This
>> > is relatively easy to do for those HConstants cases -- configs may be
>> > marked @Deprecated which will show up in one's editor, they may be
>> removed
>> > entirely which would show up is a compile error, and otherwise one can
>> > easily click through or bring up the javadoc. For the others that don't
>> > exist in HConstants, we need to go manually search the hbase codebase
>> for
>> > those strings.
>> >
>> > Without doing this painstaking manual process, we would potentially
>> deploy
>> > the upgraded client with configs which are no longer used or deprecated
>> by
>> > the hbase client. For those using HConstants, this is immediately
>> obvious
>> > because the HConstant field may have been removed. This is a clear
>> > indication of needing to investigate the config. In this case it's
>> > preferred to face the compile failure because it's clearer than having
>> > something silently disappear or change.
>> >
>> > I opened 3 jiras to move some fields to HConstants, but got some
>> reasonable
>> > pushback from Duo:
>> >
>> > https://issues.apache.org/jira/browse/HBASE-26845
>> <https://issues.apache.org/jira/browse/HBASE-26845>
>> > https://issues.apache.org/jira/browse/HBASE-26846
>> <https://issues.apache.org/jira/browse/HBASE-26846>
>> > https://issues.apache.org/jira/browse/HBASE-26847
>> <https://issues.apache.org/jira/browse/HBASE-26847>
>> >
>> > Duo's pushback is that HConstants is an anti-pattern and these configs
>> are
>> > not part of our public API. I can agree that a catch-all constants class
>> > might be an anti-pattern, but would argue that consolidating configs
>> there
>> > is very useful for end-users. I can also potentially agree that exposing
>> > these as part of our public API might limit the flexibility of
>> development
>> > due to compatibility constraints about IA.Public.
>> >
>> > To me it seems odd to add a configuration, whose whole point is to make
>> > something tuneable, but then bury it in a private class despite having
>> real
>> > implications for how the application runs. If a configuration is not
>> meant
>> > to be tuned, it shouldn't be a configuration at all. Otherwise it
>> should be
>> > exposed for reference.
>> >
>> > I'm wondering if there is some compromise we can achieve which makes it
>> > easier for end-users to integrate with tunable configs.
>> >
>> > One can imagine a large project to clean up all of our configs under
>> some
>> > new class with maybe IA.LimitedPrivate(CONFIG), but I fear making
>> perfect
>> > (needing to migrate all configs) the enemy of good.
>> >
>> > A better option might be to make those classes which expose configs
>> > LimitedPrivate(CONFIG) -- for example AsyncProcess and
>> > ConnectionImplementation. That might be the most incremental change we
>> > could make. We could handle this on a case-by-case basis.
>> >
>> > Does anyone have any thoughts?
>> >
>>
>

Re: [DISCUSS] How to expose configs for users to tune

Posted by Bryan Beaudreault <bb...@hubspot.com.INVALID>.
Thanks for your detailed response, Nick!

> I think that none of my comments address your intended topic: how do we
publish our configuration points as an API that can be consumed by user
applications? (Do I have that correct?)

This is a good summary, and I appreciate the other thoughts/clarifications
as well. I also realize this is probably hard to get perfect and any choice
must be weighed against the effort necessary to change/maintain.

One example I know is Hadoop/HDFS, and I bet some on this list have much
more knowledge of that project's history than I do. For HDFS they have
DFSConfigKeys which in my experience does seem to include most configs. I
believe they even have unit tests which verify that all configs in the
various site.xml files are represented in code. In more recent versions
they have split that class up into smaller groupings, for example
DfsClientConf and the various inner classes there.

In a vacuum, from a code design perspective, I'm not commenting on whether
that's a good or bad pattern. I also don't know of the politics of the
project or what sorts of pain points they've discovered in that pattern
over the years. But from *user's perspective*, this is a handy way to
handle things in my opinion.

At my company, in general we try to avoid "magic strings" [1] and instead
always try to use constants. We can and do define our own constants to try
to mirror some of the "private" magic strings in the hbase client. This is
better than nothing but even better would be to use hbase-provided
constants so that we can build more defensive applications, using the
compiler to verify that the configs we reference still do anything.

I unfortunately can't speak to the original issues with HConstants that
turned it into an anti-pattern. What I do notice is there are definitely
examples in the hbase codebase of duplicated config strings, one of which
is called out in one of the jiras I linked in my original email. These are
just bugs waiting to happen in my opinion, either for hbase itself or for
users which may reference them.

[1] https://deviq.com/antipatterns/magic-strings
On Wed, Mar 16, 2022 at 10:52 AM Nick Dimiduk <nd...@apache.org> wrote:

> Hi Bryan,
>
> Thanks for bringing this up.
>
> I agree with Duo (and I think we have this settled as project-wide
> consensus) that HConstants is/was an anti-pattern, that we are actively
> against adding new fields there, and opportunistically removing fields when
> we can. Further, the documented meaning of the
> HBaseInterfaceAudience.CONFIG field is "Denotes class names that appear in
> user facing configuration files", so this isn't really appropriate for
> marking a field that exposes a configuration key to user applications. I
> will also note that there appears to be two categories of tunable
> parameters -- configuration points that we expect users to tweak are
> catalogued and documented in the book [0] and everything else is left to
> the obscurity of code-grep.
>
> While we are actively squashing use of fields in HConstants, I don't know
> that we have proposed some alternative to the user community. For my part,
> when I write and review code that involves configuration keys, I generally
> implement the key constant string as a private field in an appropriate
> class, and the unit test coverage for that configuration key replicates the
> string in the test. My reasoning being that the string is a part of our
> public API and making a change to the public API should be detected from
> the unit test. I also have (on occasion) gone out of my way to write about
> the configuration keys in the package or class-level javadoc.
>
> I think that none of my comments address your intended topic: how do we
> publish our configuration points as an API that can be consumed by user
> applications? (Do I have that correct?)
>
> I am of the mind that we don't need/want an API of configurations ; we want
> a catalogue, i.e., what has been started in our book. Perhaps accompanied
> by/generated from an authoritative hbase-defaults.xml file. In fact, we
> already do generate from hbase-default.xml, the result is [1] ; I don't
> believe it is authoritative.
>
> If we did have an AP thoughI, what would be better than the HConstants
> approach of key-strings as public fields ? What if we had a
> ConfigurationBuilder type of class, which had methods tied to configuration
> keys? I would think that such a globally applicable class would have the
> same maintenance issues as HConstants. But what if we had some kind of
> ConfigurationSetter class, perhaps per package, that performed this
> function? That might be maintainable for us and useful for users.
>
> I'm keen to hear what other ideas are out there, or better, examples and
> counter-examples from other projects.
>
> Thanks,
> Nick
>
> [0]: https://hbase.apache.org/book.html#important_configurations
> <https://hbase.apache.org/book.html#important_configurations>
> [1]: https://hbase.apache.org/book.html#hbase_default_configurations
> <https://hbase.apache.org/book.html#hbase_default_configurations>
>
> On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
> <bb...@hubspot.com.invalid> wrote:
>
> > Hi devs,
> >
> > As a major user of hbase, my company has thousands of clients deployed
> > which use the hbase client to connect to a variety of hbase clusters. We
> > have a common library which handles configuring all clients by setting up
> > the Configuration object prior to creating a Connection. Our library sets
> > configurations using the various configs in HConstants, but there are
> also
> > a bunch of configs which don't exist in HConstants. For these we have
> > hardcoded config strings in our client.
> >
> > We're now working on an hbase upgrade and need to go through our client
> > library and check how the configs may have changed in the new version.
> This
> > is relatively easy to do for those HConstants cases -- configs may be
> > marked @Deprecated which will show up in one's editor, they may be
> removed
> > entirely which would show up is a compile error, and otherwise one can
> > easily click through or bring up the javadoc. For the others that don't
> > exist in HConstants, we need to go manually search the hbase codebase for
> > those strings.
> >
> > Without doing this painstaking manual process, we would potentially
> deploy
> > the upgraded client with configs which are no longer used or deprecated
> by
> > the hbase client. For those using HConstants, this is immediately obvious
> > because the HConstant field may have been removed. This is a clear
> > indication of needing to investigate the config. In this case it's
> > preferred to face the compile failure because it's clearer than having
> > something silently disappear or change.
> >
> > I opened 3 jiras to move some fields to HConstants, but got some
> reasonable
> > pushback from Duo:
> >
> > https://issues.apache.org/jira/browse/HBASE-26845
> <https://issues.apache.org/jira/browse/HBASE-26845>
> > https://issues.apache.org/jira/browse/HBASE-26846
> <https://issues.apache.org/jira/browse/HBASE-26846>
> > https://issues.apache.org/jira/browse/HBASE-26847
> <https://issues.apache.org/jira/browse/HBASE-26847>
> >
> > Duo's pushback is that HConstants is an anti-pattern and these configs
> are
> > not part of our public API. I can agree that a catch-all constants class
> > might be an anti-pattern, but would argue that consolidating configs
> there
> > is very useful for end-users. I can also potentially agree that exposing
> > these as part of our public API might limit the flexibility of
> development
> > due to compatibility constraints about IA.Public.
> >
> > To me it seems odd to add a configuration, whose whole point is to make
> > something tuneable, but then bury it in a private class despite having
> real
> > implications for how the application runs. If a configuration is not
> meant
> > to be tuned, it shouldn't be a configuration at all. Otherwise it should
> be
> > exposed for reference.
> >
> > I'm wondering if there is some compromise we can achieve which makes it
> > easier for end-users to integrate with tunable configs.
> >
> > One can imagine a large project to clean up all of our configs under some
> > new class with maybe IA.LimitedPrivate(CONFIG), but I fear making perfect
> > (needing to migrate all configs) the enemy of good.
> >
> > A better option might be to make those classes which expose configs
> > LimitedPrivate(CONFIG) -- for example AsyncProcess and
> > ConnectionImplementation. That might be the most incremental change we
> > could make. We could handle this on a case-by-case basis.
> >
> > Does anyone have any thoughts?
> >
>

Re: [DISCUSS] How to expose configs for users to tune

Posted by Nick Dimiduk <nd...@apache.org>.
Hi Bryan,

Thanks for bringing this up.

I agree with Duo (and I think we have this settled as project-wide
consensus) that HConstants is/was an anti-pattern, that we are actively
against adding new fields there, and opportunistically removing fields when
we can. Further, the documented meaning of the
HBaseInterfaceAudience.CONFIG field is "Denotes class names that appear in
user facing configuration files", so this isn't really appropriate for
marking a field that exposes a configuration key to user applications. I
will also note that there appears to be two categories of tunable
parameters -- configuration points that we expect users to tweak are
catalogued and documented in the book [0] and everything else is left to
the obscurity of code-grep.

While we are actively squashing use of fields in HConstants, I don't know
that we have proposed some alternative to the user community. For my part,
when I write and review code that involves configuration keys, I generally
implement the key constant string as a private field in an appropriate
class, and the unit test coverage for that configuration key replicates the
string in the test. My reasoning being that the string is a part of our
public API and making a change to the public API should be detected from
the unit test. I also have (on occasion) gone out of my way to write about
the configuration keys in the package or class-level javadoc.

I think that none of my comments address your intended topic: how do we
publish our configuration points as an API that can be consumed by user
applications? (Do I have that correct?)

I am of the mind that we don't need/want an API of configurations ; we want
a catalogue, i.e., what has been started in our book. Perhaps accompanied
by/generated from an authoritative hbase-defaults.xml file. In fact, we
already do generate from hbase-default.xml, the result is [1] ; I don't
believe it is authoritative.

If we did have an AP thoughI, what would be better than the HConstants
approach of key-strings as public fields ? What if we had a
ConfigurationBuilder type of class, which had methods tied to configuration
keys? I would think that such a globally applicable class would have the
same maintenance issues as HConstants. But what if we had some kind of
ConfigurationSetter class, perhaps per package, that performed this
function? That might be maintainable for us and useful for users.

I'm keen to hear what other ideas are out there, or better, examples and
counter-examples from other projects.

Thanks,
Nick

[0]: https://hbase.apache.org/book.html#important_configurations
[1]: https://hbase.apache.org/book.html#hbase_default_configurations

On Tue, Mar 15, 2022 at 4:28 PM Bryan Beaudreault
<bb...@hubspot.com.invalid> wrote:

> Hi devs,
>
> As a major user of hbase, my company has thousands of clients deployed
> which use the hbase client to connect to a variety of hbase clusters. We
> have a common library which handles configuring all clients by setting up
> the Configuration object prior to creating a Connection. Our library sets
> configurations using the various configs in HConstants, but there are also
> a bunch of configs which don't exist in HConstants. For these we have
> hardcoded config strings in our client.
>
> We're now working on an hbase upgrade and need to go through our client
> library and check how the configs may have changed in the new version. This
> is relatively easy to do for those HConstants cases -- configs may be
> marked @Deprecated which will show up in one's editor, they may be removed
> entirely which would show up is a compile error, and otherwise one can
> easily click through or bring up the javadoc. For the others that don't
> exist in HConstants, we need to go manually search the hbase codebase for
> those strings.
>
> Without doing this painstaking manual process, we would potentially deploy
> the upgraded client with configs which are no longer used or deprecated by
> the hbase client. For those using HConstants, this is immediately obvious
> because the HConstant field may have been removed. This is a clear
> indication of needing to investigate the config. In this case it's
> preferred to face the compile failure because it's clearer than having
> something silently disappear or change.
>
> I opened 3 jiras to move some fields to HConstants, but got some reasonable
> pushback from Duo:
>
> https://issues.apache.org/jira/browse/HBASE-26845
> https://issues.apache.org/jira/browse/HBASE-26846
> https://issues.apache.org/jira/browse/HBASE-26847
>
> Duo's pushback is that HConstants is an anti-pattern and these configs are
> not part of our public API. I can agree that a catch-all constants class
> might be an anti-pattern, but would argue that consolidating configs there
> is very useful for end-users.  I can also potentially agree that exposing
> these as part of our public API might limit the flexibility of development
> due to compatibility constraints about IA.Public.
>
> To me it seems odd to add a configuration, whose whole point is to make
> something tuneable, but then bury it in a private class despite having real
> implications for how the application runs. If a configuration is not meant
> to be tuned, it shouldn't be a configuration at all. Otherwise it should be
> exposed for reference.
>
> I'm wondering if there is some compromise we can achieve which makes it
> easier for end-users to integrate with tunable configs.
>
> One can imagine a large project to clean up all of our configs under some
> new class with maybe IA.LimitedPrivate(CONFIG), but I fear making perfect
> (needing to migrate all configs) the enemy of good.
>
> A better option might be to make those classes which expose configs
> LimitedPrivate(CONFIG) -- for example AsyncProcess and
> ConnectionImplementation. That might be the most incremental change we
> could make. We could handle this on a case-by-case basis.
>
> Does anyone have any thoughts?
>