Posted to users@datasketches.apache.org by Ron Crocker <rc...@newrelic.com> on 2020/04/23 23:27:03 UTC

Why are so many of the classes in org.apache.datasketches.cpc final?

It seems anti-social to make these classes, particularly CpcSketch, final. Is there a good reason for that?

Ron

Re: Why are so many of the classes in org.apache.datasketches.cpc final?

Posted by leerho <le...@gmail.com>.
Hi Ron,
I'm not too familiar with Flink, but it sounds like an ideal environment
for sketching.  It might also make sense to integrate our DataSketches library
into the core of Flink so that all users could take advantage of
it.  Do you know, or are you in touch with, any of the core committers on the
Flink team?  Perhaps you could recommend whom we could reach out to.

I am curious what the "friction" is in your use of CpcSketch.
If there is any way we can help, please let us know.

Lee.


On Fri, Apr 24, 2020 at 3:26 PM Ron Crocker <rc...@newrelic.com> wrote:

> Hi Lee,
>
> Thanks for such a cogent response. All of that makes a lot of sense to me.
>
> I’m not at all considering making improvements; rather, I merely want
> to use CpcSketch within a Flink job as state. Making the class final adds a
> little friction to doing so, but not an overwhelming amount.
>
> Ron

Re: Why are so many of the classes in org.apache.datasketches.cpc final?

Posted by Ron Crocker <rc...@newrelic.com>.
Hi Lee,

Thanks for such a cogent response. All of that makes a lot of sense to me.

I’m not at all considering making improvements; rather, I merely want to use CpcSketch within a Flink job as state. Making the class final adds a little friction to doing so, but not an overwhelming amount.

Ron

> On Apr 24, 2020, at 3:12 PM, leerho <le...@gmail.com> wrote:
> 
> Hi Ron,
> 
> Our mission is to develop a robust sketch library product that can be used in production systems in many different environments and be high performing and binary compatible across languages and systems.  


Re: Why are so many of the classes in org.apache.datasketches.cpc final?

Posted by leerho <le...@gmail.com>.
Hi Ron,

Our mission is to develop a robust sketch library *product* that can be
used in production systems in many different environments and be high
performing and binary compatible across languages and systems.

   - To be able to achieve this mission with our very limited resources, we
   have to be careful not to make the surface area of what we have to
   support too large.  Making classes final and/or private is one way to
   reduce the size of the support surface area.


   - We have found that building robust sketch software that is usable in
   production environments can be very tricky.  Even though we have been doing
   this for a number of years, we continue to discover new ways in which these
   probabilistic algorithms can behave unexpectedly.  Testing and validating
   these probabilistic algorithms can also be very tricky: what appears to be
   a simple change to a class can have a major impact on its testability or
   stability.  Making a class final is one way to communicate that the code and
   its testing counterparts are more complex than you might think.


   - We have had experience with a number of folks who have tried to
   "improve" these sketches on their own, with disastrous results.  Making
   a class final is one way to communicate that we do not recommend that users
   attempt to extend these classes on their own.


   - Making a class final (or private) also gives us, as developers,
   additional degrees of freedom to make necessary changes and
   improvements to the internals of the class, because we know that no
   subclasses depend on those internals.


   - From an owner-developer's point of view, making a class final is
   conservative.  We can always remove the restriction in the future if the
   need arises.  But once a class is no longer final, users may subclass it,
   and it can never be put back in the box, so to speak.

We are not in a position to support any external changes to our code that
we release, nor can we support extensions to our code that are not part of
our library.  Making classes final is one way of communicating that we do
not encourage modifications or arbitrary extensions to our code base.
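
One way to live with a final class, as in Ron's case of keeping CpcSketch
as Flink state, is composition rather than inheritance: hold an instance
and delegate to it.  The Java sketch below illustrates that pattern under
stated assumptions.  FinalDistinctCounter is a hypothetical stand-in for
CpcSketch (it counts exactly, which a real sketch does not), and SketchState
is a hypothetical wrapper; neither is part of DataSketches.  The stand-in's
update, getEstimate, toByteArray and heapify names mirror methods on the real
CpcSketch, but confirm them against the Javadoc for your library version.

```java
import java.nio.ByteBuffer;
import java.util.TreeSet;

// HYPOTHETICAL stand-in for a final library class such as CpcSketch.
// It counts distinct longs exactly; a real sketch would only estimate.
final class FinalDistinctCounter {
    private final TreeSet<Long> seen = new TreeSet<>();

    void update(long datum) { seen.add(datum); }

    double getEstimate() { return seen.size(); }

    // Serialize as: int count, followed by each distinct value.
    byte[] toByteArray() {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + Long.BYTES * seen.size());
        buf.putInt(seen.size());
        for (long v : seen) buf.putLong(v);
        return buf.array();
    }

    static FinalDistinctCounter heapify(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        FinalDistinctCounter c = new FinalDistinctCounter();
        int n = buf.getInt();
        for (int i = 0; i < n; i++) c.update(buf.getLong());
        return c;
    }
}

// HYPOTHETICAL wrapper. "class SketchState extends FinalDistinctCounter"
// would not compile, so the wrapper holds an instance and delegates to it.
class SketchState {
    private FinalDistinctCounter sketch = new FinalDistinctCounter();
    private long updates; // extra bookkeeping the wrapper adds on top

    void update(long datum) {
        sketch.update(datum);
        updates++;
    }

    double estimate() { return sketch.getEstimate(); }

    long updates() { return updates; }

    // State snapshot: the wrapper's own field, then the wrapped object's bytes.
    byte[] snapshot() {
        byte[] inner = sketch.toByteArray();
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + inner.length);
        buf.putLong(updates);
        buf.put(inner);
        return buf.array();
    }

    static SketchState restore(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        SketchState s = new SketchState();
        s.updates = buf.getLong();
        byte[] inner = new byte[buf.remaining()];
        buf.get(inner);
        s.sketch = FinalDistinctCounter.heapify(inner);
        return s;
    }
}

class CompositionDemo {
    public static void main(String[] args) {
        SketchState state = new SketchState();
        for (long v : new long[] {1, 2, 2, 3}) state.update(v);
        SketchState restored = SketchState.restore(state.snapshot());
        System.out.println(restored.estimate() + " " + restored.updates());
    }
}
```

Running CompositionDemo should print "3.0 4": three distinct values and
four updates, both surviving the byte-array round trip.  Because the wrapper
touches only the class's public methods, the library stays free to change
the class's internals, which is the degree of freedom described in the list
above.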

The DataSketches library is an active open source Apache project.  We
encourage users to make suggestions and submit pull requests and contribute
to the library to make it a better product for everyone.  We believe we
are very social and open to new ideas.  We have a growing community of
interested users, developers and scientists who want to use our library
and make it even better.  We actively monitor our communication channels
and respond to sincere requests for help as promptly as we can.

If you have ideas about added capabilities that you feel would be valuable
extensions to the library, please engage with us through our community
<https://datasketches.apache.org/docs/Community/index.html> mailing lists
or Slack; we would like to hear from you.

As open source, you are always free to fork the library and do whatever you
want with the code subject to the Apache license.  But then, you are on
your own :)

Cheers,

Lee.

On Thu, Apr 23, 2020 at 4:27 PM Ron Crocker <rc...@newrelic.com> wrote:

> It seems anti-social to make these classes, particularly CpcSketch, final.
> Is there a good reason for that?
>
> Ron