Posted to dev@datasketches.apache.org by leerho <le...@gmail.com> on 2020/03/24 00:40:04 UTC

Updates

Folks, I hope everyone is safe and healthy during these challenging times!

Some updates:

   - The website Downloads
   <https://datasketches.apache.org/docs/Community/Downloads.html> page has
   been completely redesigned and automated.  When any of our components is
   released to dist, a step in our Release Process
   <https://dist.apache.org/repos/dist/dev/incubator/datasketches/scripts/APACHE_JAVA_RELEASE_STEPS.md>
   automatically updates the Downloads page with the latest release
   versions, simply by running a script.
   - We have also added 3 new TODO lists for Java
   <https://github.com/apache/incubator-datasketches-java/projects/1>, C++
   <https://github.com/apache/incubator-datasketches-cpp/projects/1> and
   the Website
   <https://github.com/apache/incubator-datasketches-website/projects/1>.
   These are brand new and will be filling up with tasks soon.
   - There are a number of new additions to the website that should make it
   easier for users to find the right sketches for their applications:
      - Sketch Features Matrix
      <https://datasketches.apache.org/docs/Architecture/SketchFeaturesMatrix.html>.
      This provides, in one view, a comparison of the major features of the
      different sketches and sketch families in the library.
      - Features Matrix for Distinct Count Sketches
      <https://datasketches.apache.org/docs/DistinctCountFeaturesMatrix.html>.
      Our library has a wide variety of sketches for counting distinct values,
      each with different capabilities and trade-offs for different
      applications.  This matrix tries to remove some of the mystery by
      highlighting the major differences between the various distinct counting
      sketches.
      - HLL vs CPC Figures of Merit
      <https://datasketches.apache.org/docs/DistinctCountMeritComparisons.html>.
      There is always a lot of interest in the Flajolet, et al., HyperLogLog
      (HLL) sketch.  Not only do we have leading implementations of the HLL
      sketch, our team also developed a new *Compressed Probabilistic Counting*
      (CPC) sketch that outperforms the HLL sketch in terms of accuracy per
      stored space.  This new sketch is discussed briefly on our Research
      <https://datasketches.apache.org/docs/Community/Research.html> page,
      which also links to the theoretical paper
      <https://arxiv.org/abs/1708.06839> that discusses the new algorithm.
      There is also a new section in the Distinct Counting section of the
      website documentation that discusses the CPC sketch along with
      programming examples.
      - Sketches by Component Repository
      <https://datasketches.apache.org/docs/Architecture/SketchesByComponent.html>.
      This new page organizes the library by the major repository components
      and lists the sketches that are available in each of the components.
      - Sketch Criteria for Library Inclusion
      <https://datasketches.apache.org/docs/Architecture/SketchCriteria.html>.
      For new contributors to the library, this page outlines our current
      criteria for including new sketch algorithms in the library.

As always, we look forward to your comments and suggestions!

Cheers,

Lee.

Re: Updates

Posted by Evans Ye <ev...@apache.org>.
Got it. Thanks Lee.


Re: Updates

Posted by leerho <le...@gmail.com>.
Hi Evans,
The question you ask is a fair one, and we will be adding more cards to
this list in the future.  However, this specific task requires deep
knowledge of how the C++ and Java HLL sketches have been implemented and
why.  I would not recommend that a novice user attempt this task.  Needless
to say, it is up to us to add simpler tasks that folks new to the library
could attempt in order to become acquainted with it.

Cheers,

Lee.


Re: Updates

Posted by Evans Ye <ev...@apache.org>.
As a user I really love the matrix. The questions I was asking are mostly
answered by it!

May I know how the TODO[1] gets broken down into workable tasks?
How can contributors participate if someone is willing to help?

[1] https://github.com/apache/incubator-datasketches-java/projects/1

Evans

