Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/16 18:20:35 UTC

[GitHub] [pinot] priyen opened a new pull request, #8908: [Work in progress] Distinct count HLL pre-agg

priyen opened a new pull request, #8908:
URL: https://github.com/apache/pinot/pull/8908

   This adds support for distinct count HLL pre-aggregation. It introduces a new property on the fieldSpec, `fixedLength` (in bytes), so that a BYTES column can be treated as fixed length and stored in the FixedByteSVMutableForwardIndex.
   When used for HyperLogLog values, `fixedLength` should be the size in bytes of the serialized HyperLogLog object.
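
   As a rough illustration of why a known `fixedLength` helps, here is a minimal sketch of a fixed-width byte store in which row `docId` lives at offset `docId * fixedLength`. The class `FixedWidthBytesStore` and its constructor are hypothetical and are not Pinot's FixedByteSVMutableForwardIndex; only the method names `setBytes` and `getBytes` are borrowed from the description above.

```java
import java.util.Arrays;

/** Hypothetical fixed-width store for illustration; not Pinot's FixedByteSVMutableForwardIndex. */
final class FixedWidthBytesStore {
  private final int fixedLength;
  private byte[] buffer;

  FixedWidthBytesStore(int fixedLength, int initialCapacityInRows) {
    if (fixedLength <= 0 || initialCapacityInRows <= 0) {
      throw new IllegalArgumentException("fixedLength and capacity must be positive");
    }
    this.fixedLength = fixedLength;
    this.buffer = new byte[fixedLength * initialCapacityInRows];
  }

  /** Writes a value for the given row; every value must be exactly fixedLength bytes. */
  void setBytes(int docId, byte[] value) {
    if (docId < 0) {
      throw new IndexOutOfBoundsException("docId must be non-negative");
    }
    if (value.length != fixedLength) {
      throw new IllegalArgumentException(
          "expected " + fixedLength + " bytes but got " + value.length);
    }
    int offset = docId * fixedLength; // integer overflow is ignored in this sketch
    if (offset + fixedLength > buffer.length) {
      // Grow by doubling, or further if the row is beyond the doubled size.
      buffer = Arrays.copyOf(buffer, Math.max(buffer.length * 2, offset + fixedLength));
    }
    System.arraycopy(value, 0, buffer, offset, fixedLength);
  }

  /** Returns a copy of the fixedLength bytes stored for the given row. */
  byte[] getBytes(int docId) {
    int offset = docId * fixedLength;
    if (docId < 0 || offset + fixedLength > buffer.length) {
      throw new IndexOutOfBoundsException("no storage allocated for docId " + docId);
    }
    return Arrays.copyOfRange(buffer, offset, offset + fixedLength);
  }
}
```

   Because every value has the same width, no per-row offset table is needed, and a pre-aggregated value can be overwritten in place.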
   
   HyperLogLog with a log2m of `8` serializes to 180 bytes; with a log2m of `12` it serializes to 2740 bytes. I unit tested with a log2m of 12 because that is the size the AverageVolumePerUser Decibel job uses.
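
   The two sizes quoted above are consistent with the arithmetic sketched below. It assumes the serialized form is a 4-byte log2m, a 4-byte payload length, and then the registers packed six to a 32-bit word, with the word count rounded as shown. That layout is an assumption about the HyperLogLog library in use, not something this PR states, and the class `HllSerializedSize` is hypothetical.

```java
/** Hypothetical helper; the serialized layout used here is an assumption, not taken from the PR. */
final class HllSerializedSize {
  private static final int REGISTERS_PER_WORD = 6; // assumed: six 5-bit registers per 32-bit word
  private static final int HEADER_BYTES = 4 + 4;   // assumed: int log2m followed by int payload length

  /** Number of 32-bit words assumed to back 2^log2m registers. */
  static int wordsFor(int log2m) {
    int registerCount = 1 << log2m;
    int words = registerCount / REGISTERS_PER_WORD;
    if (words == 0) {
      return 1;
    }
    // Assumed rounding rule: add one word unless the quotient is a multiple of 32.
    return (words % Integer.SIZE == 0) ? words : words + 1;
  }

  /** Serialized size in bytes under the assumed layout. */
  static int bytesFor(int log2m) {
    return HEADER_BYTES + Integer.BYTES * wordsFor(log2m);
  }

  public static void main(String[] args) {
    System.out.println("log2m=8:  " + bytesFor(8) + " bytes");
    System.out.println("log2m=12: " + bytesFor(12) + " bytes");
  }
}
```

   Under these assumptions, `bytesFor(8)` gives 180 and `bytesFor(12)` gives 2740, matching the figures above. In practice it is safer to serialize a HyperLogLog object with the intended log2m and measure its length when choosing `fixedLength`.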
   
   The AverageVolumePerUser Decibel job needs this feature.
   
   - unit tests for the new getBytes() and setBytes() implementations in the fixed-byte mutable forward index
   - unit tests that aggregate rows and assert on the resulting HyperLogLog objects
   - manual tests in QA described here: https://paper.dropbox.com/doc/Distinct-Count-HLL-pre-agg-testing--BhLCthOAjQ3gS2Hdch9YrnJhAQ-tT9Vxx9sIp389tZEmskrZ
   
   Roll out: merge. If noon's open source PR gets merged, then open a PR against open source as well.
   
   (Squashed by Merge Queue - Original PR: https://git.corp.stripe.com/stripe-private-oss-forks/pinot/pull/130)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] priyen closed pull request #8908: [Work in progress] Distinct count HLL pre-agg

Posted by GitBox <gi...@apache.org>.
priyen closed pull request #8908: [Work in progress] Distinct count HLL pre-agg
URL: https://github.com/apache/pinot/pull/8908

