Posted to notifications@accumulo.apache.org by "ctubbsii (via GitHub)" <gi...@apache.org> on 2023/07/24 18:18:44 UTC

[GitHub] [accumulo] ctubbsii opened a new issue, #3651: Consider using DataSketches to precompute quantiles or other values to aid with more rapid split point computation

ctubbsii opened a new issue, #3651:
URL: https://github.com/apache/accumulo/issues/3651

   DataSketches can compute approximate distribution statistics, such as quantiles, in a single pass over the data. If we used it when writing a file, we could precompute these statistics and store them in the file metadata to speed up split point computation. For this to be useful, the precomputed statistics would need to be mergeable across locality groups within a file and across files. An approximate midpoint could then be found very efficiently when automatically splitting tablets, by reading only the precomputed data.
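
   As a rough illustration of the merge-then-query idea, here is a minimal sketch that uses only the Java standard library. It is not the DataSketches API and not existing Accumulo code: the class `SplitPointSketch`, the methods `summarize` and `approximateQuantile`, and the fixed-stride sampling are all hypothetical stand-ins for a real quantile sketch, and the error guarantees of a real sketch would be much better. Each file (or locality group) keeps every `stride`-th key together with the number of entries it represents; the summaries are then merged and walked to find an approximate midpoint.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a mergeable quantile sketch; not the DataSketches API.
final class SplitPointSketch {

    /** A sampled row key and the number of entries it stands for. */
    static final class WeightedKey {
        final String key;
        final long weight;

        WeightedKey(String key, long weight) {
            this.key = key;
            this.weight = weight;
        }
    }

    /**
     * Builds a summary in one pass over keys that are already sorted,
     * as they would be while a file is being written. Assumes stride >= 1.
     */
    static List<WeightedKey> summarize(List<String> sortedKeys, int stride) {
        List<WeightedKey> summary = new ArrayList<>();
        for (int i = 0; i < sortedKeys.size(); i += stride) {
            // The last sample may cover fewer than `stride` entries.
            long weight = Math.min(stride, sortedKeys.size() - i);
            summary.add(new WeightedKey(sortedKeys.get(i), weight));
        }
        return summary;
    }

    /**
     * Merges summaries from several files or locality groups and returns the
     * sampled key at approximately the requested rank fraction (0.5 = midpoint).
     * Returns null when there is no data.
     */
    static String approximateQuantile(List<List<WeightedKey>> summaries, double fraction) {
        List<WeightedKey> merged = new ArrayList<>();
        for (List<WeightedKey> summary : summaries) {
            merged.addAll(summary);
        }
        merged.sort(Comparator.comparing((WeightedKey wk) -> wk.key));

        long total = 0;
        for (WeightedKey wk : merged) {
            total += wk.weight;
        }
        long target = (long) Math.ceil(total * fraction);

        // Walk the merged samples until the cumulative weight reaches the target rank.
        long seen = 0;
        for (WeightedKey wk : merged) {
            seen += wk.weight;
            if (seen >= target) {
                return wk.key;
            }
        }
        return null;
    }
}
```

   Only the small summaries are read at split time, never the underlying data. A real implementation would presumably replace the fixed-stride sampling with a proper quantile sketch that supports merging and would need to serialize the summary into the file metadata.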


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org