Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2018/12/20 01:05:57 UTC

[GitHub] leerho commented on issue #6743: IncrementalIndex generally overestimates theta sketch size

URL: https://github.com/apache/incubator-druid/issues/6743#issuecomment-448816707
 
 
   @gianm 
   
   I don't have a deep understanding of the inner workings of Druid's memory allocation strategy, but I should point out that the current model of allocating equal-sized slots in a buffer, where each slot is the maximum possible size of a sketch, is very likely a horrible waste of memory.  
   
   Sketches are designed to start very small in their memory requirements and then grow gradually as the input stream grows. Unique-counting sketches (Theta, HLL, CPC) reach a maximum size, beyond which they do not grow at all.  Quantile sketches can continue to grow, but very slowly (sub-logarithmically in the size of the input stream).
   
   For example, a Theta Sketch configured with LgK = 14 (K ≈ 16K nominal entries) has a maximum size of about 256KB. However, with only one entry it consumes only 16 bytes.
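   A quick back-of-envelope check of that maximum (my own arithmetic, not DataSketches code; it assumes 8 bytes per retained hash and an internal hash table of at most 2x the nominal entries, ignoring the small fixed preamble):

   ```python
   def theta_max_bytes(lg_k: int) -> int:
       """Rough upper bound on Theta sketch storage: the internal hash
       table can grow to 2x the nominal entries K, at 8 bytes per
       retained hash. (Ignores the small fixed preamble.)"""
       k = 1 << lg_k
       return 2 * k * 8

   # LgK = 14 -> K = 16384 nominal entries
   print(theta_max_bytes(14))           # 262144 bytes
   print(theta_max_bytes(14) // 1024)   # 256 (KB)
   ```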
   
   Our experience using sketches in large systems is that the sizes of the input streams feeding the sketches tend to follow a power-law distribution.  This means that in a system with, say, one million sketches in total, the vast majority of sketches will have very few entries, and just a few will be at full size.   
   
   > Note: If the sketches ingested into Druid are pre-aggregated (by sector, for example), this power-law distribution of required space is not as strong, since most of the sketches will already be at max size.  But in that case you are dealing with far fewer sketches, so the memory-management concern may not be as severe.
   
   To illustrate this power-law effect, I have created a toy model of space utilization in the attached Excel spreadsheet:
   
   [DruidModel_pwrLaw.xlsx](https://github.com/apache/incubator-druid/files/2696971/DruidModel_pwrLaw.xlsx)
   
   For this model I assumed a power-law distribution of stream sizes from one unique to 134M uniques and a configured sketch size of LgK = 14.  I adjusted the parameters so that the total number of sketches in the system is about one million.  
   
   With these assumptions, the actual storage required by all of the sketches is about 440MB.  However, allocating the full 256KB for every sketch would require nearly 35GB, a space utilization of about 1.27%.  This is a huge waste of memory! 
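   The spreadsheet itself is not reproduced here, but the shape of the argument can be sketched in code. All parameters below are my own illustrative choices, not the ones in the attached file, so the exact figures differ; the point is only that under any power-law population, actual storage is a tiny fraction of a max-size-per-slot allocation:

   ```python
   MAX_BYTES = 262144      # full Theta sketch at LgK = 14 (~256 KB)
   BYTES_PER_ENTRY = 16    # assumed early-growth cost per unique

   def sketch_bytes(uniques: int) -> int:
       """Approximate sketch storage: grows with the stream, capped at max."""
       return min(BYTES_PER_ENTRY * uniques, MAX_BYTES)

   # Power-law population: stream sizes 2^0 .. 2^27 (1 to ~134M uniques),
   # with the number of sketches at each size halving as the size doubles.
   base = 500_000
   population = [(2 ** i, base >> i) for i in range(28)]

   total_sketches = sum(count for _, count in population)
   actual = sum(count * sketch_bytes(size) for size, count in population)
   full = total_sketches * MAX_BYTES

   print(f"sketches:    {total_sketches:,}")
   print(f"actual:      {actual / 2**20:,.0f} MB")
   print(f"full alloc:  {full / 2**30:,.0f} GB")
   print(f"utilization: {actual / full:.2%}")
   ```

   Even with these made-up parameters, utilization lands well under 1%: almost all sketches sit far below the 256KB ceiling, yet every slot pays for the ceiling.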
   
   ***
   What I would recommend is that we work together to come up with a much more efficient memory-management model for sketches in Druid, one that would let you recapture most, if not all, of that wasted space.  This will likely require some changes in Druid as well as changes in how sketches use and allocate memory.
   
   One idea I have is to allow each sketch to self-allocate the space it needs as it grows. The sketch would be responsible for allocating new space and deallocating old space (especially off-heap space). 
   What would live in the equal-sized slots of the Druid buffer is a small, fixed-size "wrapper" holding a pointer and a few other values, while the data structure of the sketch itself lives in an off-heap space managed by the sketch.  This means the Druid memory manager would have to allow for some amount of off-heap space that it does not control.  In the toy model above, that would be about 440MB of off-heap space; the benefit is that Druid would get back nearly 35GB of off-heap space for other uses.  It seems like a big win to me, but I don't understand what changes would have to happen in Druid to make this work. 
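   To make the wrapper idea concrete, here is a toy model (pure illustration with invented names; in reality the slots would be off-heap `ByteBuffer` regions and the sketch would manage native memory, which plain Python cannot show directly). The fixed-size slot holds only a small header, while the sketch's data lives in a separately allocated, growable store:

   ```python
   import struct

   class GrowableSketchStore:
       """Stand-in for sketch-managed off-heap memory: each sketch gets a
       separately allocated buffer that it grows (and frees) on its own."""
       def __init__(self):
           self.buffers = {}
           self.next_id = 0

       def allocate(self, nbytes: int) -> int:
           sid = self.next_id
           self.next_id += 1
           self.buffers[sid] = bytearray(nbytes)
           return sid

       def grow(self, sid: int, nbytes: int) -> None:
           old = self.buffers[sid]
           new = bytearray(nbytes)      # allocate new space...
           new[:len(old)] = old         # ...copy, then drop the old buffer
           self.buffers[sid] = new

   def make_wrapper(sid: int, capacity: int) -> bytes:
       """What actually sits in the equal-sized Druid buffer slot: a small,
       fixed-size header (store id + capacity, two 8-byte longs) pointing
       at the sketch-managed store."""
       return struct.pack("<qq", sid, capacity)

   store = GrowableSketchStore()
   sid = store.allocate(16)             # one-entry sketch: ~16 bytes
   wrapper = make_wrapper(sid, 16)
   store.grow(sid, 1024)                # stream grows; sketch reallocates
   wrapper = make_wrapper(sid, 1024)
   print(len(wrapper))                  # 16 -- slot cost never changes
   ```

   The design choice this illustrates: the buffer slots stay uniform and tiny, so the power-law population no longer forces every slot to pay for the worst case.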
   
   We can certainly do the sketch work, but you (or someone deep into Druid) would have to make the appropriate changes in Druid.  One important aspect of the Druid work would be to provide some simple tools to analyze the actual distributions of stream sizes, sketch sizes, and space utilization before and after the change. This metadata would be invaluable feedback for us in understanding specific Druid customers' situations, and would help us tune the system for even better performance going forward.
   
   What do you think?
   
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org