You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/07/22 22:00:47 UTC

[GitHub] [incubator-druid] gianm commented on issue #8125: Leave a single CachePopulator and choose sync/async mode automatically

gianm commented on issue #8125: Leave a single CachePopulator and choose sync/async mode automatically
URL: https://github.com/apache/incubator-druid/issues/8125#issuecomment-513971684

Maybe it's just me, but I find foreground/background and sync/async to be equally understandable.

> There could be a single implementation. The executor may be DirectExecutorService in the "sync" case.

A note: at least currently, the control flow is different enough that the foreground populator cannot be implement as a background populator with a direct executor service. I think the main reason is that the background populator isn't only trying to do the _population_ in the background, it's also trying to do _computation of cache entries_ in the background _and in parallel_.

The latter thing means that it needs to take a different approach to the foreground one. In particular, the foreground populator is simple: it writes something to a ByteArrayOutputStream for each item as the Sequence is walked. The background populator instead writes these items into a List and then starts up separate thread pool tasks to serialize each one, and then also populates the cache using yet another task.

In other words, if the foreground populator was replaced with a background populator + direct executor, it would use a lot more memory than it does today, since it would need to buffer up a List of items that it currently just writes immediately to a ByteArrayOutputStream.

> It seems to me that the crucial difference between sync and async is whether blocking cache (Redis/Memcached) or local, (mostly) non-blocking cache (Caffeine) is used. At least, this should be highlighted in docs; at most, the mode of cachePopulator should be chosen automatically depending on the configured type of the cache.

It's not necessarily that simple, for two reasons:

1. Some remote cache implementations (like memcached) queue things internally and return from `cache.put` _before_ the remote server acknowledges the write. So background population doesn't add any extra resistance to network based blocking.
2. Even for local cache implementations, background population could in theory be useful, since it does cache entry generation in the background too. This could in theory be expensive enough (CPU-wise) that it is useful to background it.

> Does it make any sense to make async cache populator which pushes results to remote cache to have more than 1 thread? After all, we will be hitting the same bottleneck - the local NIC. So async populator may just have a queue and a single thread takes cache entries from the queue and tries to push them to remote cache.

It depends how exactly the Cache impl works. If its `put` method is blocking, then having more than one thread is useful, since in that case you can get more bandwidth by doing multiple connections requests in parallel (otherwise, you have too much 'dead air' while waiting for responses). If its `put` method is CPU intensive (maybe it must wrap the provided `byte[]` in an envelope of sorts, involving a lot of memory copies) then having more than one thread could also improve throughput (but maybe a bad idea, since it's scary if you're spending so much CPU on putting stuff into the query cache). Otherwise, it's probably not useful to have more than one thread.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org