You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/11/17 17:22:20 UTC

[GitHub] [incubator-druid] gianm commented on issue #8884: Get rid of instance-level ThreadLocal in ConcurrentGrouper

gianm commented on issue #8884: Get rid of instance-level ThreadLocal in ConcurrentGrouper
URL: https://github.com/apache/incubator-druid/issues/8884#issuecomment-554767182
 
 
   An interesting analysis.
   
   > This means that creating concurrent and non-concurrent groupers should be un-generalized (currently generalized in `RowBasedGrouperHelper.createGrouperAccumulatorPair()` method), but this seems to be beneficial because this looks like a false abstraction which only adds coupling and confusion of execution paths (harder to see which grouper type is created and used when) rather than lifts any logic. For that matter, `ConcurrentGrouper` doesn't need to implement `Grouper` interface, too, providing _only_ `aggregate(int threadNumber, key)` method, but not standard `Grouper`'s methods `aggregate(key)` and `aggregate(key, keyHash)`. I think it would also make things clearer because it would become more evident for people who try to unwind the logic of the subsystem in their heads where `ConcurrentGrouper` could and could not appear.
   
   I also think splitting ConcurrentGrouper out of the Grouper interface would make things clearer. Or maybe redesigning the Grouper interface somehow. The various implementations have a lot of similarities, but I think the attempt to unify everything into a single interface was not done well.
   
   I'm not sure how I feel about the specific idea of introducing a thread number in the concurrent version. It seems error-prone, but may perform better enough than the instance-level threadlocals that it could be worth it.
   
   By the way, the vectorized version is already split out (see VectorGrouper).
   
   > Also, it means that `concurrencyHint` is not actually a "hint", that is, something completely heuristical/performance-related which could be any number, but actually a parameter which should have a very specific meaning of the actual number of concurrent threads calling `aggregate()` on the instance of `ConcurrentGrouper`.
   
   Yeah, it's more than a hint. It is the exact parallelism level.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org