You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/17 13:38:54 UTC

[GitHub] [druid] cheddar commented on issue #11933: Discussion: operator structure for Druid queries?

cheddar commented on issue #11933:
URL: https://github.com/apache/druid/issues/11933#issuecomment-971591578


   That smaller stack trace definitely does seem nicer, I looked through the prototype branch and, honestly, wasn't able to fully wrap my head around the difference just from trying to dive into the code that was done.  From your description (and the stack trace) it looks like it's a totally different implementation that might be ignoring the Toolchests and Factories, but it seems like it's still using those?  
   
   Once I found the `Operator` interface itself, if I'm understanding it correctly, I believe it is semantically equivalent to the `Sequence` interface.  `Iterator open()` is semantically equivalent to `Yielder<> toYielder()` and `close()` is semantically equivalent to closing that yielder.  `Sequence` also comes with a method that allows for accumulation with a callback, which was useful at one point but who knows how helpful it is anymore (it was pervasive in the way back ago times when Sequences only had the accumulate method and there was no such thing as a Yielder).  The only seeming semantic difference between them would be the FragmentContext which is being passed directly into the Operator such that the Operator can close over it in the iterator implementation, where the equivalent thing doesn't happen directly on the Sequence, but rather happens on the QueryRunner call.
   
   Honestly, the more I think about it, the more I think that the reason your stack trace is so much cleaner is because your Operators aren't using generic map/concat type operations but instead all of the logic is bundled together into each class which are implementing the Iterator interface.  If someone were to implement the proposed Operators by doing, e.g. a `Iterators.map(input.open(context), a -> a + 2)` I think you would get the same ugly stack traces with Operators as well..  As such, I think that what this proposal actually boils down to is a suggestion that we stop using map/concat/filter style operators and instead have named classes that do specific things so that the stacktrace is more pretty.  Fwiw, making the stack trace more pretty is super useful, so doing that is definitely nice.  There's a chance that the whole callback/accumulator structure of the `Sequence` object still makes the stack traces more ugly, but I think that's not actually the case.  The real thing th
 at makes the stack traces so ugly is that we've got all these `WrappingSequence` and `ConcatSequence` and other things like that which are masking the actual work being done because the lambda that they do the work with doesn't actually make it into the stack trace.
   
   So, going back to your description, in general, I think that you can actually think of the current `QueryRunner` as the `QueryPlanner` (at least, at the segment-level anyway), and the Sequence as the `Operator`.
   
   
   All that said, I could be missing some major and I don't think there's any love for any specific way of writing the code.  If things can be better, let's make them better.  Some of the high level concerns:
   
   1) Technically speaking, the QueryToolChest and QueryRunnerFactories are extension points.  It would be good to get clarity on whether this proposal eliminates them as extension points, replaces them with a new extension point, or just magically keeps using them.  Depending on which one of those it is, strategies might vary.
   2) Query performance and memory management are important.  Not to say that the proposal makes those worse, but swapping out the underlying implementation in-place creates a relatively high burden for making sure that there is no adverse impact to performance and memory management of the process in the face of many concurrent queries.  Given that I think the proposal is just a stylistic adjustment to the current code without any new or different semantic constructs being added, making that adjustment to stop using concat/map/WrappingSequence, etc. shouldn't actually need to come with any sort of impact on performance, so if that understanding is correct, this is maybe not a big deal.
   3) Some parts of the physical plan cannot be determined until the query reaches the segment.  For example, if a query is expecting a numerical column, it might still get a String column in one segment and a numerical column in another segment, it needs to be intelligent enough to not balk at this and instead align the String column to be numerical at query time.  From your explanation, some of me thought the proposal is to move planning to a top-level thing before segments are visited, but the Operators seem to be running and planning at the segment-level, in which case, this is doing per-segment planning.
   4) There might be other requirements that I'm not thinking of immediately, but these are the ones that come to mind first.
   
   Depending on which understanding of the proposal is actually correct, the path to introducing the changes in the "safest" possible manner will be different.  Rather than speculating on all of the paths, it probably makes the most sense to come to a conclusion on what is actually different about the proposal as that is the primary input into which path to implementation makes the most sense.  Just to summarize, I think getting to that understanding of what is different can probably come from answering two questions:
   
   1) Why is the Operator interface not just different methods for the same semantics as what we have today?
   2) What happens to the QueryToolChest and QueryRunnerFactory interfaces in the Operator world?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org