You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/15 01:55:40 UTC

[GitHub] [druid] clintropolis commented on pull request #11917: restore and deprecate AggregatorFactory methods

clintropolis commented on pull request #11917:
URL: https://github.com/apache/druid/pull/11917#issuecomment-968442515


   Just to weigh in - despite raising this PR I'm actually in the camp that @gianm is in on this, else I probably would've done #11713 like this in the first place. I guess in my head I've been assuming whatever we eventually call Druid 1.0 is where I have been considering taking up more conservative stances on extension point changes (but maybe also 1.0 means nothing because people use Druid in production environments and have for years so isn't a real excuse).
   
   Assuming that those who implement custom aggs are at least recompiling their extension every release (which they really really should be doing), then the changes in #11713 seem like not a very big deal to me, because it would fail to compile and so be noticed immediately and has a very easy fix that shouldn't actually require extensive testing. This isn't changing the interface of the aggregator implementations themselves, which I think would be more involved for extension writers and require more care, just requiring factories to provide additional information.
   
   >On the argument of increasing support burden, that argument doesn't apply here as the "unknown complex" is needed anyway, you'll note that this change isn't introducing UNKNOWN_COMPLEX it is just using it in a default implementation.
   
   This isn't quite accurate, not implementing `getFinalizedColumnType` (as it has been renamed in this PR) is a loss of information when it has to fall back on `UNKNOWN_COMPLEX` compared to aggregators that properly implement the interface (`UNKNOWN_COMPLEX` was added in #11713 as a short term solution for serde compat stuff, and eventually should be removed). Prior to #11713 there was no possible way to get what actual complex type an aggregator would finalize to, since there was no `getFinalizedColumnTypeName` method (part of the work of #11713 was going through all existing aggregator implementations and filling in the correct type information). 
   
   The engine (and future changes I would like to do to it) would generally prefer to have the complete type information however, which is only available by actually implementing the new methods. I think it could be fairly argued that nothing is broken (yet) by falling back to unknown complex since we haven't heavily leaned into everything #11713 made possible (and since that PR introduced it it acknowledges that things are incomplete and stuff has to compensate for it). It does makes it harder to actually take advantage of this new information though, since it draws out the period that we still need to handle the unknown case. 
   
   
   On the subject of this PR only impacting a single release, I'm not entirely sure that is true either unless we just leave these functions named as they are renamed in this PR - `getColumnType` and `getFinalizedColumnType`. To get fully back into the state of #11713, e.g. with the methods named `getType` and `getFinalizedType`, I think it would take at least 3 releases to get done if we do it incrementally like this. 0.23 which would deprecate and add the new methods, 0.24 which could remove the old versions, and possibly make new versions of `getType` and `getFinalizedType` back to how they are in #11713 with default implementations that call `getColumnType` and `getFinalizedColumnType` which would then be deprecated, and then finally in 0.25 we can remove `getColumnType` and `getFinalizedColumnType`. This disrupts extension writers at least twice instead of once, which seems worse?
   
   I guess I obviously don't feel so strongly as to block this change either, lest I wouldn't have opened this PR, but I still don't personally feel like we gain much to make it strictly necessary either, and it definitely isn't without pain, so I'm at best a +0 on this (talked up from a -1). 
   
   +1 for having this discussion though... should we take it to the dev list to try to get more feedback on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org