Posted to commits@druid.apache.org by "paul-rogers (via GitHub)" <gi...@apache.org> on 2023/03/10 20:14:05 UTC

[GitHub] [druid] paul-rogers commented on issue #13816: Improve MSQ Rollup Experience with Catalog Integration

paul-rogers commented on issue #13816:
URL: https://github.com/apache/druid/issues/13816#issuecomment-1464364704

Thanks, @vogievetsky, for the comment. The auto-detection of rollup was in response to someone who _didn't_ like the idea of a flag.

As it turns out, the approach outlined here is not actually achievable. It will work better for the catalog to stay out of the "rollup-or-not" and "measure-or-dimension" business and simply state storage types. Rollup then becomes a property of the ingestion query, not the datasource. This allows a use case in which early data is detail and later data is rolled up.

Also, it turns out that our aggregations are not quite ready for the level of metadata envisioned here. All we really can know is the storage type. Thus, a simple `long`, a `SUM(long)`, a `MIN(long)` and a `MAX(long)` are all the same at the physical level, so the catalog cannot tell them apart. Again, it is up to each query to choose an aggregate that works for that ingestion.
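To make the point above concrete, here is a minimal sketch, not Druid code, of why a catalog that records only storage types cannot distinguish these declarations. The `ColumnDecl` class, the `storage_type` helper and the column names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDecl:
    """Hypothetical catalog entry: a column name plus what the user declared."""
    name: str
    declared: str  # e.g. "long" or "SUM(long)"


def storage_type(declared: str) -> str:
    """Return the physical storage type, dropping any AGG(...) wrapper.

    Assumption for this sketch: an aggregate is written as NAME(type).
    """
    declared = declared.strip()
    if declared.endswith(")") and "(" in declared:
        inner = declared[declared.index("(") + 1:-1]
        return inner.strip().lower()
    return declared.lower()


decls = [
    ColumnDecl("bytes_raw", "long"),
    ColumnDecl("bytes_sum", "SUM(long)"),
    ColumnDecl("bytes_min", "MIN(long)"),
    ColumnDecl("bytes_max", "MAX(long)"),
]

# What the catalog could actually see: only the storage type per column.
physical = {d.name: storage_type(d.declared) for d in decls}

# All four declarations collapse to one physical type, so the aggregate
# (if any) cannot be recovered from the stored type alone.
distinct_types = set(physical.values())
print(physical)
print(distinct_types)
```

Since every column ends up as `long`, the choice of aggregate has to live in the ingestion query rather than in the catalog.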

So, the revised proposal is that the user specifies the storage type as a native Druid type. Even there, it turns out that the Calcite planner only knows about finalized types, not intermediate types. The thinking is that Druid will eventually offer distinct functions for intermediate and final aggregators, but that is some time off.

Or, the catalog could list the finalized type and validate the finalized aggregators against that type, even though MSQ will actually use some other type for intermediate aggregates.
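A rough sketch of that alternative, under stated assumptions: the aggregator names, the type strings and the `validate_finalized` helper below are invented for illustration and do not reflect Druid's or Calcite's actual APIs or type names. The idea is only that validation compares the finalized type with the catalog entry, while the intermediate type stays invisible to the check.

```python
# Illustrative table: aggregator -> (intermediate type, finalized type).
# These entries are assumptions for the sketch, not Druid's real metadata.
AGG_TYPES = {
    "EXAMPLE_SUM": ("long", "long"),
    "EXAMPLE_SKETCH_COUNT": ("sketch", "long"),
}


def validate_finalized(catalog_type: str, agg_name: str) -> bool:
    """Accept the aggregator if its finalized type matches the catalog type.

    The intermediate type is deliberately ignored, mirroring a planner
    that only knows about finalized types.
    """
    _intermediate, finalized = AGG_TYPES[agg_name]
    return finalized == catalog_type


# Both aggregators pass against a catalog column declared as "long",
# even though one of them would store a different intermediate type.
sum_ok = validate_finalized("long", "EXAMPLE_SUM")
sketch_ok = validate_finalized("long", "EXAMPLE_SKETCH_COUNT")
mismatch = validate_finalized("double", "EXAMPLE_SUM")
print(sum_ok, sketch_ok, mismatch)
```

The second case shows the weakness of this approach: the check passes although the stored intermediate value is not a `long`, which is why such validation says little about what a rollup table physically holds.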

So, in the short term, perhaps the catalog will apply only to detail tables and not to rollup tables, because type information in the rollup case is not sufficient to allow any meaningful validation. Once the project leads sort out how MSQ aggregation will work, the catalog can implement whatever choices we make.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
