Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/01/10 10:02:39 UTC
[GitHub] [druid] haragr opened a new issue #12135: High querytime for topN queries with injective=false namespace after upgrade to druid 0.22
haragr opened a new issue #12135:
URL: https://github.com/apache/druid/issues/12135
After upgrading to 0.22.0, we started observing significantly higher query and CPU times for topN queries on lookup namespaces with injective=false, compared to 0.18.2. We made no config changes and ensured that the data remained the same.
On v18, the query time was around 450 ms.
On v22, the query time was around 80,000 ms.
The time difference was on the Historical service. Downgrading only the Historicals fixed the issue.
Datasource1 details (for 1 day of data) ->
Total size is around 270G
Segment count is around 550
Rows per segment = 802,500
Dimension1 cardinality is around 1000
Namespace1 details ->
Count = 294,101
Injective = false
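For readers unfamiliar with the flag, a lookup is non-injective when several keys can map to the same value, so results have to be merged on the looked-up value rather than on the original dimension value. The sketch below only illustrates that property in plain Python. The lookup contents and function names are hypothetical, and it is not Druid's implementation, nor does it say which code path either version takes.

```python
from collections import defaultdict

# Hypothetical lookup: "k1" and "k2" share a value, so it is NOT injective.
lookup = {"k1": "A", "k2": "A", "k3": "B"}


def is_injective(mapping):
    """True only if no two keys map to the same value."""
    values = list(mapping.values())
    return len(values) == len(set(values))


def aggregate_by_lookup(rows, mapping):
    """Sum a metric per looked-up value.

    Unmapped keys are kept as-is, mirroring retainMissingValue=true.
    """
    totals = defaultdict(int)
    for key, metric in rows:
        totals[mapping.get(key, key)] += metric
    return dict(totals)
```

With this lookup, rows for "k1" and "k2" collapse into a single output group "A", which is why the number of output groups cannot be read off the number of distinct input keys.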
Query ->
`{"dimension": {"type": "lookup","dimension": "dimension1","outputName": "Result","name": "namespace1","injective": false,"retainMissingValue": true},"threshold": 50,"metric": {"type": "numeric","metric": "metric1"},"context": {"queryId": "queryID123","useCache":false,"vectorize":false,"debug":true,"timeout":200000},"queryType": "topN","dataSource": "datSource1","granularity": "All","filter": {"type": "and","fields": [{"type": "or","fields": [{"type": "in","dimension": "dimension1","extractionFn": {"type": "registeredLookup","lookup": "namespace1","retainMissingValue": true},"values": ["123 - test string ","234 - test string ",null]}]}]},"intervals": ["2021-12-05T00:00:00Z/2021-12-06T00:00:00Z"],"aggregations": [{"type": "filtered","aggregator": {"type": "longSum","name": "metric1","fieldName": "Views"},"filter": {"type": "and","fields": [{"type": "and","fields": [{"type": "not","field": {"type": "selector","dimension": "dimension2","value": "0"}}]}]}}],"postAggregations": []}`
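To compare versions, one option is to POST the query above to Druid's native query endpoint (`/druid/v2`) and time the round trip. The following is a minimal sketch under assumptions: the host and port are placeholders, and `build_request` and `timed` are hypothetical helper names, not part of any Druid client.

```python
import json
import time
import urllib.request

# Placeholder address (assumption): point this at your own Broker or Historical.
DRUID_URL = "http://localhost:8082/druid/v2"


def build_request(query, url=DRUID_URL):
    """Wrap a native query (a dict) in a JSON POST request."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed(send, request):
    """Call send(request) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = send(request)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms
```

A run against a live cluster could look like `timed(lambda r: urllib.request.urlopen(r).read(), build_request(query))`, where `query` is the JSON above loaded as a dict. Client-side wall time includes network overhead, so it complements rather than replaces the service's own query metrics.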
Below are the flame graphs for the Druid Historical service, taken while only the query in question was running.
[flame_graph.zip](https://github.com/apache/druid/files/7838014/flame_graph.zip)
In the flame graphs, we noticed that a major portion of the time was spent in query processing in v0.22.0.
If we follow the topN_datasource_1 stack to the top, we notice that the methods used for string aggregation differ across versions.
In v0.22.0 -
> Lorg/apache/druid/query/topn/types/StringTopNColumnAggregatesProcessor:::scanAndAggregateWithCardinalityUnknown
In v0.18.1 -
> Lorg/apache/druid/query/topn/types/StringTopNColumnAggregatesProcessor:::scanAndAggregateWithCardinalityKnown
Also, in v18 there is usage of the class
> Lorg/apache/druid/query/topn/HeapBasedTopNAlgorithm:::scanAndAggregate
which is missing in v22.
There was no change in underlying data between the tests.
Can anyone please have a look and see if this is a genuine issue with Druid?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org