Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/09/01 20:30:54 UTC

[GitHub] [druid] gianm commented on issue #13006: It seams that cache does not help Improve query efficiency

gianm commented on issue #13006:
URL: https://github.com/apache/druid/issues/13006#issuecomment-1234748679

I wouldn't say there is _no_ benefit to caching scan queries. But the expected benefit (relative to the cost) is certainly higher for queries that do aggregations. I expect we'll add caching for scan queries at some point, in a way that mitigates the costs, for example by capping the payload size that we're willing to cache.

Btw, based on reading your query, the single most effective thing you can do to get good performance is secondary partitioning on `idea_instrument`. See https://imply.io/blog/multi-dimensional-range-partitioning/ for details on why this is great. If you are not already doing this, I strongly recommend it. In current releases you can do this with `partitionsSpec` on batch ingest, like:

```json
"partitionsSpec": {
  "type": "range",
  "partitionDimensions": ["idea_instrument"],
  "targetRowsPerSegment": 5000000
}
```
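
For context, here is a minimal sketch of where that fragment would sit in a native batch task, assuming the `index_parallel` task type. The `dataSchema` and `ioConfig` sections are omitted because they depend on your data, so this is not a complete, submittable spec. The `forceGuaranteedRollup` setting is not mentioned above; it is included because, to the best of my understanding, range partitioning in native batch ingestion requires it to be `true`. Please check the docs for your Druid version.

```json
{
  "type": "index_parallel",
  "spec": {
    "tuningConfig": {
      "type": "index_parallel",
      "forceGuaranteedRollup": true,
      "partitionsSpec": {
        "type": "range",
        "partitionDimensions": ["idea_instrument"],
        "targetRowsPerSegment": 5000000
      }
    }
  }
}
```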

In the next release (24.0) you can do this with SQL-based ingestion using:

```sql
CLUSTERED BY idea_instrument
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
