
Posted to dev@druid.apache.org by Phaneendra Kumar Divi <ph...@gmail.com> on 2020/08/03 11:50:57 UTC

Sparse Data and Performance

Hello there,

We are considering using Druid to store custom events from various tenants. Modelling the event attributes as dimensions on a schema-less Druid datasource makes for a very sparse table, since we have over 10K unique dimensions and that number could grow over time. GroupBy queries for aggregation are reasonably responsive. However, we have use cases that need raw data to be retrieved and displayed with Scan queries, and even when filters are applied, the big problem is all the nulls Druid returns for non-existent column values in each row. This makes the JSON response bulky to deal with and contributes to latency. Is there a workaround for this?
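To make the cost concrete, here is a minimal Python sketch. It assumes a Scan result in the "list" format where each row is a JSON object that carries every dimension name, with null for the ones the event does not have. The column names (`dim_0` to `dim_9999`), the timestamp value, and the helpers `make_sparse_row` and `drop_nulls` are all hypothetical and exist only for illustration. The sketch compares the serialized size of one such row before and after dropping null-valued keys.

```python
import json


def make_sparse_row(num_dims, present):
    """Build a hypothetical Scan row: every known dimension is a key,
    but only the names in `present` carry a value; the rest are None."""
    row = {"__time": 1596455457000}  # hypothetical event timestamp (ms)
    for i in range(num_dims):
        name = f"dim_{i}"
        row[name] = present.get(name)  # None is serialized as JSON null
    return row


def drop_nulls(row):
    """Client-side post-processing: keep only keys whose value is not None."""
    return {k: v for k, v in row.items() if v is not None}


# Hypothetical event that sets only 2 of 10,000 dimensions.
present = {"dim_3": "checkout", "dim_42": "tenant-a"}
row = make_sparse_row(10_000, present)

full = json.dumps(row)
compact = json.dumps(drop_nulls(row))
print("with nulls:", len(full), "bytes; without nulls:", len(compact), "bytes")
```

Stripping nulls on the client only shrinks what the application handles after parsing. The broker still serializes and sends the full rows, so this step alone does not remove the network and parsing cost described above.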

Regards,
Phani
