Posted to user@spark.apache.org by Mu Kong <ko...@gmail.com> on 2017/12/26 01:46:51 UTC

Is there a way to make the broker merge a big result set faster?

Hi, community,

I have a subquery running slow on druid cluster.

The *inner query* yields the fields:

*SELECT D1, D2, D3, MAX(M1) as MAX_M1*
*FROM SOME_TABLE*
*GROUP BY D1, D2, D3*

Then, the outer query looks like:

*SELECT D1, D2, SUM(MAX_M1)*
*FROM INNER_QUERY*
*GROUP BY D1, D2*

D3 is a high-cardinality dimension, which makes the result set of the
inner query huge.
Even so, the inner query itself takes only 1~2 seconds to process and
transfer the data to the broker.

The outer query, however, takes 40 seconds to process.

As far as I understand how the broker works with the historicals, for the
inner query Druid simply fetches the result of each segment directly from
the historicals' memory, so there isn't much computation involved in the
inner query itself.
However, once the inner query finishes, all the results from the
historicals are passed to a single broker to be merged.
In my case, because the result set from the inner query is tremendous,
this merge phase takes a long time to finish.
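For context, this is roughly what the nested query looks like in native
JSON form (the doubleMax/doubleSum aggregator types and the interval here
are placeholders, not my exact settings):

```json
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "query",
    "query": {
      "queryType": "groupBy",
      "dataSource": "SOME_TABLE",
      "granularity": "all",
      "intervals": ["2017-12-01/2017-12-26"],
      "dimensions": ["D1", "D2", "D3"],
      "aggregations": [
        { "type": "doubleMax", "name": "MAX_M1", "fieldName": "M1" }
      ]
    }
  },
  "granularity": "all",
  "intervals": ["2017-12-01/2017-12-26"],
  "dimensions": ["D1", "D2"],
  "aggregations": [
    { "type": "doubleSum", "name": "SUM_MAX_M1", "fieldName": "MAX_M1" }
  ]
}
```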

I think the situation mentioned in this thread is quite similar to my case:
https://groups.google.com/d/msg/druid-user/ir7hRpxg0PI/3oqCDAwoPjMJ
Gian mentioned "historical merging", and I tried that by disabling the
broker cache, but it didn't make the query noticeably faster.
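To be specific, what I disabled was the broker-side caching via these
properties in the broker's runtime.properties (in case I changed the
wrong thing):

```
# Broker runtime.properties -- turn off broker-side result caching
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false
```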

Is there any other way to make broker merge faster?

Thanks!


Best regards,
Mu