Posted to dev@calcite.apache.org by Julian Hyde <jh...@apache.org> on 2017/07/01 01:29:36 UTC

Re: Questions about the multiplier in cost computation for Druid Query

Is it true that we'd always want to push the limit down to Druid,
regardless of whether the limit is large or small? If so, and if this
is not happening, there is a bug in the cost model; please log it.

On Fri, Jun 30, 2017 at 1:56 PM, Junxian Wu
<ru...@yahoo-inc.com.invalid> wrote:
> Hi Dev Community,
> While using Calcite on Druid, I tried to run a simple query such as "select * from table_name limit 100". Although Druid's SELECT query is very inefficient, limiting the number of output rows still lets it return results quickly.
> However, the current cost computation makes the planner tend to handle the LIMIT RelNode (the fetch of the Sort node) in the Calcite JVM's memory. In my case, when the LIMIT is larger than 7, the limit is not pushed down, and because the total amount of data is large, the JVM runs out of memory. By changing the cost multiplier that is applied when the Sort node is pushed in to a smaller number, a larger LIMIT can be pushed down. This logic does not seem correct: when the LIMIT fetches more rows, we should prefer to handle it on the database (Druid) side rather than in memory. Should we redesign the cost computation so that it has the correct logic?
> Thank you.
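
To make the behaviour described above concrete, here is a minimal, self-contained sketch (plain Java, not the actual DruidQuery or Sort cost code; all class names, method names, and constants are hypothetical) of the trade-off the planner ends up making. The point is only that a per-row multiplier applied to the pushed-down LIMIT makes that plan's cost grow with the fetch value, so beyond some threshold the planner keeps the limit in the Calcite JVM; the constants below are chosen so the threshold matches the "larger than 7" symptom.

public class LimitPushdownCostSketch {

  /** Cost of scanning everything in Druid and applying the LIMIT in Calcite:
   *  every row of the table is transferred to and processed in the JVM. */
  static double costLimitInCalcite(double tableRowCount) {
    return tableRowCount;            // proportional to the full table size
  }

  /** Cost of pushing the LIMIT into Druid, with a multiplier applied to the
   *  fetched row count (the kind of factor discussed in this thread). */
  static double costLimitInDruid(double fetch, double multiplier) {
    return fetch * multiplier;       // grows with the LIMIT value
  }

  public static void main(String[] args) {
    double tableRowCount = 8_000_000;  // hypothetical large table
    double multiplier = 1_000_000;     // hypothetical, deliberately too large

    for (double fetch : new double[] {5, 7, 100}) {
      boolean push = costLimitInDruid(fetch, multiplier)
          < costLimitInCalcite(tableRowCount);
      System.out.printf("LIMIT %.0f -> push into Druid? %s%n", fetch, push);
    }
    // With these constants only LIMIT <= 7 is pushed down; anything larger is
    // kept in the Calcite JVM even though Druid could return it cheaply.
  }
}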

Re: Questions about the multiplier in cost computation for Druid Query

Posted by Julian Hyde <jh...@apache.org>.
OK, contributions welcome.

> On Jul 2, 2017, at 9:25 AM, JD Zheng <jd...@gmail.com> wrote:
> 
> Julian, I think it’s a good idea to always push down the limit. But regarding Junxian’s case, there’s a simpler fix. This is a bug in the cost computation that I have already reported: https://issues.apache.org/jira/browse/CALCITE-1842
> The simple fix is to swap the two parameters. That will solve the problem and push the limit down.
> 
> -JD
> 
>> On Jun 30, 2017, at 6:29 PM, Julian Hyde <jh...@apache.org> wrote:
>> 
>> Is it true that we'd always want to push the limit down to Druid,
>> regardless of whether the limit is large or small? If so, and if this
>> is not happening, there is a bug in the cost model; please log it.
>> 
>> On Fri, Jun 30, 2017 at 1:56 PM, Junxian Wu
>> <ru...@yahoo-inc.com.invalid> wrote:
>>> Hi Dev Community,
>>> While using Calcite on Druid, I tried to run a simple query such as "select * from table_name limit 100". Although Druid's SELECT query is very inefficient, limiting the number of output rows still lets it return results quickly.
>>> However, the current cost computation makes the planner tend to handle the LIMIT RelNode (the fetch of the Sort node) in the Calcite JVM's memory. In my case, when the LIMIT is larger than 7, the limit is not pushed down, and because the total amount of data is large, the JVM runs out of memory. By changing the cost multiplier that is applied when the Sort node is pushed in to a smaller number, a larger LIMIT can be pushed down. This logic does not seem correct: when the LIMIT fetches more rows, we should prefer to handle it on the database (Druid) side rather than in memory. Should we redesign the cost computation so that it has the correct logic?
>>> Thank you.
> 


Re: Questions about the multiplier in cost computation for Druid Query

Posted by JD Zheng <jd...@gmail.com>.
Julian, I think it’s a good idea to always push down the limit. But regarding Junxian’s case, there’s a simpler fix. This is a bug in the cost computation that I have already reported: https://issues.apache.org/jira/browse/CALCITE-1842
The simple fix is to swap the two parameters. That will solve the problem and push the limit down.

-JD

> On Jun 30, 2017, at 6:29 PM, Julian Hyde <jh...@apache.org> wrote:
> 
> Is it true that we'd always want to push the limit down to Druid,
> regardless of whether the limit is large or small? If so, and if this
> is not happening, there is a bug in the cost model; please log it.
> 
> On Fri, Jun 30, 2017 at 1:56 PM, Junxian Wu
> <ru...@yahoo-inc.com.invalid> wrote:
>> Hi Dev Community,
>> While using Calcite on Druid, I tried to run a simple query such as "select * from table_name limit 100". Although Druid's SELECT query is very inefficient, limiting the number of output rows still lets it return results quickly.
>> However, the current cost computation makes the planner tend to handle the LIMIT RelNode (the fetch of the Sort node) in the Calcite JVM's memory. In my case, when the LIMIT is larger than 7, the limit is not pushed down, and because the total amount of data is large, the JVM runs out of memory. By changing the cost multiplier that is applied when the Sort node is pushed in to a smaller number, a larger LIMIT can be pushed down. This logic does not seem correct: when the LIMIT fetches more rows, we should prefer to handle it on the database (Druid) side rather than in memory. Should we redesign the cost computation so that it has the correct logic?
>> Thank you.
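
For readers who follow the CALCITE-1842 link above: the "swap the two parameters" fix suggests that a cost calculation was passing its arguments to the cost factory in the wrong order. The sketch below is only illustrative (the class and method names are made up), but the RelOptCostFactory.makeCost(rowCount, cpu, io) signature it uses is real Calcite API; it shows how swapping two such arguments can misprice the pushed-down plan badly enough to flip the planner's choice.

import org.apache.calcite.plan.RelOptCost;
import org.apache.calcite.plan.RelOptCostFactory;

public class SwappedCostArguments {

  /** Buggy variant: the cpu term is passed where the row count belongs,
   *  so the plan that should win can be charged a wildly wrong cost. */
  static RelOptCost buggyCost(RelOptCostFactory factory,
      double rowCount, double cpu) {
    return factory.makeCost(cpu, rowCount, 0);
  }

  /** Fixed variant: the two parameters swapped back --
   *  makeCost takes (rowCount, cpu, io) in that order. */
  static RelOptCost fixedCost(RelOptCostFactory factory,
      double rowCount, double cpu) {
    return factory.makeCost(rowCount, cpu, 0);
  }
}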