Posted to dev@calcite.apache.org by JD Zheng <jd...@gmail.com> on 2017/06/13 17:47:12 UTC

Bug in Sort.computeSelfCost()?

Hi, 

Our team currently uses the Calcite Druid adapter to query data in Druid. We found that in some cases, when the limit is over 10 and the data set has many dimensions, the limit is not pushed down to Druid.

We looked further into the cost calculation of the different plans and found that the following code in Sort.java looks suspicious:

 
  @Override public RelOptCost computeSelfCost(RelOptPlanner planner,
      RelMetadataQuery mq) {
    // Higher cost if rows are wider discourages pushing a project through a
    // sort.
    double rowCount = mq.getRowCount(this);
    double bytesPerRow = getRowType().getFieldCount() * 4;
    return planner.getCostFactory().makeCost(
        Util.nLogN(rowCount) * bytesPerRow, rowCount, 0);
  }


And the definition of makeCost is:

public interface RelOptCostFactory {
  /**
   * Creates a cost object.
   */
  RelOptCost makeCost(double rowCount, double cpu, double io);
}



So the first parameter should be rowCount and the second cpu.

It seems that the caller is passing the arguments in the wrong order.
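To put illustrative numbers on it (these are made up, just to show the effect): with rowCount = 1,000 and 50 fields, bytesPerRow is 200 and Util.nLogN(1000) is a few thousand, so the first argument to makeCost lands in the millions. The planner then treats the Sort as if it produced millions of rows rather than 1,000, which swamps any saving from pushing the limit down to Druid.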

Once we swap these two arguments, it works out fine: the limit is pushed down to the Druid query.
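
For reference, the swap amounts to the following (a sketch of the change we tried locally; everything else in the method stays the same):

  @Override public RelOptCost computeSelfCost(RelOptPlanner planner,
      RelMetadataQuery mq) {
    // Higher cost if rows are wider discourages pushing a project through a
    // sort.
    double rowCount = mq.getRowCount(this);
    double bytesPerRow = getRowType().getFieldCount() * 4;
    // rowCount is now the first argument; the n*log(n) comparison work,
    // scaled by row width, becomes the cpu component; io stays at 0.
    return planner.getCostFactory().makeCost(
        rowCount, Util.nLogN(rowCount) * bytesPerRow, 0);
  }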


Are we doing the right thing by swapping the parameters? Is this a bug, or is there a reason the parameters are passed this way?




By the way, we found some dead code in VolcanoCost.java.
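Paraphrasing the code we mean (a rough sketch from memory, not copied verbatim from the source; the point is that a constant condition leaves the component-wise comparison unreachable):

  // Shape of the comparison in VolcanoCost (paraphrased sketch):
  public boolean isLe(RelOptCost other) {
    VolcanoCost that = (VolcanoCost) other;
    if (false) {                 // dead branch: cpu and io are never compared
      return this.rowCount <= that.rowCount
          && this.cpu <= that.cpu
          && this.io <= that.io;
    }
    // Only rowCount actually takes part in the ordering.
    return this == that
        || this.rowCount <= that.rowCount;
  }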



Does it mean that we don’t need to bother passing in the cpu cost and io cost, and that these costs should somehow be modeled in the row count?


Thanks,

-Jiandan

Re: Bug in Sort.computeSelfCost()?

Posted by Julian Hyde <jh...@apache.org>.
> On Jun 14, 2017, at 12:05 PM, JD Zheng <jd...@gmail.com> wrote:
> 
> One more question. For the first issue, is the solution switching the two parameters? 

Yes, I believe so.

But I’d run the test suite to make sure there aren’t any unintended consequences.

Julian


Re: Bug in Sort.computeSelfCost()?

Posted by JD Zheng <jd...@gmail.com>.
One more question. For the first issue, is the solution switching the two parameters? Thanks,

-JD

> On Jun 14, 2017, at 11:41 AM, Julian Hyde <jh...@apache.org> wrote:
> 
> That does look like a bug. Can you log it, please?
> 
> The reason for the “if (false)” is that we found that costs don’t work well if they form only a partial order, not a total order. If you have two RelNodes R1 and R2 in an equivalence set, and they have costs C1 and C2, and neither C1 <= C2 nor C2 <= C1 is true, which one is the Volcano planner to pick? It will tend to pick the one that it saw first, and that is bad news because it is arbitrary and non-deterministic.
> 
> So, we should probably find a way to convert a RelOptCost to a totally-ordered value, such as by applying weights to the cpu, io and memory costs and returning a double. (Anyone have a better idea?) Can you log a bug for that also?
> 
> Julian


Re: Bug in Sort.computeSelfCost()?

Posted by Julian Hyde <jh...@gmail.com>.
You probably would want (100, 100, 100) to come out as less than (99, 10000000, 10000000). So I don’t think a tie-break is a good idea. It is deterministic but it is arbitrary and frequently wrong.

Julian


> On Jun 14, 2017, at 12:03 PM, JD Zheng <jd...@gmail.com> wrote:
> 
> Sure. I’ll log them. 
> 
> As for making the cost model totally ordered, can we add some kind of signature that uniquely identifies a RelNode to break the tie? So the order would be <cost(rowCount, cpuCost, ioCost), RelNode.signature>.
> 
> -jiandan


Re: Bug in Sort.computeSelfCost()?

Posted by JD Zheng <jd...@gmail.com>.
Sure. I’ll log them. 

As for making the cost model totally ordered, can we add some kind of signature that uniquely identifies a RelNode to break the tie? So the order would be <cost(rowCount, cpuCost, ioCost), RelNode.signature>.
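
Something like the following, say (just a sketch; the per-node signature is hypothetical, not an existing Calcite API):

  // Order candidates by cost first; when two costs are incomparable
  // (neither isLe the other), fall back to a deterministic signature
  // so the planner's choice is repeatable rather than arbitrary.
  int compareCandidates(RelOptCost c1, String sig1, RelOptCost c2, String sig2) {
    boolean le12 = c1.isLe(c2);
    boolean le21 = c2.isLe(c1);
    if (le12 && !le21) {
      return -1;
    }
    if (le21 && !le12) {
      return 1;
    }
    // Equal or incomparable costs: break the tie on the signature.
    return sig1.compareTo(sig2);
  }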

-jiandan


> On Jun 14, 2017, at 11:41 AM, Julian Hyde <jh...@apache.org> wrote:
> 
> That does look like a bug. Can you log it, please?
> 
> The reason for the “if (false)” is that we found that costs don’t work well if they form only a partial order, not a total order. If you have two RelNodes R1 and R2 in an equivalence set, and they have costs C1 and C2, and neither C1 <= C2 nor C2 <= C1 is true, which one is the Volcano planner to pick? It will tend to pick the one that it saw first, and that is bad news because it is arbitrary and non-deterministic.
> 
> So, we should probably find a way to convert a RelOptCost to a totally-ordered value, such as by applying weights to the cpu, io and memory costs and returning a double. (Anyone have a better idea?) Can you log a bug for that also?
> 
> Julian


Re: Bug in Sort.computeSelfCost()?

Posted by Julian Hyde <jh...@apache.org>.
That does look like a bug. Can you log it, please?

The reason for the “if (false)” is that we found that costs don’t work well if they form only a partial order, not a total order. If you have two RelNodes R1 and R2 in an equivalence set, and they have costs C1 and C2, and neither C1 <= C2 nor C2 <= C1 is true, which one is the Volcano planner to pick? It will tend to pick the one that it saw first, and that is bad news because it is arbitrary and non-deterministic.

So, we should probably find a way to convert a RelOptCost to a totally-ordered value, such as by applying weights to the cpu, io and memory costs and returning a double. (Anyone have a better idea?) Can you log a bug for that also?
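
For example, something along these lines (a sketch only; the weights here are made up for illustration, not tuned values):

  // Collapse the components into one double so any two costs are comparable.
  static double toScalar(RelOptCost cost) {
    final double rowWeight = 0.5;
    final double cpuWeight = 1.0;
    final double ioWeight = 4.0;   // assume an io unit is dearer than a cpu unit
    return rowWeight * cost.getRows()
        + cpuWeight * cost.getCpu()
        + ioWeight * cost.getIo();
  }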

Julian

