You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/03/13 03:00:14 UTC

[GitHub] [druid] jihoonson opened a new pull request #10987: Improve SQL planner performance

jihoonson opened a new pull request #10987:
URL: https://github.com/apache/druid/pull/10987


   ### Description
   
   This PR fixes 3 issues.
   
   - Implicit EOF in the Antlr grammar file. This makes Antlr parser to think that something might be wrong when it sees EOF and [creates a `NoViableAltException` instance](https://github.com/antlr/antlr4/blob/master/runtime/Java/src/org/antlr/v4/runtime/atn/ParserATNSimulator.java#L467). This could be problematic for queries that Calcite needs to explore large space of candidate queries as the Antlr parser will create exceptions over again even though those exceptions are ignored. The below flame graph shows initializing exceptions was one of the most expensive parts for [the giant union query in SqlBenchmark](https://github.com/apache/druid/blob/master/benchmarks/src/test/java/org/apache/druid/benchmark/query/SqlBenchmark.java#L152). [Implicit EOF can also cause `NoViableAltException` to be thrown incorrectly sometimes](https://github.com/antlr/antlr4/issues/118). This PR adds explicit EOF in the grammar file.
   
   ![Screenshot from 2021-03-12 18-36-01](https://user-images.githubusercontent.com/2322288/111016287-d5fb1380-8361-11eb-812f-3e5d7cffeca6.png)
   
   - `getClass().getSimpleName()` in `BottomUpTransform.checkedProcess()` can be expensive if it is called lots of times. This PR changes it to use `assert`.
   - Druid can parse the same expressions over again while it explores the search space of candidate plans. This PR adds caching parsed expressions per query. This could also be useful when the search space is large. Caching is disabled by default.
   
   Here is a flame graph after applying all these changes.
   
   ![Screenshot from 2021-03-05 17-30-02](https://user-images.githubusercontent.com/2322288/111016361-3d18c800-8362-11eb-97a2-2fc31ea8f6c4.png)
   
   Finally, here is the comparison of `SqlBenchmark.planSql()` results. Only the query 19 (the giant union query) shows a noticeable improvement (~25%). However, in my testing after I quadrupled the query 19, the gain in planning time was smaller (~11%) than with the original query. This was because of too many candidate plans to explore. I think parallelizing query planning can help in this case, but didn't investigate that idea much because it should be done in Calcite.
   
   ```
   master
   
   Benchmark             (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score   Error  Units
   SqlBenchmark.planSql        0                 5        false  avgt   15    0.563 ± 0.005  ms/op
   SqlBenchmark.planSql        1                 5        false  avgt   15    0.603 ± 0.004  ms/op
   SqlBenchmark.planSql        2                 5        false  avgt   15    0.662 ± 0.007  ms/op
   SqlBenchmark.planSql        3                 5        false  avgt   15    0.808 ± 0.009  ms/op
   SqlBenchmark.planSql        4                 5        false  avgt   15    1.156 ± 0.014  ms/op
   SqlBenchmark.planSql        5                 5        false  avgt   15    1.291 ± 0.011  ms/op
   SqlBenchmark.planSql        6                 5        false  avgt   15    1.504 ± 0.011  ms/op
   SqlBenchmark.planSql        7                 5        false  avgt   15    1.270 ± 0.013  ms/op
   SqlBenchmark.planSql        8                 5        false  avgt   15    2.044 ± 0.013  ms/op
   SqlBenchmark.planSql        9                 5        false  avgt   15    1.982 ± 0.022  ms/op
   SqlBenchmark.planSql       10                 5        false  avgt   15    0.707 ± 0.005  ms/op
   SqlBenchmark.planSql       11                 5        false  avgt   15    0.741 ± 0.002  ms/op
   SqlBenchmark.planSql       12                 5        false  avgt   15    0.624 ± 0.006  ms/op
   SqlBenchmark.planSql       13                 5        false  avgt   15    0.874 ± 0.007  ms/op
   SqlBenchmark.planSql       14                 5        false  avgt   15    0.980 ± 0.008  ms/op
   SqlBenchmark.planSql       15                 5        false  avgt   15    0.628 ± 0.007  ms/op
   SqlBenchmark.planSql       16                 5        false  avgt   15    0.745 ± 0.005  ms/op
   SqlBenchmark.planSql       17                 5        false  avgt   15    0.956 ± 0.009  ms/op
   SqlBenchmark.planSql       18                 5        false  avgt   15    1.078 ± 0.011  ms/op
   SqlBenchmark.planSql       19                 5        false  avgt   15  129.221 ± 0.725  ms/op
   
   patch
   
   Benchmark             (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt   Score   Error  Units
   SqlBenchmark.planSql        0                 5        false  avgt   15   0.541 ± 0.003  ms/op
   SqlBenchmark.planSql        1                 5        false  avgt   15   0.608 ± 0.004  ms/op
   SqlBenchmark.planSql        2                 5        false  avgt   15   0.650 ± 0.003  ms/op
   SqlBenchmark.planSql        3                 5        false  avgt   15   0.700 ± 0.008  ms/op
   SqlBenchmark.planSql        4                 5        false  avgt   15   1.131 ± 0.007  ms/op
   SqlBenchmark.planSql        5                 5        false  avgt   15   1.121 ± 0.004  ms/op
   SqlBenchmark.planSql        6                 5        false  avgt   15   1.315 ± 0.015  ms/op
   SqlBenchmark.planSql        7                 5        false  avgt   15   1.166 ± 0.011  ms/op
   SqlBenchmark.planSql        8                 5        false  avgt   15   1.769 ± 0.014  ms/op
   SqlBenchmark.planSql        9                 5        false  avgt   15   1.587 ± 0.017  ms/op
   SqlBenchmark.planSql       10                 5        false  avgt   15   0.710 ± 0.003  ms/op
   SqlBenchmark.planSql       11                 5        false  avgt   15   0.732 ± 0.005  ms/op
   SqlBenchmark.planSql       12                 5        false  avgt   15   0.584 ± 0.005  ms/op
   SqlBenchmark.planSql       13                 5        false  avgt   15   0.801 ± 0.005  ms/op
   SqlBenchmark.planSql       14                 5        false  avgt   15   0.894 ± 0.005  ms/op
   SqlBenchmark.planSql       15                 5        false  avgt   15   0.583 ± 0.007  ms/op
   SqlBenchmark.planSql       16                 5        false  avgt   15   0.691 ± 0.003  ms/op
   SqlBenchmark.planSql       17                 5        false  avgt   15   0.820 ± 0.005  ms/op
   SqlBenchmark.planSql       18                 5        false  avgt   15   0.937 ± 0.007  ms/op
   SqlBenchmark.planSql       19                 5        false  avgt   15  97.453 ± 0.540  ms/op
   ```
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `Expr.g4`
    * `BottomUpTransform`
    * `CachingExprParser`
   
   <hr>
   
   <!-- Check the items by putting "x" in the brackets for the done things. Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist below are strictly necessary, but it would be very helpful if you at least self-review the PR. -->
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [x] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jokeyee commented on pull request #10987: Improve SQL planner performance

Posted by GitBox <gi...@apache.org>.
jokeyee commented on pull request #10987:
URL: https://github.com/apache/druid/pull/10987#issuecomment-802628706


   @jihoonson 
   
   >   ##`就是`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jokeyee commented on pull request #10987: Improve SQL planner performance

Posted by GitBox <gi...@apache.org>.
jokeyee commented on pull request #10987:
URL: https://github.com/apache/druid/pull/10987#issuecomment-802765291


   **### _
   
   - 
   
   - [ ] 1. ss
   
   _**


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jokeyee removed a comment on pull request #10987: Improve SQL planner performance

Posted by GitBox <gi...@apache.org>.
jokeyee removed a comment on pull request #10987:
URL: https://github.com/apache/druid/pull/10987#issuecomment-802765291






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jokeyee edited a comment on pull request #10987: Improve SQL planner performance

Posted by GitBox <gi...@apache.org>.
jokeyee edited a comment on pull request #10987:
URL: https://github.com/apache/druid/pull/10987#issuecomment-802765291






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on pull request #10987: Caching parsed expressions in sql planner

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #10987:
URL: https://github.com/apache/druid/pull/10987#issuecomment-808949183


   I created https://github.com/apache/druid/pull/11041 so that we can focus on caching expressions in this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org