You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/03/13 03:00:14 UTC
[GitHub] [druid] jihoonson opened a new pull request #10987: Improve SQL planner performance
jihoonson opened a new pull request #10987:
URL: https://github.com/apache/druid/pull/10987
### Description
This PR fixes 3 issues.
- Implicit EOF in the Antlr grammar file. This makes Antlr parser to think that something might be wrong when it sees EOF and [creates a `NoViableAltException` instance](https://github.com/antlr/antlr4/blob/master/runtime/Java/src/org/antlr/v4/runtime/atn/ParserATNSimulator.java#L467). This could be problematic for queries that Calcite needs to explore large space of candidate queries as the Antlr parser will create exceptions over again even though those exceptions are ignored. The below flame graph shows initializing exceptions was one of the most expensive parts for [the giant union query in SqlBenchmark](https://github.com/apache/druid/blob/master/benchmarks/src/test/java/org/apache/druid/benchmark/query/SqlBenchmark.java#L152). [Implicit EOF can also cause `NoViableAltException` to be thrown incorrectly sometimes](https://github.com/antlr/antlr4/issues/118). This PR adds explicit EOF in the grammar file.
![Screenshot from 2021-03-12 18-36-01](https://user-images.githubusercontent.com/2322288/111016287-d5fb1380-8361-11eb-812f-3e5d7cffeca6.png)
- `getClass().getSimpleName()` in `BottomUpTransform.checkedProcess()` can be expensive if it is called lots of times. This PR changes it to use `assert`.
- Druid can parse the same expressions over again while it explores the search space of candidate plans. This PR adds caching parsed expressions per query. This could also be useful when the search space is large. Caching is disabled by default.
Here is a flame graph after applying all these changes.
![Screenshot from 2021-03-05 17-30-02](https://user-images.githubusercontent.com/2322288/111016361-3d18c800-8362-11eb-97a2-2fc31ea8f6c4.png)
Finally, here is the comparison of `SqlBenchmark.planSql()` results. Only the query 19 (the giant union query) shows a noticeable improvement (~25%). However, in my testing after I quadrupled the query 19, the gain in planning time was smaller (~11%) than with the original query. This was because of too many candidate plans to explore. I think parallelizing query planning can help in this case, but didn't investigate that idea much because it should be done in Calcite.
```
master
Benchmark (query) (rowsPerSegment) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.planSql 0 5 false avgt 15 0.563 ± 0.005 ms/op
SqlBenchmark.planSql 1 5 false avgt 15 0.603 ± 0.004 ms/op
SqlBenchmark.planSql 2 5 false avgt 15 0.662 ± 0.007 ms/op
SqlBenchmark.planSql 3 5 false avgt 15 0.808 ± 0.009 ms/op
SqlBenchmark.planSql 4 5 false avgt 15 1.156 ± 0.014 ms/op
SqlBenchmark.planSql 5 5 false avgt 15 1.291 ± 0.011 ms/op
SqlBenchmark.planSql 6 5 false avgt 15 1.504 ± 0.011 ms/op
SqlBenchmark.planSql 7 5 false avgt 15 1.270 ± 0.013 ms/op
SqlBenchmark.planSql 8 5 false avgt 15 2.044 ± 0.013 ms/op
SqlBenchmark.planSql 9 5 false avgt 15 1.982 ± 0.022 ms/op
SqlBenchmark.planSql 10 5 false avgt 15 0.707 ± 0.005 ms/op
SqlBenchmark.planSql 11 5 false avgt 15 0.741 ± 0.002 ms/op
SqlBenchmark.planSql 12 5 false avgt 15 0.624 ± 0.006 ms/op
SqlBenchmark.planSql 13 5 false avgt 15 0.874 ± 0.007 ms/op
SqlBenchmark.planSql 14 5 false avgt 15 0.980 ± 0.008 ms/op
SqlBenchmark.planSql 15 5 false avgt 15 0.628 ± 0.007 ms/op
SqlBenchmark.planSql 16 5 false avgt 15 0.745 ± 0.005 ms/op
SqlBenchmark.planSql 17 5 false avgt 15 0.956 ± 0.009 ms/op
SqlBenchmark.planSql 18 5 false avgt 15 1.078 ± 0.011 ms/op
SqlBenchmark.planSql 19 5 false avgt 15 129.221 ± 0.725 ms/op
patch
Benchmark (query) (rowsPerSegment) (vectorize) Mode Cnt Score Error Units
SqlBenchmark.planSql 0 5 false avgt 15 0.541 ± 0.003 ms/op
SqlBenchmark.planSql 1 5 false avgt 15 0.608 ± 0.004 ms/op
SqlBenchmark.planSql 2 5 false avgt 15 0.650 ± 0.003 ms/op
SqlBenchmark.planSql 3 5 false avgt 15 0.700 ± 0.008 ms/op
SqlBenchmark.planSql 4 5 false avgt 15 1.131 ± 0.007 ms/op
SqlBenchmark.planSql 5 5 false avgt 15 1.121 ± 0.004 ms/op
SqlBenchmark.planSql 6 5 false avgt 15 1.315 ± 0.015 ms/op
SqlBenchmark.planSql 7 5 false avgt 15 1.166 ± 0.011 ms/op
SqlBenchmark.planSql 8 5 false avgt 15 1.769 ± 0.014 ms/op
SqlBenchmark.planSql 9 5 false avgt 15 1.587 ± 0.017 ms/op
SqlBenchmark.planSql 10 5 false avgt 15 0.710 ± 0.003 ms/op
SqlBenchmark.planSql 11 5 false avgt 15 0.732 ± 0.005 ms/op
SqlBenchmark.planSql 12 5 false avgt 15 0.584 ± 0.005 ms/op
SqlBenchmark.planSql 13 5 false avgt 15 0.801 ± 0.005 ms/op
SqlBenchmark.planSql 14 5 false avgt 15 0.894 ± 0.005 ms/op
SqlBenchmark.planSql 15 5 false avgt 15 0.583 ± 0.007 ms/op
SqlBenchmark.planSql 16 5 false avgt 15 0.691 ± 0.003 ms/op
SqlBenchmark.planSql 17 5 false avgt 15 0.820 ± 0.005 ms/op
SqlBenchmark.planSql 18 5 false avgt 15 0.937 ± 0.007 ms/op
SqlBenchmark.planSql 19 5 false avgt 15 97.453 ± 0.540 ms/op
```
<hr>
##### Key changed/added classes in this PR
* `Expr.g4`
* `BottomUpTransform`
* `CachingExprParser`
<hr>
<!-- Check the items by putting "x" in the brackets for the done things. Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist below are strictly necessary, but it would be very helpful if you at least self-review the PR. -->
This PR has:
- [x] been self-reviewed.
- [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [x] added documentation for new or modified features or behaviors.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] jokeyee commented on pull request #10987: Improve SQL planner performance
Posted by GitBox <gi...@apache.org>.
jokeyee commented on pull request #10987:
URL: https://github.com/apache/druid/pull/10987#issuecomment-802628706
@jihoonson
> ##`就是`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] jokeyee commented on pull request #10987: Improve SQL planner performance
Posted by GitBox <gi...@apache.org>.
jokeyee commented on pull request #10987:
URL: https://github.com/apache/druid/pull/10987#issuecomment-802765291
**### _
-
- [ ] 1. ss
_**
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] jokeyee removed a comment on pull request #10987: Improve SQL planner performance
Posted by GitBox <gi...@apache.org>.
jokeyee removed a comment on pull request #10987:
URL: https://github.com/apache/druid/pull/10987#issuecomment-802765291
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] jokeyee edited a comment on pull request #10987: Improve SQL planner performance
Posted by GitBox <gi...@apache.org>.
jokeyee edited a comment on pull request #10987:
URL: https://github.com/apache/druid/pull/10987#issuecomment-802765291
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] jihoonson commented on pull request #10987: Caching parsed expressions in sql planner
Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #10987:
URL: https://github.com/apache/druid/pull/10987#issuecomment-808949183
I created https://github.com/apache/druid/pull/11041 so that we can focus on caching expressions in this PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org