Posted to notifications@shardingsphere.apache.org by GitBox <gi...@apache.org> on 2020/11/18 09:38:35 UTC
[GitHub] [shardingsphere] wu-sheng opened a new issue #8208: Benchmark(local) result of SQLParserEngine(MySQL)
wu-sheng opened a new issue #8208:
URL: https://github.com/apache/shardingsphere/issues/8208
Hi @tristaZero , as promised, I ran some tests on the performance of the SQL parser. I chose the MySQL grammar at random.
Here are the SQLs I tested, along with the code I ran.
```java
import org.antlr.v4.runtime.tree.ParseTree;
import org.junit.Test;
import org.openjdk.jmh.annotations.Benchmark;
// The SQLParserEngine import is omitted: its package depends on the ShardingSphere version.

public class SQLParseBenchmark {

    @Test
    @Benchmark
    public void parseSimpleSQLPerf() {
        String sqlStatement = "select * from table_1 where col = ?";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, false);
    }

    @Test
    @Benchmark
    public void parseComplexSQLPerf() {
        String sqlStatement = "SELECT \n" +
                "    productname, \n" +
                "    buyprice\n" +
                "FROM\n" +
                "    products p1\n" +
                "WHERE\n" +
                "    buyprice > (SELECT \n" +
                "            AVG(buyprice)\n" +
                "        FROM\n" +
                "            products\n" +
                "        WHERE\n" +
                "            productline = p1.productline)";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, false);
    }

    @Test
    @Benchmark
    public void parseComplexSQL2Perf() {
        String sqlStatement = "SELECT \n" +
                "    customerNumber, \n" +
                "    customerName\n" +
                "FROM\n" +
                "    customers\n" +
                "WHERE\n" +
                "    EXISTS( SELECT \n" +
                "            orderNumber, SUM(priceEach * quantityOrdered)\n" +
                "        FROM\n" +
                "            orderdetails\n" +
                "        INNER JOIN\n" +
                "            orders USING (orderNumber)\n" +
                "        WHERE\n" +
                "            customerNumber = customers.customerNumber\n" +
                "        GROUP BY orderNumber\n" +
                "        HAVING SUM(priceEach * quantityOrdered) > 60000);";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, false);
    }
}
```
# Single Thread Case W/O Cache
```
Benchmark                               Mode  Cnt       Score       Error  Units
SQLParseBenchmark.parseSimpleSQLPerf   thrpt    5  200801.158 ± 28973.845  ops/s
SQLParseBenchmark.parseComplexSQLPerf  thrpt    5   56151.571 ±  3432.966  ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt    5   34157.511 ±  3750.865  ops/s
```
The length and complexity of the SQL affect performance considerably: throughput drops from roughly 200k ops/s to 34k ops/s as the statements grow more complex, with about ±10% fluctuation.
# Active Cache With 1k/100k possible SQLs
I simply added a counter `I` to generate 1k distinct SQL samples and warm up the cache, like this.
```java
import org.antlr.v4.runtime.tree.ParseTree;
import org.junit.Test;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
// The SQLParserEngine import is omitted: its package depends on the ShardingSphere version.

public class SQLParseBenchmark {

    public static int POSSIBILITY = 1000;

    public static int I = 1;

    @Test
    @Benchmark
    public void parseSimpleSQLPerf() {
        String sqlStatement = "select * from table_1 where col = " + I;
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, true);
        if (I > POSSIBILITY) {
            I = 1;
        } else {
            I++;
        }
    }

    @Test
    @Benchmark
    public void parseComplexSQLPerf() {
        String sqlStatement = "SELECT \n" +
                "    productname, \n" +
                "    buyprice\n" +
                "FROM\n" +
                "    products p1" + I +
                " WHERE\n" +
                "    buyprice > (SELECT \n" +
                "            AVG(buyprice)\n" +
                "        FROM\n" +
                "            products\n" +
                "        WHERE\n" +
                "            productline = p1.productline)";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, true);
        if (I > POSSIBILITY) {
            I = 1;
        } else {
            I++;
        }
    }

    @Test
    @Benchmark
    public void parseComplexSQL2Perf() {
        String sqlStatement = "SELECT \n" +
                "    customerNumber, \n" +
                "    customerName\n" +
                "FROM\n" +
                "    customers" + I +
                " WHERE\n" +
                "    EXISTS( SELECT \n" +
                "            orderNumber, SUM(priceEach * quantityOrdered)\n" +
                "        FROM\n" +
                "            orderdetails\n" +
                "        INNER JOIN\n" +
                "            orders USING (orderNumber)\n" +
                "        WHERE\n" +
                "            customerNumber = customers.customerNumber\n" +
                "        GROUP BY orderNumber\n" +
                "        HAVING SUM(priceEach * quantityOrdered) > 60000);";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, true);
        if (I > POSSIBILITY) {
            I = 1;
        } else {
            I++;
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(SQLParseBenchmark.class.getSimpleName())
                .forks(1)
                .warmupIterations(3)
                .threads(1)
                .syncIterations(false)
                .measurementIterations(5)
                .build();
        new Runner(opt).run();
    }
}
```
```
Benchmark                               Mode  Cnt        Score        Error  Units
SQLParseBenchmark.parseSimpleSQLPerf   thrpt    5  5055988.471 ± 900336.628  ops/s
SQLParseBenchmark.parseComplexSQLPerf  thrpt    5  2760262.638 ± 395728.318  ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt    5  1686125.536 ± 476530.762  ops/s
```
The performance increases markedly: for the most complex SQL case, throughput rises from 34k to 1.68m ops/s, roughly 49x.
Then I changed the number of possible SQLs to 100k, so the LRU cache is mostly ineffective, and something interesting happened.
```
Benchmark                               Mode  Cnt       Score       Error  Units
SQLParseBenchmark.parseSimpleSQLPerf   thrpt    5  126939.502 ± 31545.825  ops/s
SQLParseBenchmark.parseComplexSQLPerf  thrpt    5   28216.660 ± 11769.703  ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt    5   18382.000 ±  6581.555  ops/s
```
If the LRU cache misses most of the time, it is worse than no cache at all: throughput falls from 34k to 18k ops/s, nearly -50%.
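One plausible explanation (a sketch, not ShardingSphere's actual implementation, which used a Guava cache): on a miss-heavy workload every lookup still pays the cache's hashing, insertion, and eviction bookkeeping on top of the full parse, so the cache is pure overhead. A minimal access-order LRU built on `LinkedHashMap` shows the churn:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch; a stand-in to illustrate the miss path,
// not the real (Guava-based) ShardingSphere cache.
public class LruSketch {

    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // accessOrder=true gives LRU eviction order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // evict on every insert beyond capacity
        }
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        // With more distinct SQLs than capacity, nearly every lookup misses:
        // the parse still happens, plus hashing and eviction bookkeeping.
        cache.put("sql-1", "ast-1");
        cache.put("sql-2", "ast-2");
        cache.put("sql-3", "ast-3"); // evicts sql-1
        System.out.println(cache.containsKey("sql-1")); // prints false
    }
}
```

With 100k distinct statements against a much smaller cache, this eviction path runs on essentially every call, which matches the ~50% slowdown above.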
# Concurrency Tests
I switched to the concurrent cases. First, with the LRU cache effective (1k SQL samples) and 6 threads active:
```
Benchmark                               Mode  Cnt         Score         Error  Units
SQLParseBenchmark.parseSimpleSQLPerf   thrpt    5  10535380.660 ±  417418.673  ops/s
SQLParseBenchmark.parseComplexSQLPerf  thrpt    5   7126066.718 ± 1634052.358  ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt    5   6251270.156 ± 1361498.623  ops/s
```
1.68m -> 6.25m ops/s for the most complex SQL, about 4x faster with 6 threads (my laptop is a 4-core Intel Core i7 MacBook Pro, 15-inch, 2017).
Then I made the LRU cache miss again (100k SQL samples), still with 6 threads:
```
Benchmark                               Mode  Cnt       Score      Error  Units
SQLParseBenchmark.parseSimpleSQLPerf   thrpt    5  159424.392 ± 66291.903  ops/s
SQLParseBenchmark.parseComplexSQLPerf  thrpt    5   42207.768 ±  9511.269  ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt    5   27640.844 ± 11868.306  ops/s
```
Only 18k -> 27k.
# One More
I was also curious about performance with no LRU cache under concurrency, so I kept the 6 threads and disabled the cache:
```
Benchmark                               Mode  Cnt       Score      Error  Units
SQLParseBenchmark.parseSimpleSQLPerf   thrpt    5  572351.472 ± 27852.061  ops/s
SQLParseBenchmark.parseComplexSQLPerf  thrpt    5  164481.782 ± 24937.811  ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt    5   88742.504 ± 81617.211  ops/s
```
The most interesting result of all: with the cache off, it delivers 88k ops/s, compared to 27k with the LRU cache on but a large SQL set.
# Conclusion
The concurrent performance makes me a little concerned (not a blocker) about using this in the trace analysis core. A single OAP backend node can easily reach 10k+ segments/s, each segment normally contains 5-10 SQL statements, and the analysis will certainly run concurrently.
So the current performance looks like a bottleneck at our current scale.
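To make the concern concrete, here is a back-of-the-envelope calculation using the figures above (10k segments/s, the upper end of 5-10 SQLs per segment, and the 6-thread cache-miss result for the most complex SQL):

```java
// Back-of-the-envelope: required parse throughput vs. the measured worst case.
public class ThroughputCheck {
    public static void main(String[] args) {
        int segmentsPerSecond = 10_000;    // one OAP node, easily reached
        int sqlsPerSegment = 10;           // upper end of 5-10 per segment
        int requiredParsesPerSecond = segmentsPerSecond * sqlsPerSegment;
        double measuredWorstCase = 27_640; // 6 threads, LRU cache mostly missing
        System.out.println(requiredParsesPerSecond);                      // prints 100000
        System.out.println(requiredParsesPerSecond / measuredWorstCase);  // ~3.6x short
    }
}
```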
Could we have a deeper discussion about what we could do to improve this?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [shardingsphere] wu-sheng commented on issue #8208: Benchmark(local) result of SQLParserEngine(MySQL)
Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #8208:
URL: https://github.com/apache/shardingsphere/issues/8208#issuecomment-729752442
> but the most popular scenario is using same SQL pattern with parameter marker which is most suitable for using cache
Yes, I just tested the worst case. SkyWalking, as an APM, faces countless strange performance issues; we are trying to avoid becoming the next one.
[GitHub] [shardingsphere] terrymanu commented on issue #8208: Benchmark(local) result of SQLParserEngine(MySQL)
Posted by GitBox <gi...@apache.org>.
terrymanu commented on issue #8208:
URL: https://github.com/apache/shardingsphere/issues/8208#issuecomment-729748169
Thank you for the great performance test report.
The parse engine's performance can be split into the with-cache and without-cache cases.
Performance is indeed low on a cache miss, but the most popular scenario is the same SQL pattern with parameter markers, which is the most suitable case for the cache.
So we can talk about improving performance for cache hits. We are using a static Guava cache for the whole SQL parse result; I will first try to improve its performance.
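The cache-hit path terrymanu describes can be sketched with the JDK alone (the real cache is Guava-based; `parseUncached` and the string "AST" value are stubs for illustration): statements that reuse the same SQL pattern with parameter markers hit one cached entry and skip the parser entirely.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a whole-statement parse cache keyed by the SQL text.
// Hypothetical stand-in for the real Guava cache; parseUncached is a stub.
public class ParseCacheSketch {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger parseCount = new AtomicInteger();

    String parseUncached(String sql) {
        parseCount.incrementAndGet();     // counts real parser invocations
        return "AST(" + sql + ")";        // placeholder for the real parse tree
    }

    String parse(String sql) {
        // computeIfAbsent runs the parser at most once per distinct statement
        return cache.computeIfAbsent(sql, this::parseUncached);
    }
}
```

With parameter markers (`col = ?`), every execution of the same statement maps to the same key, which is why the 1k-sample runs above are 40-50x faster than the uncached ones.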
[GitHub] [shardingsphere] jingshanglu commented on issue #8208: Benchmark(local) result of SQLParserEngine(MySQL)
Posted by GitBox <gi...@apache.org>.
jingshanglu commented on issue #8208:
URL: https://github.com/apache/shardingsphere/issues/8208#issuecomment-737629944
@wu-sheng The cache in the parser engine is no longer static, and a new API lets users customize its initial size. So creating multiple parser engines leads to a linear improvement in performance.
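One way to exploit that (a sketch only: `Engine` stands in for the real SQLParserEngine, whose post-change constructor signature is an assumption here) is one engine per worker thread, so concurrent callers never contend on a shared cache:

```java
// Sketch: per-thread parser engines for linear scaling under concurrency.
// Engine is a hypothetical stand-in for the real SQLParserEngine.
public class PerThreadEngines {

    static final class Engine {
        Engine(String dialect, int initialCacheSize) {
            // the real engine would build its own, non-shared cache here
        }

        String parse(String sql) {
            return "AST(" + sql + ")"; // placeholder for the real parse result
        }
    }

    // Each thread lazily gets its own engine, and therefore its own cache.
    static final ThreadLocal<Engine> ENGINES =
            ThreadLocal.withInitial(() -> new Engine("MySQL", 1000));

    static String parse(String sql) {
        return ENGINES.get().parse(sql);
    }
}
```

The trade-off is memory: N threads hold N caches, which is where the configurable initial size helps.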