Posted to notifications@shardingsphere.apache.org by GitBox <gi...@apache.org> on 2020/11/18 09:38:35 UTC
[GitHub] [shardingsphere] wu-sheng opened a new issue #8208: Benchmark(local) result of SQLParserEngine(MySQL)
wu-sheng opened a new issue #8208:
URL: https://github.com/apache/shardingsphere/issues/8208
Hi @tristaZero , as promised, I ran some tests on the performance of the SQL parser. I chose the MySQL grammar at random.
Here are the SQLs I tested, along with the code I ran.
```java
import org.antlr.v4.runtime.tree.ParseTree;
import org.junit.Test;
import org.openjdk.jmh.annotations.Benchmark;
// The SQLParserEngine import is omitted: its package depends on the ShardingSphere version.

public class SQLParseBenchmark {

    @Test
    @Benchmark
    public void parseSimpleSQLPerf() {
        String sqlStatement = "select * from table_1 where col = ?";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, false);
    }

    @Test
    @Benchmark
    public void parseComplexSQLPerf() {
        String sqlStatement = "SELECT \n" +
                "    productname, \n" +
                "    buyprice\n" +
                "FROM\n" +
                "    products p1\n" +
                "WHERE\n" +
                "    buyprice > (SELECT \n" +
                "            AVG(buyprice)\n" +
                "        FROM\n" +
                "            products\n" +
                "        WHERE\n" +
                "            productline = p1.productline)";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, false);
    }

    @Test
    @Benchmark
    public void parseComplexSQL2Perf() {
        String sqlStatement = "SELECT \n" +
                "    customerNumber, \n" +
                "    customerName\n" +
                "FROM\n" +
                "    customers\n" +
                "WHERE\n" +
                "    EXISTS( SELECT \n" +
                "            orderNumber, SUM(priceEach * quantityOrdered)\n" +
                "        FROM\n" +
                "            orderdetails\n" +
                "        INNER JOIN\n" +
                "            orders USING (orderNumber)\n" +
                "        WHERE\n" +
                "            customerNumber = customers.customerNumber\n" +
                "        GROUP BY orderNumber\n" +
                "        HAVING SUM(priceEach * quantityOrdered) > 60000);";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, false);
    }
}
```
# Single Thread Case W/O Cache
```
Benchmark                               Mode  Cnt       Score       Error  Units
SQLParseBenchmark.parseSimpleSQLPerf   thrpt    5  200801.158 ± 28973.845  ops/s
SQLParseBenchmark.parseComplexSQLPerf  thrpt    5   56151.571 ±  3432.966  ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt    5   34157.511 ±  3750.865  ops/s
```
The length and complexity of the SQL affect performance considerably: throughput drops from roughly 200k ops/s to 34k ops/s as the statements grow more complex, with about ±10% fluctuation.
# Active Cache With 1k/100k possible SQLs
I simply added a counter `I` to generate 1k distinct SQL samples and warm up the cache, like this.
```java
import org.antlr.v4.runtime.tree.ParseTree;
import org.junit.Test;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
// The SQLParserEngine import is omitted: its package depends on the ShardingSphere version.

public class SQLParseBenchmark {

    public static int POSSIBILITY = 1000;

    public static int I = 1;

    @Test
    @Benchmark
    public void parseSimpleSQLPerf() {
        String sqlStatement = "select * from table_1 where col = " + I;
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, true);
        if (I > POSSIBILITY) {
            I = 1;
        } else {
            I++;
        }
    }

    @Test
    @Benchmark
    public void parseComplexSQLPerf() {
        String sqlStatement = "SELECT \n" +
                "    productname, \n" +
                "    buyprice\n" +
                "FROM\n" +
                "    products p1" + I +
                " WHERE\n" +
                "    buyprice > (SELECT \n" +
                "            AVG(buyprice)\n" +
                "        FROM\n" +
                "            products\n" +
                "        WHERE\n" +
                "            productline = p1.productline)";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, true);
        if (I > POSSIBILITY) {
            I = 1;
        } else {
            I++;
        }
    }

    @Test
    @Benchmark
    public void parseComplexSQL2Perf() {
        String sqlStatement = "SELECT \n" +
                "    customerNumber, \n" +
                "    customerName\n" +
                "FROM\n" +
                "    customers" + I +
                " WHERE\n" +
                "    EXISTS( SELECT \n" +
                "            orderNumber, SUM(priceEach * quantityOrdered)\n" +
                "        FROM\n" +
                "            orderdetails\n" +
                "        INNER JOIN\n" +
                "            orders USING (orderNumber)\n" +
                "        WHERE\n" +
                "            customerNumber = customers.customerNumber\n" +
                "        GROUP BY orderNumber\n" +
                "        HAVING SUM(priceEach * quantityOrdered) > 60000);";
        ParseTree tree = new SQLParserEngine("MySQL").parse(sqlStatement, true);
        if (I > POSSIBILITY) {
            I = 1;
        } else {
            I++;
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(SQLParseBenchmark.class.getSimpleName())
                .forks(1)
                .warmupIterations(3)
                .threads(1)
                .syncIterations(false)
                .measurementIterations(5)
                .build();
        new Runner(opt).run();
    }
}
```
```
Benchmark                               Mode  Cnt        Score        Error  Units
SQLParseBenchmark.parseSimpleSQLPerf   thrpt    5  5055988.471 ± 900336.628  ops/s
SQLParseBenchmark.parseComplexSQLPerf  thrpt    5  2760262.638 ± 395728.318  ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt    5  1686125.536 ± 476530.762  ops/s
```
The performance increases markedly: for the most complex SQL case, throughput rises from 34k to 1.68m ops/s, roughly 49x.
Then I changed the number of possible SQLs to 100k, so the LRU cache is mostly ineffective, and something interesting happened.
```
Benchmark                               Mode  Cnt       Score       Error  Units
SQLParseBenchmark.parseSimpleSQLPerf   thrpt    5  126939.502 ± 31545.825  ops/s
SQLParseBenchmark.parseComplexSQLPerf  thrpt    5   28216.660 ± 11769.703  ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt    5   18382.000 ±  6581.555  ops/s
```
If the LRU cache misses most of the time, it is worse than no cache at all: throughput falls from 34k to 18k ops/s, nearly -50%.
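One plausible explanation (a sketch, not ShardingSphere's actual implementation, which used a Guava cache): on a miss-heavy workload every lookup still pays the cache's hashing, insertion, and eviction bookkeeping on top of the full parse, so the cache is pure overhead. A minimal access-order LRU built on `LinkedHashMap` shows the churn:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch; a stand-in to illustrate the miss path,
// not the real (Guava-based) ShardingSphere cache.
public class LruSketch {

    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // accessOrder=true gives LRU eviction order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // evict on every insert beyond capacity
        }
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        // With more distinct SQLs than capacity, nearly every lookup misses:
        // the parse still happens, plus hashing and eviction bookkeeping.
        cache.put("sql-1", "ast-1");
        cache.put("sql-2", "ast-2");
        cache.put("sql-3", "ast-3"); // evicts sql-1
        System.out.println(cache.containsKey("sql-1")); // prints false
    }
}
```

With 100k distinct statements against a much smaller cache, this eviction path runs on essentially every call, which matches the ~50% slowdown above.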
# Concurrency Tests
I switched to the concurrent cases. First, with the LRU cache effective (1k SQL samples) and 6 threads active:
```
Benchmark                               Mode  Cnt         Score         Error  Units
SQLParseBenchmark.parseSimpleSQLPerf   thrpt    5  10535380.660 ±  417418.673  ops/s
SQLParseBenchmark.parseComplexSQLPerf  thrpt    5   7126066.718 ± 1634052.358  ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt    5   6251270.156 ± 1361498.623  ops/s
```
1.68m -> 6.25m ops/s for the most complex SQL, about 4x faster with 6 threads (my laptop is a 4-core Intel Core i7 MacBook Pro, 15-inch, 2017).
Then I made the LRU cache miss again (100k SQL samples), still with 6 threads:
```
Benchmark                               Mode  Cnt       Score      Error  Units
SQLParseBenchmark.parseSimpleSQLPerf   thrpt    5  159424.392 ± 66291.903  ops/s
SQLParseBenchmark.parseComplexSQLPerf  thrpt    5   42207.768 ±  9511.269  ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt    5   27640.844 ± 11868.306  ops/s
```
Only 18k -> 27k.
# One More
I was also curious about performance with no LRU cache under concurrency, so I kept the 6 threads and disabled the cache:
```
Benchmark                               Mode  Cnt       Score      Error  Units
SQLParseBenchmark.parseSimpleSQLPerf   thrpt    5  572351.472 ± 27852.061  ops/s
SQLParseBenchmark.parseComplexSQLPerf  thrpt    5  164481.782 ± 24937.811  ops/s
SQLParseBenchmark.parseComplexSQL2Perf thrpt    5   88742.504 ± 81617.211  ops/s
```
The most interesting result of all: with the cache off, it delivers 88k ops/s, compared to 27k with the LRU cache on but a large SQL set.
# Conclusion
The concurrent performance makes me a little concerned (not a blocker) about using this in the trace analysis core. A single OAP backend node can easily reach 10k+ segments/s, each segment normally contains 5-10 SQL statements, and the analysis will certainly run concurrently.
So the current performance looks like a bottleneck at our current scale.
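To make the concern concrete, here is a back-of-the-envelope calculation using the figures above (10k segments/s, the upper end of 5-10 SQLs per segment, and the 6-thread cache-miss result for the most complex SQL):

```java
// Back-of-the-envelope: required parse throughput vs. the measured worst case.
public class ThroughputCheck {
    public static void main(String[] args) {
        int segmentsPerSecond = 10_000;    // one OAP node, easily reached
        int sqlsPerSegment = 10;           // upper end of 5-10 per segment
        int requiredParsesPerSecond = segmentsPerSecond * sqlsPerSegment;
        double measuredWorstCase = 27_640; // 6 threads, LRU cache mostly missing
        System.out.println(requiredParsesPerSecond);                      // prints 100000
        System.out.println(requiredParsesPerSecond / measuredWorstCase);  // ~3.6x short
    }
}
```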
Could we have a deeper discussion about what we could do to improve this?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [shardingsphere] wu-sheng commented on issue #8208: Benchmark(local) result of SQLParserEngine(MySQL)
Posted by GitBox <gi...@apache.org>.
wu-sheng commented on issue #8208:
URL: https://github.com/apache/shardingsphere/issues/8208#issuecomment-729752442
> but the most popular scenario is using same SQL pattern with parameter marker which is most suitable for using cache
Yes, I just tested the worst case. SkyWalking, as an APM, faces countless strange performance issues; we are trying to avoid becoming the next one.
[GitHub] [shardingsphere] terrymanu commented on issue #8208: Benchmark(local) result of SQLParserEngine(MySQL)
Posted by GitBox <gi...@apache.org>.
terrymanu commented on issue #8208:
URL: https://github.com/apache/shardingsphere/issues/8208#issuecomment-729748169
Thank you for the great performance test report.
The parse engine's performance can be split into the with-cache and without-cache cases.
Performance is indeed low on a cache miss, but the most popular scenario is the same SQL pattern with parameter markers, which is the most suitable case for the cache.
So we can talk about improving performance for cache hits. We are using a static Guava cache for the whole SQL parse result; I will first try to improve its performance.
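The cache-hit path terrymanu describes can be sketched with the JDK alone (the real cache is Guava-based; `parseUncached` and the string "AST" value are stubs for illustration): statements that reuse the same SQL pattern with parameter markers hit one cached entry and skip the parser entirely.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a whole-statement parse cache keyed by the SQL text.
// Hypothetical stand-in for the real Guava cache; parseUncached is a stub.
public class ParseCacheSketch {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger parseCount = new AtomicInteger();

    String parseUncached(String sql) {
        parseCount.incrementAndGet();     // counts real parser invocations
        return "AST(" + sql + ")";        // placeholder for the real parse tree
    }

    String parse(String sql) {
        // computeIfAbsent runs the parser at most once per distinct statement
        return cache.computeIfAbsent(sql, this::parseUncached);
    }
}
```

With parameter markers (`col = ?`), every execution of the same statement maps to the same key, which is why the 1k-sample runs above are 40-50x faster than the uncached ones.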
[GitHub] [shardingsphere] jingshanglu commented on issue #8208: Benchmark(local) result of SQLParserEngine(MySQL)
Posted by GitBox <gi...@apache.org>.
jingshanglu commented on issue #8208:
URL: https://github.com/apache/shardingsphere/issues/8208#issuecomment-737629944
@wu-sheng The cache in the parser engine is no longer static, and a new API lets users customize its initial size. So creating multiple parser engines leads to a linear improvement in performance.
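One way to exploit that (a sketch only: `Engine` stands in for the real SQLParserEngine, whose post-change constructor signature is an assumption here) is one engine per worker thread, so concurrent callers never contend on a shared cache:

```java
// Sketch: per-thread parser engines for linear scaling under concurrency.
// Engine is a hypothetical stand-in for the real SQLParserEngine.
public class PerThreadEngines {

    static final class Engine {
        Engine(String dialect, int initialCacheSize) {
            // the real engine would build its own, non-shared cache here
        }

        String parse(String sql) {
            return "AST(" + sql + ")"; // placeholder for the real parse result
        }
    }

    // Each thread lazily gets its own engine, and therefore its own cache.
    static final ThreadLocal<Engine> ENGINES =
            ThreadLocal.withInitial(() -> new Engine("MySQL", 1000));

    static String parse(String sql) {
        return ENGINES.get().parse(sql);
    }
}
```

The trade-off is memory: N threads hold N caches, which is where the configurable initial size helps.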