You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/04/06 16:56:38 UTC

[GitHub] [superset] dvchristianbors opened a new pull request, #19572: Disable sqlparse calls for Chart Data requests to Improve Querying Performance

dvchristianbors opened a new pull request, #19572:
URL: https://github.com/apache/superset/pull/19572

   <!---
   Please write the PR title following the conventions at https://www.conventionalcommits.org/en/v1.0.0/
   Example:
   fix(dashboard): load charts correctly
   -->
   fix(chartdata): improve querying performance for long `where ... in (...)` statements.
   
   ### SUMMARY
   <!--- Describe the change below, including rationale and design decisions -->
   As described in https://github.com/andialbrecht/sqlparse/issues/621, calling `sqlparse.format` and `sqlparse.parse` functions with grouping queries will cause quadratic runtime, ending in very long computation times for queries with large statements, e.g., long `IN` statements with a large number of keys.
   
   In Chart Data, the sqlparse functions are needlessly called, this has several reasons:
   1. The ChartData component will automatically generate most of the structure of the final query. It is possible, however, that users can add custom SQLs for filters, columns, etc.) - However, validation of the query will also happen in `connectors/sqla/models/get_query_str_extended`, so if erroneos sql code is produced, we will still receive an error. 
   2. The structure of the finished SQL query will be valid in every case, and the query will also *never* consist of multiple consecutive queries. Hence, it is not necessary to parse the query for a possible sequence of multiple queries. (which happens in [models/core.py](https://github.com/apache/superset/blob/master/superset/models/core.py#L424).
   3. The SQL automatically created by [compile_sqla_query(sqlaq.sqla_query)](https://github.com/apache/superset/blob/master/superset/connectors/sqla/models.py#L798) will already result in a readable sql. Reformating the sql using `sqlparser.format(sql, reindent=True)` to reindent will only have a minor improvement in readability while sacrificing computation time.
   
   Looking at the issue #19567 , querying times vary significantly depending on `IN` clause key size. Omitting the `parse()` and `format()` functions from the code used by ChartData will solve this issue, without having any impediments.
   
   ### BEFORE/AFTER ANIMATED GIF
   #### Before
   ![superset_querying_performance_before](https://user-images.githubusercontent.com/84898946/162026777-517ee3fe-9874-497e-a412-3f82dd949c39.gif)
   
   #### After
   ![superset_querying_performance_after](https://user-images.githubusercontent.com/84898946/162026768-f267b9a4-b9ac-4d9e-b4f4-2f5739b2ff62.gif)
   
   ### TESTING INSTRUCTIONS
   Please see issue #19567 for detailed steps to reproduce.
   I also added a test case that showcases the computational penalty in [this commit](c6ea536aa581423dd6ce373a2a3b85062ef6f6cf)
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [x] Has associated issue: #19567
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [PR] fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance [superset]

Posted by "rusackas (via GitHub)" <gi...@apache.org>.
rusackas commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1946869210

   I was tempted to close the related issue as stale, but would love to know if anyone on this thread has interest in rebasing/rekindling this effort? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [PR] fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance [superset]

Posted by "rumbin (via GitHub)" <gi...@apache.org>.
rumbin commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1820512679

   Update on our investigations:
   
   * We could not successfully apply https://github.com/andialbrecht/sqlparse/pull/710.
   * Therefore, we resorted to applying the changes of this very PR and it does wonders in terms of performance improvement.
   
   Thank you @dvchristianbors for providing this patch.
   Now I only hope that this PR will be accepted in some way...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [PR] fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance [superset]

Posted by "rumbin (via GitHub)" <gi...@apache.org>.
rumbin commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1818892242

   We also ran into this performance issue lately. Having some rather complex queries on a dfashboard drastically increases the dashboard load time, as the inefficiency of sqlparse keeps the gunicorn workers busy. This, in-turn, causes queries of the dashboard to be queued until a worker is free again.
   So, all in all, the effect on dashboard load times is immense, unless we throw lots of gunicorn workers and many CPU cores at this problem.
   
   @villebro, given the fact that there is virtually no activity in the sqlparse project and considering @dvmarkusvogl's comment above, I feel that following the approach suggested in this PR is the best option we have.
   What do you think?
   
   PS: We'll try patching sqlparse by virtue of https://github.com/andialbrecht/sqlparse/pull/710 and report back...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [PR] fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance [superset]

Posted by "john-bodley (via GitHub)" <gi...@apache.org>.
john-bodley commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1961929319

   @dvchristianbors, @betodealmeida recently introduced [[SIP-117] Improve SQL parsing](https://github.com/apache/superset/issues/26786) which proposes that we use `sqlglot` for all of our SQL parsing. Rather than disabling the `sqlparse` calls (which likely serve some purpose)  to improve query performance, would you mind trying `sqlglot` instead?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [PR] fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance [superset]

Posted by "WojtekWaga (via GitHub)" <gi...@apache.org>.
WojtekWaga commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1820943345

   Hey @dvchristianbors, I applied this patch manually as there were some incompatible changes in the codebase and it helped. Time from clicking chart refresh to getting the query sent to db dropped significantly. Also show query is much faster now but (as expected) it shows uglier query now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [PR] fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance [superset]

Posted by "dvchristianbors (via GitHub)" <gi...@apache.org>.
dvchristianbors commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1959340327

   > Perfectly fine. Thanks again!
   
   Hello @rusackas, the PR was updated and is now ready for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] villebro commented on pull request #19572: fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance

Posted by GitBox <gi...@apache.org>.
villebro commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1091543412

   Thanks for the contribution @dvchristianbors . While I agree this performance hit is very problematic, we currently rely heavily on `sqlparse` for some very critical validation logic. Therefore, some of the changes in this PR will not be possible in their current form.
   
   I looked into the original issue that you linked here, and noticed there seems to be open PRs on `sqlparse` that may address some of the observed performance issues. From our perspective, the optimal solution would be to fix the performance issues in the underlying library first, and only then optimize in Superset where needed. Out of curiosity, have you tested running Superset with a fork of `sqlparse` that has applied some of those open PRs? It would be interesting to see if they have a significant impact on performance..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] dvmarkusvogl commented on pull request #19572: fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance

Posted by GitBox <gi...@apache.org>.
dvmarkusvogl commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1091847836

   > we currently rely heavily on sqlparse **for some very critical validation logic**
   SQL-Parses self-description:
   (pypi): sqlparse is a **non-validating** SQL parser for Python 
   (github) A **non-validating** SQL parser module for Python
   
   They seem very keen on not being used for validation.
   IMO the pythonic way would be to properly handle the database-response, and the fix above at least reduces the existing complexity.
   
   Also, the project had [3 days of activity](https://github.com/andialbrecht/sqlparse/commits/master) since October 2020, so I wouldn't expect much from them or rely to heavily from a project that hat 200+ open issues.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [PR] fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance [superset]

Posted by "dvchristianbors (via GitHub)" <gi...@apache.org>.
dvchristianbors commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1820736702

   > Update on our investigations:
   > 
   > * We could not successfully apply https://github.com/andialbrecht/sqlparse/pull/710.
   > * Therefore, we resorted to applying the changes of this very PR and it does wonders in terms of performance improvement.
   > 
   > Thank you @dvchristianbors for providing this patch.
   > Now I only hope that this PR will be accepted in some way...
   
   Thanks for letting us know. Did you test these changes with a recent branch? Our changes were made on a rather old version.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [PR] fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance [superset]

Posted by "rusackas (via GitHub)" <gi...@apache.org>.
rusackas commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1959999971

   Looks like the [pre-commit hooks](https://superset.apache.org/docs/contributing/hooks-and-linting/) need to be run to solve at least some (if not all) of the CI issues.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] dvchristianbors commented on pull request #19572: fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance

Posted by GitBox <gi...@apache.org>.
dvchristianbors commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1091813640

   > Thanks for the contribution @dvchristianbors . While I agree this performance hit is very problematic, we currently rely heavily on `sqlparse` for some very critical validation logic. Therefore, some of the changes in this PR will not be possible in their current form.
   > 
   > I looked into the original issue that you linked here, and noticed there seems to be open PRs on `sqlparse` that may address some of the observed performance issues. From our perspective, the optimal solution would be to fix the performance issues in the underlying library first, and only then optimize in Superset where needed. Out of curiosity, have you tested running Superset with a fork of `sqlparse` that has applied some of those open PRs? It would be interesting to see if they have a significant impact on performance..
   
   Thanks for your comments. I agree that this would be the best option. I forked sqlparse and merged the PRs which supposedly addressed the issues, however I cannot confirm that they did improve performance (at least for the tested use cases). 
   
   I will further investigate the issue in sqlparse itself, and propose new changes as soon as I found a better solution.
   @villebro Should I post my updates and revised solution here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [PR] fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance [superset]

Posted by "dvchristianbors (via GitHub)" <gi...@apache.org>.
dvchristianbors commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1946973193

   > I was tempted to close the related issue as stale, but would love to know if anyone on this thread has interest in rebasing/rekindling this effort?
   
   Yes, I would still be interested to finish this feature. If a rebase next week is teill fine.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [PR] fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance [superset]

Posted by "rusackas (via GitHub)" <gi...@apache.org>.
rusackas commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1947018941

   Perfectly fine. Thanks again!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [PR] fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance [superset]

Posted by "rusackas (via GitHub)" <gi...@apache.org>.
rusackas commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1961922058

   Approving CI run 🤞 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [PR] fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance [superset]

Posted by "dvchristianbors (via GitHub)" <gi...@apache.org>.
dvchristianbors commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1980276699

   > @dvchristianbors, @betodealmeida recently introduced [[SIP-117] Improve SQL parsing](https://github.com/apache/superset/issues/26786) which proposes that we use `sqlglot` for all of our SQL parsing. Rather than disabling the `sqlparse` calls (which likely serve some purpose) to improve query performance, would you mind trying `sqlglot` instead?
   
   I replaced the usage of sqlparse with sqlglot for splitting up multiple sql statements. However, sqlglot does not include the formatting function that was specifically removed due to performance issues here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [PR] fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance [superset]

Posted by "betodealmeida (via GitHub)" <gi...@apache.org>.
betodealmeida commented on PR #19572:
URL: https://github.com/apache/superset/pull/19572#issuecomment-1981008138

   Note that it is possible to pretty-format a SQL query with sqlglot, but you need to know the dialect:
   
   https://github.com/apache/superset/pull/26767/files#diff-30f4c6ffdcb1f78a9e1ebbb60e1f297b379c181534d5a185a4cd37b1b16ac6f8R409
   
   @dvchristianbors do you mind leaving a TODO comment to tentatively re-add the SQL formatting once SIP-117 is merged? (I saw the PR is approved but has conflicts, I'll work on them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org