Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/10/25 17:41:02 UTC

[GitHub] [superset] john-bodley opened a new pull request, #21936: fix: Crash caused by numpy.vectorize

john-bodley opened a new pull request, #21936:
URL: https://github.com/apache/superset/pull/21936

   
   ### SUMMARY
   
   We (Airbnb) had a user report an error where, in SQL Lab, a query would run ad infinitum when the row limit was increased. The underlying issue was that the Celery worker crashed with the following error:
   
   ```
   WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
   ```
   
   It turns out the root cause was a call to the `numpy.vectorize` function, which per [this Stack Overflow question](https://stackoverflow.com/questions/7078371/how-to-avoid-enormous-additional-memory-consumption-when-using-numpy-vectorize) can consume copious amounts of memory. The `numpy.vectorize` function is only used once in the code base, and though there may be some slowdown, the fix was merely to un-vectorize the logic using an iterator per the NumPy [documentation](https://numpy.org/doc/stable/reference/arrays.nditer.html#modifying-array-values).
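
For readers who want to see the un-vectorized approach end to end, here is a minimal sketch assuming an object-dtype array. The `stringify` body is a hypothetical stand-in (the real helper in `superset/result_set.py` is not shown in this thread), and unwrapping the element with `obj.item()` is a choice made for this sketch, not necessarily what the PR does.

```python
import json
from typing import Any

import numpy as np


def stringify(obj: Any) -> str:
    # Hypothetical stand-in for the helper in superset/result_set.py.
    return json.dumps(obj, default=str)


def stringify_values(array: np.ndarray) -> np.ndarray:
    # Work on a copy so the caller's array is left untouched.
    result = np.copy(array)

    # "refs_ok" is required to iterate over object-dtype arrays;
    # "readwrite" allows each element to be replaced in place.
    with np.nditer(result, flags=["refs_ok"], op_flags=["readwrite"]) as it:
        for obj in it:
            # obj is a zero-dimensional view; item() yields the Python object.
            obj[...] = stringify(obj.item())

    return result


# Build a small object array holding nested Python values.
values = np.empty(3, dtype=object)
values[0] = [1, 2]
values[1] = {"k": 1}
values[2] = "a"

converted = stringify_values(values)
```

Each element is converted one at a time, so the loop runs at Python speed, which is the slowdown the summary alludes to.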
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   
   ### TESTING INSTRUCTIONS
   
   CI.
   
   ### ADDITIONAL INFORMATION
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] john-bodley commented on a diff in pull request #21936: fix: Crash caused by numpy.vectorize

Posted by GitBox <gi...@apache.org>.
john-bodley commented on code in PR #21936:
URL: https://github.com/apache/superset/pull/21936#discussion_r1004825386


##########
superset/result_set.py:
##########
@@ -63,8 +63,14 @@ def stringify(obj: Any) -> str:
 
 
 def stringify_values(array: np.ndarray) -> np.ndarray:
-    vstringify = np.vectorize(stringify)
-    return vstringify(array)
+    result = np.copy(array)
+  
+     
+    with np.nditer(result, flags=["refs_ok"], op_flags=["readwrite"]) as it:
+        for obj in it:
+            obj[...] = stringify(obj)

Review Comment:
   I've never come across the `...` in the context of NumPy before, but this logic was lifted from their [official documentation](https://numpy.org/doc/stable/reference/arrays.nditer.html#modifying-array-values).
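
As a rough illustration of why the `...` (Ellipsis) index matters here: inside an `nditer` loop the loop variable is a zero-dimensional view of the current element, so plain assignment only rebinds the Python name, whereas `x[...] = value` writes through to the underlying array. The function names below are hypothetical and exist only for this sketch.

```python
import numpy as np


def rebind_only(a: np.ndarray) -> None:
    # Plain assignment rebinds the local name x; the array is not modified.
    for x in np.nditer(a):
        x = 2 * x


def write_through(a: np.ndarray) -> None:
    # Indexing with Ellipsis assigns into the view, which updates the array.
    with np.nditer(a, op_flags=["readwrite"]) as it:
        for x in it:
            x[...] = 2 * x


a = np.arange(6).reshape(2, 3)
rebind_only(a)
after_rebind = a.tolist()
write_through(a)
after_write = a.tolist()
```

Here `after_rebind` still holds the original values, while `after_write` holds the doubled ones.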





[GitHub] [superset] john-bodley commented on a diff in pull request #21936: fix: Crash caused by numpy.vectorize

Posted by GitBox <gi...@apache.org>.
john-bodley commented on code in PR #21936:
URL: https://github.com/apache/superset/pull/21936#discussion_r1004824179


##########
superset/result_set.py:
##########
@@ -63,8 +63,12 @@ def stringify(obj: Any) -> str:
 
 
 def stringify_values(array: np.ndarray) -> np.ndarray:
-    vstringify = np.vectorize(stringify)
-    return vstringify(array)
+    result = np.copy(array)
+
+    for obj in np.nditer(result, flags=["refs_ok"], op_flags=["readwrite"]):
+        obj[...] = stringify(obj)

Review Comment:
   I've never come across the `...` in the context of NumPy before, but this logic was lifted from their [official documentation](https://numpy.org/doc/stable/reference/arrays.nditer.html#modifying-array-values) sans the `with` block.
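
On the `with` block: the NumPy documentation says a writable `nditer` should either be used as a context manager or have `close()` called before the result is used, since any pending write-back is only guaranteed to be resolved at that point. A minimal sketch of the explicit `close()` form follows; `double_in_place` is a hypothetical name used only for illustration.

```python
import numpy as np


def double_in_place(a: np.ndarray) -> None:
    it = np.nditer(a, op_flags=["readwrite"])
    try:
        for x in it:
            x[...] = 2 * x
    finally:
        # Equivalent to leaving a `with` block: releases the iterator and
        # resolves any pending write-back to the operand.
        it.close()


data = np.arange(4)
double_in_place(data)
```

The context-manager form is shorter and is probably the safer default where it is available.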





[GitHub] [superset] john-bodley commented on pull request #21936: fix: Crash caused by numpy.vectorize

Posted by GitBox <gi...@apache.org>.
john-bodley commented on PR #21936:
URL: https://github.com/apache/superset/pull/21936#issuecomment-1292708887

   @villebro, per your comment, 
   
   >  I assume this is stringifying a query result that contains lots of massive objects or something?
   
   Yes, this was the case. The type in question was an array of strings, and it barfed when the row limit was increased from 10,000 to 100,000 records.




[GitHub] [superset] john-bodley merged pull request #21936: fix: Crash caused by numpy.vectorize

Posted by GitBox <gi...@apache.org>.
john-bodley merged PR #21936:
URL: https://github.com/apache/superset/pull/21936




[GitHub] [superset] codecov[bot] commented on pull request #21936: fix: Crash caused by numpy.vectorize

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #21936:
URL: https://github.com/apache/superset/pull/21936#issuecomment-1291103163

   # [Codecov](https://codecov.io/gh/apache/superset/pull/21936?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#21936](https://codecov.io/gh/apache/superset/pull/21936?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (12d7a75) into [master](https://codecov.io/gh/apache/superset/commit/1388f21ee34251b6ef83beb009ba0901e4067848?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1388f21) will **increase** coverage by `0.07%`.
   > The diff coverage is `100.00%`.
   
   > :exclamation: Current head 12d7a75 differs from pull request most recent head 83d8739. Consider uploading reports for the commit 83d8739 to get more accurate results
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #21936      +/-   ##
   ==========================================
   + Coverage   66.85%   66.92%   +0.07%     
   ==========================================
     Files        1807     1807              
     Lines       69190    69192       +2     
     Branches     7402     7402              
   ==========================================
   + Hits        46258    46309      +51     
   + Misses      21021    20972      -49     
     Partials     1911     1911              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `52.93% <100.00%> (?)` | |
   | mysql | `78.37% <100.00%> (+<0.01%)` | :arrow_up: |
   | postgres | `78.43% <100.00%> (-0.01%)` | :arrow_down: |
   | presto | `52.83% <100.00%> (+<0.01%)` | :arrow_up: |
   | python | `81.43% <100.00%> (+0.14%)` | :arrow_up: |
   | sqlite | `?` | |
   | unit | `51.08% <0.00%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/21936?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/result\_set.py](https://codecov.io/gh/apache/superset/pull/21936/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvcmVzdWx0X3NldC5weQ==) | `96.35% <100.00%> (-1.43%)` | :arrow_down: |
   | [superset/db\_engine\_specs/sqlite.py](https://codecov.io/gh/apache/superset/pull/21936/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL3NxbGl0ZS5weQ==) | `89.28% <0.00%> (-7.15%)` | :arrow_down: |
   | [superset/reports/commands/log\_prune.py](https://codecov.io/gh/apache/superset/pull/21936/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvcmVwb3J0cy9jb21tYW5kcy9sb2dfcHJ1bmUucHk=) | `85.71% <0.00%> (-3.58%)` | :arrow_down: |
   | [superset/utils/celery.py](https://codecov.io/gh/apache/superset/pull/21936/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdXRpbHMvY2VsZXJ5LnB5) | `86.20% <0.00%> (-3.45%)` | :arrow_down: |
   | [superset/connectors/sqla/utils.py](https://codecov.io/gh/apache/superset/pull/21936/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29ubmVjdG9ycy9zcWxhL3V0aWxzLnB5) | `88.23% <0.00%> (-1.97%)` | :arrow_down: |
   | [superset/commands/importers/v1/utils.py](https://codecov.io/gh/apache/superset/pull/21936/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbWFuZHMvaW1wb3J0ZXJzL3YxL3V0aWxzLnB5) | `92.20% <0.00%> (-1.30%)` | :arrow_down: |
   | [superset/views/core.py](https://codecov.io/gh/apache/superset/pull/21936/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdmlld3MvY29yZS5weQ==) | `75.45% <0.00%> (-0.61%)` | :arrow_down: |
   | [superset/common/query\_object.py](https://codecov.io/gh/apache/superset/pull/21936/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY29tbW9uL3F1ZXJ5X29iamVjdC5weQ==) | `93.87% <0.00%> (-0.52%)` | :arrow_down: |
   | [superset/db\_engine\_specs/base.py](https://codecov.io/gh/apache/superset/pull/21936/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lX3NwZWNzL2Jhc2UucHk=) | `89.38% <0.00%> (ø)` | |
   | ... and [4 more](https://codecov.io/gh/apache/superset/pull/21936/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   



