You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/07/06 18:16:03 UTC

[GitHub] [superset] eschutho opened a new pull request, #20629: fix: pandas bug when data is blank on post-processing

eschutho opened a new pull request, #20629:
URL: https://github.com/apache/superset/pull/20629

   ### SUMMARY
   There's a bug in post processing for tables and pivot-table charts when the data is empty for both json and csv formats. We'll now just return the original results instead of trying to apply any post-processing on it.
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   Currently raises a 500
   
   ### TESTING INSTRUCTIONS
   Create a chart with no results for a pivot table and then try to export pivoted results as csv. This would also break on alert/reports when formatting the report as a csv. 
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] john-bodley commented on a diff in pull request #20629: fix: pandas bug when data is blank on post-processing

Posted by GitBox <gi...@apache.org>.
john-bodley commented on code in PR #20629:
URL: https://github.com/apache/superset/pull/20629#discussion_r915159759


##########
superset/charts/post_processing.py:
##########
@@ -325,9 +325,16 @@ def apply_post_process(
 
     for query in result["queries"]:
         if query["result_format"] == ChartDataResultFormat.JSON:
-            df = pd.DataFrame.from_dict(query["data"])
+            try:
+                df = pd.DataFrame.from_dict(query["data"])
+            except ValueError:  # no data error
+                return result

Review Comment:
   What is `query["data"]` in this case? Would it be preferable to first check whether `query["data"]` is valid? Per your unit tests it seems like this might be an empty string—the worst of the worst—and maybe we could/should fix this upstream and have it be `None`, i.e., the following works:
   
   ```python
   >>> import pandas as pd
   >>> pd.DataFrame.from_dict(None)
   Empty DataFrame
   Columns: []
   Index: []
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] eschutho commented on a diff in pull request #20629: fix: pandas bug when data is blank on post-processing

Posted by GitBox <gi...@apache.org>.
eschutho commented on code in PR #20629:
URL: https://github.com/apache/superset/pull/20629#discussion_r916347837


##########
superset/charts/post_processing.py:
##########
@@ -325,9 +325,16 @@ def apply_post_process(
 
     for query in result["queries"]:
         if query["result_format"] == ChartDataResultFormat.JSON:
-            df = pd.DataFrame.from_dict(query["data"])
+            try:
+                df = pd.DataFrame.from_dict(query["data"])
+            except ValueError:  # no data error
+                return result

Review Comment:
   @john-bodley I tested out None and empty string with both the CSV and JSON formats, and JSON will error for both, and CSV only on `''` as you noted. I'm concerned that that logic of checking for validity could get even more complicated if the code changes in any way. WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] betodealmeida commented on a diff in pull request #20629: fix: pandas bug when data is blank on post-processing

Posted by GitBox <gi...@apache.org>.
betodealmeida commented on code in PR #20629:
URL: https://github.com/apache/superset/pull/20629#discussion_r915131543


##########
superset/charts/post_processing.py:
##########
@@ -325,9 +325,16 @@ def apply_post_process(
 
     for query in result["queries"]:
         if query["result_format"] == ChartDataResultFormat.JSON:
-            df = pd.DataFrame.from_dict(query["data"])
+            try:
+                df = pd.DataFrame.from_dict(query["data"])
+            except ValueError:  # no data error
+                return result

Review Comment:
   I think it would be better to `continue` here and in line 336, since other queries might have data (and if they also don't we'll end up returning `result` unmodified).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] codecov[bot] commented on pull request #20629: fix: pandas bug when data is blank on post-processing

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #20629:
URL: https://github.com/apache/superset/pull/20629#issuecomment-1176537475

   # [Codecov](https://codecov.io/gh/apache/superset/pull/20629?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#20629](https://codecov.io/gh/apache/superset/pull/20629?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7e0ec8f) into [master](https://codecov.io/gh/apache/superset/commit/b39a3d8f78d5cf8b53a1b473a4214047a2302d6e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b39a3d8) will **decrease** coverage by `11.93%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #20629       +/-   ##
   ===========================================
   - Coverage   66.82%   54.88%   -11.94%     
   ===========================================
     Files        1752     1752               
     Lines       65616    65626       +10     
     Branches     6938     6938               
   ===========================================
   - Hits        43849    36022     -7827     
   - Misses      20007    27844     +7837     
     Partials     1760     1760               
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `?` | |
   | mysql | `?` | |
   | postgres | `?` | |
   | presto | `53.67% <0.00%> (-0.01%)` | :arrow_down: |
   | python | `58.15% <100.00%> (-24.75%)` | :arrow_down: |
   | sqlite | `?` | |
   | unit | `50.81% <100.00%> (+0.13%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/20629?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/charts/post\_processing.py](https://codecov.io/gh/apache/superset/pull/20629/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvY2hhcnRzL3Bvc3RfcHJvY2Vzc2luZy5weQ==) | `89.78% <100.00%> (+22.60%)` | :arrow_up: |
   | [superset/utils/dashboard\_import\_export.py](https://codecov.io/gh/apache/superset/pull/20629/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdXRpbHMvZGFzaGJvYXJkX2ltcG9ydF9leHBvcnQucHk=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [superset/key\_value/commands/upsert.py](https://codecov.io/gh/apache/superset/pull/20629/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL3Vwc2VydC5weQ==) | `0.00% <0.00%> (-89.59%)` | :arrow_down: |
   | [superset/key\_value/commands/update.py](https://codecov.io/gh/apache/superset/pull/20629/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL3VwZGF0ZS5weQ==) | `0.00% <0.00%> (-89.37%)` | :arrow_down: |
   | [superset/key\_value/commands/delete.py](https://codecov.io/gh/apache/superset/pull/20629/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL2RlbGV0ZS5weQ==) | `0.00% <0.00%> (-85.30%)` | :arrow_down: |
   | [superset/db\_engines/hive.py](https://codecov.io/gh/apache/superset/pull/20629/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGJfZW5naW5lcy9oaXZlLnB5) | `0.00% <0.00%> (-85.19%)` | :arrow_down: |
   | [superset/key\_value/commands/delete\_expired.py](https://codecov.io/gh/apache/superset/pull/20629/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL2RlbGV0ZV9leHBpcmVkLnB5) | `0.00% <0.00%> (-80.77%)` | :arrow_down: |
   | [superset/dashboards/commands/importers/v0.py](https://codecov.io/gh/apache/superset/pull/20629/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGFzaGJvYXJkcy9jb21tYW5kcy9pbXBvcnRlcnMvdjAucHk=) | `15.62% <0.00%> (-76.25%)` | :arrow_down: |
   | [superset/datasets/commands/update.py](https://codecov.io/gh/apache/superset/pull/20629/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvY29tbWFuZHMvdXBkYXRlLnB5) | `25.88% <0.00%> (-68.24%)` | :arrow_down: |
   | [superset/datasets/commands/create.py](https://codecov.io/gh/apache/superset/pull/20629/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvY29tbWFuZHMvY3JlYXRlLnB5) | `30.18% <0.00%> (-67.93%)` | :arrow_down: |
   | ... and [279 more](https://codecov.io/gh/apache/superset/pull/20629/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/20629?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/20629?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [b39a3d8...7e0ec8f](https://codecov.io/gh/apache/superset/pull/20629?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] eschutho commented on a diff in pull request #20629: fix: pandas bug when data is blank on post-processing

Posted by GitBox <gi...@apache.org>.
eschutho commented on code in PR #20629:
URL: https://github.com/apache/superset/pull/20629#discussion_r916356381


##########
superset/charts/post_processing.py:
##########
@@ -325,9 +325,16 @@ def apply_post_process(
 
     for query in result["queries"]:
         if query["result_format"] == ChartDataResultFormat.JSON:
-            df = pd.DataFrame.from_dict(query["data"])
+            try:
+                df = pd.DataFrame.from_dict(query["data"])
+            except ValueError:  # no data error
+                return result

Review Comment:
   based on @john-bodley's comment, i added a few more tests when data is `None` and found a case where it errors later in the code, so I put a nullish check like he suggested instead of the try/except.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] eschutho merged pull request #20629: fix: pandas bug when data is blank on post-processing

Posted by GitBox <gi...@apache.org>.
eschutho merged PR #20629:
URL: https://github.com/apache/superset/pull/20629


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] john-bodley commented on a diff in pull request #20629: fix: pandas bug when data is blank on post-processing

Posted by GitBox <gi...@apache.org>.
john-bodley commented on code in PR #20629:
URL: https://github.com/apache/superset/pull/20629#discussion_r915159759


##########
superset/charts/post_processing.py:
##########
@@ -325,9 +325,16 @@ def apply_post_process(
 
     for query in result["queries"]:
         if query["result_format"] == ChartDataResultFormat.JSON:
-            df = pd.DataFrame.from_dict(query["data"])
+            try:
+                df = pd.DataFrame.from_dict(query["data"])
+            except ValueError:  # no data error
+                return result

Review Comment:
   What is `query["data"]` in this case? Would it be preferable to first check whether `query["data"]` is valid?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] eschutho commented on a diff in pull request #20629: fix: pandas bug when data is blank on post-processing

Posted by GitBox <gi...@apache.org>.
eschutho commented on code in PR #20629:
URL: https://github.com/apache/superset/pull/20629#discussion_r916347837


##########
superset/charts/post_processing.py:
##########
@@ -325,9 +325,16 @@ def apply_post_process(
 
     for query in result["queries"]:
         if query["result_format"] == ChartDataResultFormat.JSON:
-            df = pd.DataFrame.from_dict(query["data"])
+            try:
+                df = pd.DataFrame.from_dict(query["data"])
+            except ValueError:  # no data error
+                return result

Review Comment:
   @john-bodley I tested out None and empty string with both the CSV and JSON formats, and JSON will error for both, and CSV only on `''` as you noted. I'm concerned that that logic of checking for validity could get even more complicated if the code changes in any way. WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] eschutho commented on a diff in pull request #20629: fix: pandas bug when data is blank on post-processing

Posted by GitBox <gi...@apache.org>.
eschutho commented on code in PR #20629:
URL: https://github.com/apache/superset/pull/20629#discussion_r916346456


##########
superset/charts/post_processing.py:
##########
@@ -325,9 +325,16 @@ def apply_post_process(
 
     for query in result["queries"]:
         if query["result_format"] == ChartDataResultFormat.JSON:
-            df = pd.DataFrame.from_dict(query["data"])
+            try:
+                df = pd.DataFrame.from_dict(query["data"])
+            except ValueError:  # no data error
+                return result

Review Comment:
   Ok, I see what you're saying. I also added a new test for multiple queries. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] betodealmeida commented on a diff in pull request #20629: fix: pandas bug when data is blank on post-processing

Posted by GitBox <gi...@apache.org>.
betodealmeida commented on code in PR #20629:
URL: https://github.com/apache/superset/pull/20629#discussion_r915371756


##########
superset/charts/post_processing.py:
##########
@@ -325,9 +325,16 @@ def apply_post_process(
 
     for query in result["queries"]:
         if query["result_format"] == ChartDataResultFormat.JSON:
-            df = pd.DataFrame.from_dict(query["data"])
+            try:
+                df = pd.DataFrame.from_dict(query["data"])
+            except ValueError:  # no data error
+                return result

Review Comment:
   If you `continue` here it would skip to the next step of the `for query in result["queries"]` loop, so it wouldn't get more errors. There could be other queries in `result["queries"]` that have non-blank data.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] eschutho commented on a diff in pull request #20629: fix: pandas bug when data is blank on post-processing

Posted by GitBox <gi...@apache.org>.
eschutho commented on code in PR #20629:
URL: https://github.com/apache/superset/pull/20629#discussion_r915274329


##########
superset/charts/post_processing.py:
##########
@@ -325,9 +325,16 @@ def apply_post_process(
 
     for query in result["queries"]:
         if query["result_format"] == ChartDataResultFormat.JSON:
-            df = pd.DataFrame.from_dict(query["data"])
+            try:
+                df = pd.DataFrame.from_dict(query["data"])
+            except ValueError:  # no data error
+                return result

Review Comment:
   @betodealmeida if the data is None or '', is there any value in continuing this process rather than returning early? AFAICT we'll continue to get more errors down below as well.  Per @john-bodley's point, I can do a nullish check instead of the try/except if we want to be more specific to these errors. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org