You are viewing a plain text version of this content. The canonical link for it is here.

Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/08/07 08:05:12 UTC

[GitHub] [superset] EugeneTorap opened a new pull request, #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

EugeneTorap opened a new pull request, #21002:
URL: https://github.com/apache/superset/pull/21002

   Fix #19986 issue when a user tries to install superset using Python 3.10 because pyarrow 5.0.0 doesn't have a wheel for Python 3.10
   
   ### SUMMARY
   In order to use Python 3.10 in superset we need to bump PyArrow (from 5.0.0 to 6.0.1)
   Also bump Pandas to latest minor (from 1.3.4 to 1.4.3).
   
   Pandas 1.4 added a wheel for Python 3.9, Apple Silicon
   
   Pandas 1.4 introduced support for using pyarrow as an engine for reading CSVs, which brings performance improvements (see https://pandas.pydata.org/docs/whatsnew/v1.4.0.html#multi-threaded-csv-reading-with-a-new-csv-engine-based-on-pyarrow for details). Therefore engine="pyarrow" has been added everywhere we're calling pd.read_csv.
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   <!--- Skip this if not applicable -->
   
   ### TESTING INSTRUCTIONS
   <!--- Required! What steps can be taken to manually verify the changes? -->
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] EugeneTorap commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

EugeneTorap commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r940915810


##########
superset/examples/bart_lines.py:
##########
@@ -34,7 +34,7 @@ def load_bart_lines(only_metadata: bool = False, force: bool = False) -> None:
     table_exists = database.has_table_by_name(tbl_name)
 
     if not only_metadata and (not table_exists or force):
-        content = get_example_data("bart-lines.json.gz")
+        content = get_example_data("bart-lines.json.gz", make_bytes=True)

Review Comment:
   @betodealmeida Because `get_example_data` returns `bytes` and pandas version(1.4.0) resulting TypeError, `make_bytes=True` make `content = BytesIO(content)` 
   We can also convert bytes to string after `get_example_data`.
   And without it we got such error:
   
   > File "/home/runner/work/superset/superset/superset/cli/examples.py", line 113, in load_examples
       load_examples_run(load_test_data, load_big_data, only_metadata, force)
     File "/home/runner/work/superset/superset/superset/cli/examples.py", line 49, in load_examples_run
       examples.load_world_bank_health_n_pop(only_metadata, force)
     File "/home/runner/work/superset/superset/superset/examples/world_bank.py", line [60](https://github.com/apache/superset/runs/7710893773?check_suite_focus=true#step:8:61), in load_world_bank_health_n_pop
       pdf = pd.read_json(data)
   TypeError: Expected file path name or file-like object, got <class 'bytes'> type
   Error: Process completed with exit code 1.
   
   ![image](https://user-images.githubusercontent.com/29536522/183571826-c7e85066-3b2d-4de7-b8d9-521b5f774af7.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] villebro commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

villebro commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r942754943


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   @betodealmeida I agree completely, for this case it does make sense IMO. If needed I can put aside some time this week for stepping through the contribution code to see where the difference is happening so we can pinpoint what's changed. I probably didn't say it here yet, but this is a great PR and helps keep Superset up to date, so I really appreciate the work done here by @EugeneTorap 🙂👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] EugeneTorap commented on pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

EugeneTorap commented on PR #21002:
URL: https://github.com/apache/superset/pull/21002#issuecomment-1208579078

   How should I fix this test?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] betodealmeida commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

betodealmeida commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r940633847


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   I don't think we want this change... a `nan` is very different from a 0.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] EugeneTorap commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

EugeneTorap commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r945456755


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   Yes, I need the help. Anyone who has write access can add the fix
   ![image](https://user-images.githubusercontent.com/29536522/184587714-6bc8ab52-90a6-4aec-8466-9d7d9591379c.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] villebro commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

villebro commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r943321989


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   @zhaoyongjie it looks like `select_dtypes` used to return a copy, but now returns a reference to the original series, making the changes happen in place. I suspect it's caused by this: https://github.com/pandas-dev/pandas/pull/42611



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] betodealmeida commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

betodealmeida commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r945862693


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   > The test is making sure that columns `a`, `b` and `c` are unchanged, and that the contribution of `a` should be placed in a new column `pct_a`.
   
   Ah, I missed that, I thought those were contributions. Good catch!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] villebro commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

villebro commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r946474608


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   FYI I've pushed the fix that makes the test changes unnecessary and I'm following up with @EugeneTorap to resolve a remaining issue



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] codecov[bot] commented on pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

codecov[bot] commented on PR #21002:
URL: https://github.com/apache/superset/pull/21002#issuecomment-1207375611

   # [Codecov](https://codecov.io/gh/apache/superset/pull/21002?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#21002](https://codecov.io/gh/apache/superset/pull/21002?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (85369fe) into [master](https://codecov.io/gh/apache/superset/commit/e214e1ace616c3fdd40fcf64c501e08407feb8b3?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e214e1a) will **decrease** coverage by `11.65%`.
   > The diff coverage is `18.75%`.
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #21002       +/-   ##
   ===========================================
   - Coverage   66.34%   54.68%   -11.66%     
   ===========================================
     Files        1767     1767               
     Lines       67312    67284       -28     
     Branches     7144     7138        -6     
   ===========================================
   - Hits        44656    36797     -7859     
   - Misses      20828    28661     +7833     
   + Partials     1828     1826        -2     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `?` | |
   | mysql | `?` | |
   | postgres | `?` | |
   | presto | `53.05% <12.50%> (+<0.01%)` | :arrow_up: |
   | python | `57.42% <18.75%> (-24.05%)` | :arrow_down: |
   | sqlite | `?` | |
   | unit | `50.47% <12.50%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/21002?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/datasets/commands/importers/v1/utils.py](https://codecov.io/gh/apache/superset/pull/21002/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvY29tbWFuZHMvaW1wb3J0ZXJzL3YxL3V0aWxzLnB5) | `55.95% <0.00%> (-2.39%)` | :arrow_down: |
   | [superset/examples/bart\_lines.py](https://codecov.io/gh/apache/superset/pull/21002/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZXhhbXBsZXMvYmFydF9saW5lcy5weQ==) | `0.00% <0.00%> (ø)` | |
   | [superset/examples/birth\_names.py](https://codecov.io/gh/apache/superset/pull/21002/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZXhhbXBsZXMvYmlydGhfbmFtZXMucHk=) | `72.07% <0.00%> (ø)` | |
   | [superset/examples/country\_map.py](https://codecov.io/gh/apache/superset/pull/21002/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZXhhbXBsZXMvY291bnRyeV9tYXAucHk=) | `0.00% <0.00%> (ø)` | |
   | [superset/examples/energy.py](https://codecov.io/gh/apache/superset/pull/21002/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZXhhbXBsZXMvZW5lcmd5LnB5) | `0.00% <0.00%> (ø)` | |
   | [superset/examples/flights.py](https://codecov.io/gh/apache/superset/pull/21002/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZXhhbXBsZXMvZmxpZ2h0cy5weQ==) | `0.00% <0.00%> (ø)` | |
   | [superset/examples/long\_lat.py](https://codecov.io/gh/apache/superset/pull/21002/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZXhhbXBsZXMvbG9uZ19sYXQucHk=) | `0.00% <0.00%> (ø)` | |
   | [superset/examples/multiformat\_time\_series.py](https://codecov.io/gh/apache/superset/pull/21002/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZXhhbXBsZXMvbXVsdGlmb3JtYXRfdGltZV9zZXJpZXMucHk=) | `0.00% <0.00%> (ø)` | |
   | [superset/examples/paris.py](https://codecov.io/gh/apache/superset/pull/21002/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZXhhbXBsZXMvcGFyaXMucHk=) | `0.00% <0.00%> (ø)` | |
   | [superset/examples/random\_time\_series.py](https://codecov.io/gh/apache/superset/pull/21002/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZXhhbXBsZXMvcmFuZG9tX3RpbWVfc2VyaWVzLnB5) | `0.00% <0.00%> (ø)` | |
   | ... and [310 more](https://codecov.io/gh/apache/superset/pull/21002/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: Codecov can now indicate which changes are the most critical in Pull Requests. [Learn more](https://about.codecov.io/product/feature/runtime-insights/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] villebro commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

villebro commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r945985488


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   FYI I opened a bug report on Pandas detailing the breaking change: https://github.com/pandas-dev/pandas/issues/48090



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] EugeneTorap commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

EugeneTorap commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r946043915


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   @villebro Can we rewrite our logic to not depends it is zero or nan?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] EugeneTorap commented on pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

EugeneTorap commented on PR #21002:
URL: https://github.com/apache/superset/pull/21002#issuecomment-1208558644

   @hughhhh @betodealmeida Can you review it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] villebro commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

villebro commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r942456223


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   @zhaoyongjie how do you feel about this change? It seems weird that `nan` would now be `0` here. While it may be ok in this case, it would be important to understand the mechanism behind this change, as it might be introducing another regression that we don't have test coverage for right now..



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] EugeneTorap commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

EugeneTorap commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r945456755


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   Yes, I need the help. Anyone who has a write access can add the fix and then merge the PR.
   ![image](https://user-images.githubusercontent.com/29536522/184587714-6bc8ab52-90a6-4aec-8466-9d7d9591379c.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] villebro commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

villebro commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r945602901


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   @ghrom Python 3.10 is not currently officially supported by Superset; please refer to the currently supported versions: https://github.com/apache/superset/blob/394d62ee51b6feb6e23c2b24505ee469460db146/setup.py#L183-L186
   The project is happy to help add support for Python 3.10, but we simply can't merge this PR before ensuring that the regressions that it introduces have been fixed. Again, I'm very happy that this test exists, as otherwise we probably would have already merged this PR, introducing a regression.
   
   In the meantime, use a supported version of Python when running Superset (`pyenv` and `virtualenv` are great tools for this).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] betodealmeida commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

betodealmeida commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r940767419


##########
superset/examples/bart_lines.py:
##########
@@ -34,7 +34,7 @@ def load_bart_lines(only_metadata: bool = False, force: bool = False) -> None:
     table_exists = database.has_table_by_name(tbl_name)
 
     if not only_metadata and (not table_exists or force):
-        content = get_example_data("bart-lines.json.gz")
+        content = get_example_data("bart-lines.json.gz", make_bytes=True)

Review Comment:
   Can you give some context on why `make_bytes` is needed here?
   
   Looking at [the docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html) it seems that `read_json` still accepts a string. Passing `make_bytes=True` will wrap the string in a `BytesIO`, which seems unnecessary.



##########
superset/charts/post_processing.py:
##########
@@ -334,7 +334,7 @@ def apply_post_process(
         if query["result_format"] == ChartDataResultFormat.JSON:
             df = pd.DataFrame.from_dict(query["data"])
         elif query["result_format"] == ChartDataResultFormat.CSV:
-            df = pd.read_csv(StringIO(query["data"]))
+            df = pd.read_csv(StringIO(query["data"]), engine="pyarrow")

Review Comment:
   The docs mention that
   
   > New in version 1.4.0: The “pyarrow” engine was added as an experimental engine, and some features are unsupported, or may not work correctly, with this engine.
   
   So we should stick to the default to prevent any breaking changes. Any reason why you changed this to pyarrow, other than performance?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] betodealmeida commented on pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

betodealmeida commented on PR #21002:
URL: https://github.com/apache/superset/pull/21002#issuecomment-1208679609

   > How should I fix this test? Pandas returns 0 instead of nan for the API
   
   Taking another look, I guess 0 makes sense from a contribution point of view. It should be fine in this case.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] EugeneTorap commented on pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

EugeneTorap commented on PR #21002:
URL: https://github.com/apache/superset/pull/21002#issuecomment-1216396558

   @betodealmeida @villebro Can you review again?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] EugeneTorap commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

EugeneTorap commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r940915810


##########
superset/examples/bart_lines.py:
##########
@@ -34,7 +34,7 @@ def load_bart_lines(only_metadata: bool = False, force: bool = False) -> None:
     table_exists = database.has_table_by_name(tbl_name)
 
     if not only_metadata and (not table_exists or force):
-        content = get_example_data("bart-lines.json.gz")
+        content = get_example_data("bart-lines.json.gz", make_bytes=True)

Review Comment:
   @betodealmeida Because `get_example_data` returns `bytes` and pandas version(1.4.0) resulting TypeError, `make_bytes=True` make `content = BytesIO(content)` 
   We can also use `StringIO` for it.
   And without it we got such error:
   
   > File "/home/runner/work/superset/superset/superset/cli/examples.py", line 113, in load_examples
       load_examples_run(load_test_data, load_big_data, only_metadata, force)
     File "/home/runner/work/superset/superset/superset/cli/examples.py", line 49, in load_examples_run
       examples.load_world_bank_health_n_pop(only_metadata, force)
     File "/home/runner/work/superset/superset/superset/examples/world_bank.py", line [60](https://github.com/apache/superset/runs/7710893773?check_suite_focus=true#step:8:61), in load_world_bank_health_n_pop
       pdf = pd.read_json(data)
   TypeError: Expected file path name or file-like object, got <class 'bytes'> type
   Error: Process completed with exit code 1.
   
   ![image](https://user-images.githubusercontent.com/29536522/183571826-c7e85066-3b2d-4de7-b8d9-521b5f774af7.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] EugeneTorap commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

EugeneTorap commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r940929368


##########
superset/charts/post_processing.py:
##########
@@ -334,7 +334,7 @@ def apply_post_process(
         if query["result_format"] == ChartDataResultFormat.JSON:
             df = pd.DataFrame.from_dict(query["data"])
         elif query["result_format"] == ChartDataResultFormat.CSV:
-            df = pd.read_csv(StringIO(query["data"]))
+            df = pd.read_csv(StringIO(query["data"]), engine="pyarrow")

Review Comment:
   Only performance. Removed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] EugeneTorap commented on pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

EugeneTorap commented on PR #21002:
URL: https://github.com/apache/superset/pull/21002#issuecomment-1210666943

   @betodealmeida @villebro Can you review again?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] EugeneTorap commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

EugeneTorap commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r945456755


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   Yes, I need the help. Anyone who has a write access can add the fix
   ![image](https://user-images.githubusercontent.com/29536522/184587714-6bc8ab52-90a6-4aec-8466-9d7d9591379c.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] villebro commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

villebro commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r945424360


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   @ghrom to be clear, without addressing the breaking change caused by upgrading to Pandas 1.4 we would be introducing a regression. I have asked @EugeneTorap if he needs help addressing the issue and I am happy to help if needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] ghrom commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

ghrom commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r945598608


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   > @ghrom to be clear, without addressing the breaking change caused by upgrading to Pandas 1.4 we would be introducing a regression. I have asked @EugeneTorap if he needs help addressing the issue and I am happy to help if needed.
   
   Please introduce yourself to the concept of the greater vs. lesser evil. The current state of the pip repository is BROKEN. You are worried about your tests?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] zhaoyongjie commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

zhaoyongjie commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r943310829


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   Hi @EugeneTorap , I am tracking this test case, the `nan` changed to `0` from [here](https://github.com/apache/superset/blob/4f1996dba8e35ee958048b726750247ec8e518aa/superset/utils/pandas_postprocessing/contribution.py#L51-L52). let's look over what's changes from `select_dtypes` in the upgrade.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] betodealmeida commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

betodealmeida commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r947942417


##########
superset/examples/helpers.py:
##########
@@ -73,14 +70,5 @@ def get_slice_json(defaults: Dict[Any, Any], **kwargs: Any) -> str:
     return json.dumps(defaults_copy, indent=4, sort_keys=True)
 
 
-def get_example_data(
-    filepath: str, is_gzip: bool = True, make_bytes: bool = False
-) -> BytesIO:
-    content = request.urlopen(  # pylint: disable=consider-using-with
-        f"{BASE_URL}{filepath}?raw=true"
-    ).read()
-    if is_gzip:
-        content = zlib.decompress(content, zlib.MAX_WBITS | 16)
-    if make_bytes:
-        content = BytesIO(content)
-    return content
+def get_example_url(filepath: str) -> str:
+    return f"{BASE_URL}{filepath}?raw=true"

Review Comment:
   Nice!



##########
superset/utils/pandas_postprocessing/contribution.py:
##########
@@ -49,6 +49,9 @@ def contribution(
     """
     contribution_df = df.copy()
     numeric_df = contribution_df.select_dtypes(include=["number", Decimal])
+    # TODO: copy needed due to following regression in 1.4, remove if not needed:
+    # https://github.com/pandas-dev/pandas/issues/48090
+    numeric_df = numeric_df.copy()

Review Comment:
   Awesome!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] villebro commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

villebro commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r945985488


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   FYI I opened a bug report detailing the breaking change: https://github.com/pandas-dev/pandas/issues/48090



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] EugeneTorap commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

EugeneTorap commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r945456755


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   Yes, I need the help. Anyone who has a write access can add the fix and then to merge the PR.
   ![image](https://user-images.githubusercontent.com/29536522/184587714-6bc8ab52-90a6-4aec-8466-9d7d9591379c.png)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] cwegener commented on pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

cwegener commented on PR #21002:
URL: https://github.com/apache/superset/pull/21002#issuecomment-1221749111

   Nice work! Going to test this out very soon.
   
   I know that there used to be the problem of and empty result set from SQLalchemy causing an Exception in pandas when using PyArrow 6.0 and higher, leading to unfriendly error messages in Explore (and charts on dashboards) instead of the friendly "No data" message.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] betodealmeida commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

betodealmeida commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r942564689


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   @villebro in this case this is the result of the `contribution` post processing function, so I think it makes sense — if something is not present it's contribution is zero. But I agree, we definitely don't want nans everywhere to become zeros... it would be nice to have more coverage on tests that manipulate data to ensure we're not returning wrong results (the worst kind of bug!).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] villebro commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

villebro commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r943155044


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   I looked more closely at this, and I think this needs to be fixed. The test is making sure that columns `a`, `b` and `c` are unchanged, and that the contribution of `a` should be placed in a new column `pct_a`. Therefore this would in fact count as a regression, as the original values should not be mutated. So I would make sure the test does the following:
   - these three assertions are unchanged
   - add an additional assertion to make sure `pct_a` is in fact the correct contribution. I expect it should be `[0.25, 0.75, 0]`
   
   @EugeneTorap let me know if you need help with this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] betodealmeida merged pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

betodealmeida merged PR #21002:
URL: https://github.com/apache/superset/pull/21002


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] EugeneTorap commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

EugeneTorap commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r940928864


##########
superset/examples/bart_lines.py:
##########
@@ -34,7 +34,7 @@ def load_bart_lines(only_metadata: bool = False, force: bool = False) -> None:
     table_exists = database.has_table_by_name(tbl_name)
 
     if not only_metadata and (not table_exists or force):
-        content = get_example_data("bart-lines.json.gz")
+        content = get_example_data("bart-lines.json.gz", make_bytes=True)

Review Comment:
   @betodealmeida I've converted `bytes` to `string` using `decode("utf-8")`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] ghrom commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

ghrom commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r945642733


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   Your buggy tests are failing the project. Good luck with your 'great tools' perpetuating the dll hell.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] villebro commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

villebro commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r946056092


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   > @villebro Can we rewrite our logic to not depends it is zero or nan?
   
   @EugeneTorap sure we can, I can probably do it tomorrow. But I wanted to make the Pandas project aware of the regression first to also know what the plan is going forward; is this an intended breaking change or will it be fixed in a forthcoming patch release.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] ghrom commented on a diff in pull request #21002: chore: Support Python 3.10 and bump pandas 1.4 and pyarrow 6

Posted by GitBox <gi...@apache.org>.

ghrom commented on code in PR #21002:
URL: https://github.com/apache/superset/pull/21002#discussion_r945344198


##########
tests/unit_tests/pandas_postprocessing/test_contribution.py:
##########
@@ -74,7 +74,7 @@ def test_contribution():
         rename_columns=["pct_a"],
     )
     assert processed_df.columns.tolist() == ["a", "b", "c", "pct_a"]
-    assert_array_equal(processed_df["a"].tolist(), [1, 3, nan])
-    assert_array_equal(processed_df["b"].tolist(), [1, 9, nan])
-    assert_array_equal(processed_df["c"].tolist(), [nan, nan, nan])
+    assert_array_equal(processed_df["a"].tolist(), [1, 3, 0])
+    assert_array_equal(processed_df["b"].tolist(), [1, 9, 0])
+    assert_array_equal(processed_df["c"].tolist(), [0, 0, 0])

Review Comment:
   I'm concerned that you are blocking the change which is fixing the current broken installation :(



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org