Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/07/19 03:20:57 UTC

[GitHub] [superset] john-bodley opened a new pull request, #20760: John bodley fix 20151

john-bodley opened a new pull request, #20760:
URL: https://github.com/apache/superset/pull/20760

   <!---
   Please write the PR title following the conventions at https://www.conventionalcommits.org/en/v1.0.0/
   Example:
   fix(dashboard): load charts correctly
   -->
   
   ### SUMMARY
   
   Regrettably https://github.com/apache/superset/pull/20151 wasn't sufficient if the result set was stored prior to downloading the CSV file. More specifically, Pandas coerces an integer array containing `None` to float, likely because of the NumPy coercion, i.e., 
   
   ```python
   >>> pd.DataFrame.from_records([{"foo": 1}, {"foo": None}])
      foo
   0  1.0
   1  NaN
   ```
   
   The fix is to explicitly define the dtype, i.e., 
   
   ```python
   >>> pd.DataFrame(data=[{"foo": 1}, {"foo": None}], dtype=object)
       foo
   0     1
   1  None
   ```
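
The effect on an exported CSV can be sketched as follows. This is a minimal illustration rather than the Superset code path: the `records` list and its `id`/`foo` columns are made-up stand-ins for a deserialized result set.

```python
import pandas as pd

# Hypothetical records standing in for a persisted result set.
records = [{"id": "a", "foo": 1}, {"id": "b", "foo": None}]

# Default inference: the None forces the integer column to float64.
coerced = pd.DataFrame.from_records(records)

# Explicit object dtype: values are kept as the original Python objects.
preserved = pd.DataFrame(data=records, dtype=object)

print(coerced["foo"].dtype)    # float64
print(preserved["foo"].dtype)  # object

# The integer is written as "1.0" in the first CSV and as "1" in the second.
print(coerced.to_csv(index=False))
print(preserved.to_csv(index=False))
```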
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   <!--- Skip this if not applicable -->
   
   ### TESTING INSTRUCTIONS
   
   CI.
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in [SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] john-bodley merged pull request #20760: fix(csv): Do not coerce persisted data integer columns to float

Posted by GitBox <gi...@apache.org>.
john-bodley merged PR #20760:
URL: https://github.com/apache/superset/pull/20760




[GitHub] [superset] john-bodley commented on a diff in pull request #20760: fix(csv): Do not coerce persisted data integer columns to float

Posted by GitBox <gi...@apache.org>.
john-bodley commented on code in PR #20760:
URL: https://github.com/apache/superset/pull/20760#discussion_r924023120


##########
superset/utils/csv.py:
##########
@@ -65,7 +65,6 @@ def escape_value(value: str) -> str:
 
 
 def df_to_escaped_csv(df: pd.DataFrame, **kwargs: Any) -> Any:
-    escape_values = lambda v: escape_value(v) if isinstance(v, str) else v

Review Comment:
   Unused code.





[GitHub] [superset] john-bodley commented on a diff in pull request #20760: fix(csv): Do not coerce persisted data integer columns to float

Posted by GitBox <gi...@apache.org>.
john-bodley commented on code in PR #20760:
URL: https://github.com/apache/superset/pull/20760#discussion_r924024074


##########
superset/views/core.py:
##########
@@ -2502,8 +2502,7 @@ def csv(  # pylint: disable=no-self-use,too-many-locals
             obj = _deserialize_results_payload(
                 payload, query, cast(bool, results_backend_use_msgpack)
             )
-            columns = [c["name"] for c in obj["columns"]]
-            df = pd.DataFrame.from_records(obj["data"], columns=columns)
+            df = pd.DataFrame(data=obj["data"], dtype=object)

Review Comment:
   No need to specify column names as they're present in the data. Furthermore—per the Pandas documentation, 
   
   > Column labels to use for resulting frame when data does not have them, defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels, will perform column selection instead.
   
   the `columns` option doesn't rename existing columns if column labels are present, but performs column selection instead, i.e., 
   
   ```python
   >>> pd.DataFrame(data=[{"foo": 1}, {"foo": None}], dtype=object, columns=["foo"])
       foo
   0     1
   1  None
   
   >>> pd.DataFrame(data=[{"foo": 1}, {"foo": None}], dtype=object, columns=["bar"])
      bar
   0  NaN
   1  NaN
   ``` 
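
The selection behaviour quoted above can be checked with more than one column. The sketch below uses invented column names (`foo`, `bar`, `baz`) and shows that `columns` selects and orders existing labels, whereas renaming needs a separate explicit step such as `DataFrame.rename`.

```python
import pandas as pd

# Hypothetical records with two labelled columns.
records = [{"foo": 1, "bar": "x"}, {"foo": None, "bar": "y"}]

# Listing existing labels selects and orders those columns.
selected = pd.DataFrame(data=records, dtype=object, columns=["bar", "foo"])
print(list(selected.columns))  # ['bar', 'foo']

# A label absent from the data yields an all-missing column, not a rename.
missing = pd.DataFrame(data=records, dtype=object, columns=["baz"])
print(missing["baz"].isna().all())  # True

# Renaming is a separate, explicit operation.
renamed = pd.DataFrame(data=records, dtype=object).rename(columns={"foo": "baz"})
print(list(renamed.columns))  # ['baz', 'bar']
```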





[GitHub] [superset] mbcsa commented on pull request #20760: fix(csv): Do not coerce persisted data integer columns to float

Posted by GitBox <gi...@apache.org>.
mbcsa commented on PR #20760:
URL: https://github.com/apache/superset/pull/20760#issuecomment-1199781738

   Hi @john-bodley 
   
   This fix introduces a new problem when a user exports a CSV file from a cached query.
   I've created a new issue, #20919.
   
   The problem is that when the DataFrame is created dynamically from cached data, it does not respect column formats.
   This matters when the decimal separator is configured via the CSV_EXPORT "sep" attribute.
   
   I'm testing this, and it works well when changing:
   
   ```python
   df = pd.DataFrame(
       data=obj["data"],
       dtype=object,
       columns=[c["name"] for c in obj["columns"]],
   )
   ```
   to
   ```python
   df = pd.DataFrame(
       data=obj["data"],
       columns=[c["name"] for c in obj["columns"]],
   )
   ```
   
   Thank you
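
The trade-off described in this comment can be reproduced in isolation. The sketch below is an assumption-laden illustration: `records` and `export_kwargs` are hypothetical, with the latter merely resembling the kind of options a CSV export setting might pass to `DataFrame.to_csv`. It relies on pandas applying the `decimal` option only to float-typed columns, not to object columns.

```python
import pandas as pd

# Hypothetical cached rows with one float column.
records = [{"name": "a", "price": 1.5}, {"name": "b", "price": 2.25}]

# Assumed export options (illustrative only).
export_kwargs = {"sep": ";", "decimal": ",", "index": False}

# Object dtype: floats are written via str(), so "." is kept.
as_object = pd.DataFrame(data=records, dtype=object).to_csv(**export_kwargs)

# Inferred dtype: the float64 column honours the decimal option.
inferred = pd.DataFrame(data=records).to_csv(**export_kwargs)

print(as_object)
print(inferred)
```

Note that dropping `dtype=object` also brings back the integer-to-float coercion for columns containing `None` that the PR summary describes, so the two behaviours pull in opposite directions.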






[GitHub] [superset] codecov[bot] commented on pull request #20760: fix(csv): Do not coerce persisted data integer columns to float

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #20760:
URL: https://github.com/apache/superset/pull/20760#issuecomment-1188553235

   # [Codecov](https://codecov.io/gh/apache/superset/pull/20760?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#20760](https://codecov.io/gh/apache/superset/pull/20760?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (cb7d25c) into [master](https://codecov.io/gh/apache/superset/commit/e60083b45b8953220e54c67544ce2381d7c96f2e?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e60083b) will **decrease** coverage by `11.48%`.
   > The diff coverage is `0.00%`.
   
   > :exclamation: Current head cb7d25c differs from pull request most recent head 2915836. Consider uploading reports for the commit 2915836 to get more accurate results
   
   ```diff
   @@             Coverage Diff             @@
   ##           master   #20760       +/-   ##
   ===========================================
   - Coverage   66.35%   54.87%   -11.49%     
   ===========================================
     Files        1754     1754               
     Lines       66689    66688        -1     
     Branches     7049     7049               
   ===========================================
   - Hits        44253    36595     -7658     
   - Misses      20639    28296     +7657     
     Partials     1797     1797               
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | hive | `53.23% <0.00%> (+<0.01%)` | :arrow_up: |
   | mysql | `?` | |
   | postgres | `?` | |
   | presto | `53.09% <0.00%> (+<0.01%)` | :arrow_up: |
   | python | `58.00% <0.00%> (-23.69%)` | :arrow_down: |
   | sqlite | `?` | |
   | unit | `50.57% <0.00%> (+<0.01%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/superset/pull/20760?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [superset/views/core.py](https://codecov.io/gh/apache/superset/pull/20760/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdmlld3MvY29yZS5weQ==) | `34.46% <0.00%> (-43.43%)` | :arrow_down: |
   | [superset/utils/dashboard\_import\_export.py](https://codecov.io/gh/apache/superset/pull/20760/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvdXRpbHMvZGFzaGJvYXJkX2ltcG9ydF9leHBvcnQucHk=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [superset/key\_value/commands/update.py](https://codecov.io/gh/apache/superset/pull/20760/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL3VwZGF0ZS5weQ==) | `0.00% <0.00%> (-88.89%)` | :arrow_down: |
   | [superset/key\_value/commands/delete.py](https://codecov.io/gh/apache/superset/pull/20760/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL2RlbGV0ZS5weQ==) | `0.00% <0.00%> (-85.30%)` | :arrow_down: |
   | [superset/key\_value/commands/delete\_expired.py](https://codecov.io/gh/apache/superset/pull/20760/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQva2V5X3ZhbHVlL2NvbW1hbmRzL2RlbGV0ZV9leHBpcmVkLnB5) | `0.00% <0.00%> (-80.77%)` | :arrow_down: |
   | [superset/dashboards/commands/importers/v0.py](https://codecov.io/gh/apache/superset/pull/20760/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGFzaGJvYXJkcy9jb21tYW5kcy9pbXBvcnRlcnMvdjAucHk=) | `15.62% <0.00%> (-76.25%)` | :arrow_down: |
   | [superset/datasets/commands/update.py](https://codecov.io/gh/apache/superset/pull/20760/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvY29tbWFuZHMvdXBkYXRlLnB5) | `25.30% <0.00%> (-68.68%)` | :arrow_down: |
   | [superset/datasets/commands/create.py](https://codecov.io/gh/apache/superset/pull/20760/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvY29tbWFuZHMvY3JlYXRlLnB5) | `29.41% <0.00%> (-68.63%)` | :arrow_down: |
   | [superset/datasets/commands/importers/v0.py](https://codecov.io/gh/apache/superset/pull/20760/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvZGF0YXNldHMvY29tbWFuZHMvaW1wb3J0ZXJzL3YwLnB5) | `24.03% <0.00%> (-67.45%)` | :arrow_down: |
   | [superset/reports/commands/execute.py](https://codecov.io/gh/apache/superset/pull/20760/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c3VwZXJzZXQvcmVwb3J0cy9jb21tYW5kcy9leGVjdXRlLnB5) | `24.45% <0.00%> (-67.16%)` | :arrow_down: |
   | ... and [275 more](https://codecov.io/gh/apache/superset/pull/20760/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/superset/pull/20760?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/superset/pull/20760?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [e60083b...2915836](https://codecov.io/gh/apache/superset/pull/20760?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   

