Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2019/09/14 22:18:56 UTC

[GitHub] [incubator-superset] betodealmeida opened a new pull request #8226: Handle int64 columns with missing data in SQL Lab

URL: https://github.com/apache/incubator-superset/pull/8226
 
 
   ### CATEGORY
   
   Choose one
   
   - [X] Bug Fix
   - [ ] Enhancement (new features, refinement)
   - [ ] Refactor
   - [ ] Add tests
   - [ ] Build / Development Environment
   - [ ] Documentation
   
   This PR fixes https://github.com/apache/incubator-superset/issues/8225.
   
   ### SUMMARY
   
   When a column has `int64` integers and missing data, Pandas will cast it to `float64`. Since `float64` cannot represent every integer above 2^53 exactly, this loses precision and can return incorrect numbers.
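
The behavior described above can be reproduced outside Superset. The snippet below is a minimal sketch, not code from this PR; the value `big` is an arbitrary example chosen to sit just above 2^53.

```python
import pandas as pd

# Arbitrary example value: 2**53 + 1 has no exact float64 representation.
big = 2**53 + 1  # 9007199254740993

# With a missing value and no explicit dtype, pandas infers float64.
s = pd.Series([big, None])

print(s.dtype)          # float64
print(int(s.iloc[0]))   # 9007199254740992, off by one from the input
```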
   
   This PR fixes the bug by adding a method to the DB engine specs that returns a `dtype` based on the cursor description, currently implemented in Presto only. With the `dtype`, we can create a Pandas `Series` for each column, and create a `DataFrame` that has the proper types.
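
As a rough illustration of the approach, the sketch below builds one `Series` per column with a dtype looked up from the cursor description. It is not the code in this PR: `TYPE_TO_DTYPE`, `get_pandas_dtype`, and `build_dataframe` are hypothetical names, and it assumes a DB-API style description whose second field is a type name string such as `"bigint"`. The `"Int64"` string refers to the pandas nullable integer dtype.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

import pandas as pd

# Hypothetical mapping from a driver-reported type name to a pandas dtype.
# Only integer types are mapped; everything else falls back to inference.
TYPE_TO_DTYPE: Dict[str, str] = {"bigint": "Int64", "integer": "Int64"}


def get_pandas_dtype(description: Sequence[Tuple[Any, ...]]) -> Dict[str, Optional[str]]:
    """Return a column name -> dtype mapping (None means 'let pandas infer')."""
    return {col[0]: TYPE_TO_DTYPE.get(str(col[1]).lower()) for col in description}


def build_dataframe(
    rows: List[Tuple[Any, ...]], description: Sequence[Tuple[Any, ...]]
) -> pd.DataFrame:
    """Build the frame column by column so each Series gets its own dtype."""
    dtypes = get_pandas_dtype(description)
    names = [col[0] for col in description]
    columns = {
        name: pd.Series([row[i] for row in rows], dtype=dtypes[name])
        for i, name in enumerate(names)
    }
    return pd.DataFrame(columns, columns=names)


# Usage with a made-up cursor description (name, type_code, ...).
description = [
    ("id", "bigint", None, None, None, None, None),
    ("name", "varchar", None, None, None, None, None),
]
rows = [(2**53 + 1, "a"), (None, "b")]
df = build_dataframe(rows, description)
print(df.dtypes)
```

Building the columns individually avoids the frame-wide inference step that would otherwise turn an integer column with missing values into `float64`.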
   
   Note that to represent the column correctly we need to use a [nullable integer data type](https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html), introduced in Pandas 0.24.0. Unfortunately, [PyArrow is unable to serialize the resulting data frame](https://github.com/apache/arrow/issues/4168), so `msgpack` has to be disabled.
   
   ### TEST PLAN
   
   Added unit test.
   
   ### ADDITIONAL INFORMATION
   - [X] Has associated issue: https://github.com/apache/incubator-superset/issues/8225
   - [ ] Changes UI
   - [ ] Requires DB Migration.
   - [ ] Confirm DB Migration upgrade and downgrade tested.
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   
   ### REVIEWERS
   
   @villebro @robdiciuccio 
