Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2019/09/14 20:11:38 UTC

[GitHub] [incubator-superset] betodealmeida opened a new issue #8225: Pandas casting int64 to float64, misrepresenting value

URL: https://github.com/apache/incubator-superset/issues/8225
 
 
   I have the following data being returned by Presto (single column, 6 rows):
   
   ```
   [(None,), (1239162456494753670,), (None,), (None,), (None,), (None,)]
   ```
   
   Because of the missing data (`None`), Pandas infers the column type as `float64`, which rounds the value to a different ID:
   
   ```python
   >>> import pandas as pd
   >>> column_names = ['organization_lyft_id']
   >>> data = [(None,), (1239162456494753670,), (None,), (None,), (None,), (None,)]
   >>> df = pd.DataFrame(list(data), columns=column_names).infer_objects()  # SupersetDataFrame
   >>> print(df)
   >>> print(df.dtypes)
      organization_lyft_id
   0                   NaN
   1          1.239162e+18
   2                   NaN
   3                   NaN
   4                   NaN
   5                   NaN
   organization_lyft_id    float64
   dtype: object
   ```
   
   The number then shows up as `1239162456494753800` in SQL Lab.
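
The rounding can be reproduced without Pandas, since it comes from the `float64` representation itself. The following is a minimal sketch using only built-in Python types; the variable names are illustrative and not taken from Superset.

```python
# The ID from the issue, as an exact Python int.
original_id = 1239162456494753670

# float64 has a 53-bit significand. Between 2**60 and 2**61, adjacent
# doubles are 2**(60 - 52) = 256 apart, so most integers in that range
# cannot be stored exactly and are rounded to the nearest multiple of 256.
as_float = float(original_id)
round_tripped = int(as_float)

print(round_tripped == original_id)   # False
print(round_tripped)                  # 1239162456494753792
print(round_tripped - original_id)    # 122
print(repr(as_float))                 # 1.2391624564947538e+18
```

The shortest decimal string that identifies this double is `1.2391624564947538e+18`, which matches the `1239162456494753800` displayed in SQL Lab.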
   
   Here's the Pandas documentation on this:
   
   > ... pandas primarily uses NaN to represent missing data. Because NaN is a float, this forces an array of integers with any missing values to become floating point. In some cases, this may not matter much. But if your integer column is, say, an identifier, casting to float can be problematic. **Some integers cannot even be represented as floating point numbers.** (emphasis mine)
   
   Note that if the rows with missing data are filtered out, the column is inferred as `int64` and the value shows up correctly in SQL Lab.
   
   The solution is to pass a `dtypes` argument when creating the Pandas data frame, built from the cursor description. I'm working on a fix for this.
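
One way the mapping from cursor description to dtypes might look is sketched below. This is not the fix that was eventually written. It assumes the driver reports each column's `type_code` as a type name string such as `"bigint"` (DB-API only guarantees that the name and type code are the first two items of each entry). `INTEGER_TYPE_NAMES`, `dtypes_from_description`, and the `organization_name` column are hypothetical names introduced for illustration. `"Int64"` refers to the nullable integer extension dtype added in pandas 0.24.

```python
# Hypothetical set of database type names treated as integers.
# Anything not listed here is left to normal Pandas inference.
INTEGER_TYPE_NAMES = {"tinyint", "smallint", "integer", "bigint"}


def dtypes_from_description(description):
    """Map integer columns in a DB-API cursor description to "Int64"."""
    dtypes = {}
    for column in description:
        # DB-API description entries start with (name, type_code, ...).
        name, type_code = column[0], column[1]
        if str(type_code).lower() in INTEGER_TYPE_NAMES:
            dtypes[name] = "Int64"
    return dtypes


# Example description; the second column is made up for illustration.
description = [
    ("organization_lyft_id", "bigint", None, None, None, None, True),
    ("organization_name", "varchar", None, None, None, None, True),
]
print(dtypes_from_description(description))
# {'organization_lyft_id': 'Int64'}
```

The resulting mapping could then decide how each column is constructed, for example by building integer columns with the nullable `Int64` dtype instead of letting inference fall back to `float64`.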
