You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2018/06/11 17:51:44 UTC

[GitHub] mistercrunch commented on a change in pull request #4726: [bugfix] convert metrics to numeric in dataframe

mistercrunch commented on a change in pull request #4726: [bugfix] convert metrics to numeric in dataframe
URL: https://github.com/apache/incubator-superset/pull/4726#discussion_r194490570
 
 

 ##########
 File path: superset/viz.py
 ##########
 @@ -170,11 +170,21 @@ def get_df(self, query_obj=None):
                 if self.datasource.offset:
                     df[DTTM_ALIAS] += timedelta(hours=self.datasource.offset)
                 df[DTTM_ALIAS] += self.time_shift
+
+            self.df_metrics_to_num(df, query_obj.get('metrics') or [])
+
             df.replace([np.inf, -np.inf], np.nan)
             fillna = self.get_fillna_for_columns(df.columns)
             df = df.fillna(fillna)
         return df
 
+    @staticmethod
+    def df_metrics_to_num(df, metrics):
+        """Converting metrics to numeric when pandas.read_sql cannot"""
+        for col, dtype in df.dtypes.items():
+            if dtype.type == np.object_ and col in metrics:
+                df[col] = pd.to_numeric(df[col])
 
 Review comment:
   We should really clarify what a metric is or isn't. My understanding it that metrics are numeric, but I can see how it may not have been enforced in the past, and that in some places it would be interpreted as [only] being the result of an aggregate function that may or may not be numerical (say `MAX(my_string)` in the context of a Table viz would just happen to work somehow in the past).
   
   To be clear, the current conceptual model is dimensions are columns or non-aggregate-SQL-expressions, and metrics are always aggregate expressions and numeric. A more complete (but complex) model would be to have columns, sql-expression, and aggregate expressions, and each one may or may not be a metric and/or dimension. This model would require a fair amount of foundational redesign.
   
   How common is this? My incline would be to handle non-numeric aggregate functions outside of the semantic layer, meaning in a SQL Lab subquery or upstream data pipeline.
   
   Note that we could patch something to preserve backward compatibility here. Something like `BaseViz.enforce_numeric_metrics = True` that would be set to `False` for Table viz, and run the code referenced above conditionally.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org