You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2020/07/10 16:24:30 UTC
[GitHub] [incubator-superset] ktmud commented on a change in pull request #10286: feat: add contribution operation and fix cache_key bug

ktmud commented on a change in pull request #10286:
URL: https://github.com/apache/incubator-superset/pull/10286#discussion_r452922238



##########
File path: superset/charts/schemas.py
##########
@@ -395,6 +395,19 @@ class ChartDataSortOptionsSchema(ChartDataPostProcessingOperationOptionsSchema):
     aggregates = ChartDataAggregateConfigField()
 
 
+class ChartDataContributionOptionsSchema(ChartDataPostProcessingOperationOptionsSchema):
+    """
+    Contribution operation config.
+    """
+
+    orientation = fields.String(
+        description="Should cell values be calculated across the row or column.",
+        required=True,
+        validate=validate.OneOf(choices=("row", "column",)),

Review comment:
       Can these two also based on the enum?

##########
File path: superset/utils/pandas_postprocessing.py
##########
@@ -517,3 +518,29 @@ def _parse_location(location: str) -> Tuple[float, float, float]:
         return _append_columns(df, geodetic_df, columns)
     except ValueError:
         raise QueryObjectValidationError(_("Invalid geodetic string"))
+
+
+def contribution(
+    df: DataFrame, orientation: PostProcessingContributionOrientation
+) -> DataFrame:
+    """
+    Calculate cell contibution to row/column total.
+
+    :param df: DataFrame containing all-numeric data (temporal column ignored)
+    :param orientation: calculate by dividing cell with row/column total
+    :return: DataFrame with contributions, with temporal column at beginning if present
+    """
+    temporal_series: Optional[Series] = None
+    contribution_df = df.copy()
+    if DTTM_ALIAS in df.columns:
+        temporal_series = cast(Series, contribution_df.pop(DTTM_ALIAS))
+
+    if orientation == PostProcessingContributionOrientation.ROW:
+        contribution_dft = contribution_df.T
+        contribution_df = (contribution_dft / contribution_dft.sum()).T
+    else:
+        contribution_df = contribution_df / contribution_df.sum()

Review comment:
       Could probably utilize "axis" parameter for `sum`

##########
File path: superset/utils/pandas_postprocessing.py
##########
@@ -517,3 +518,29 @@ def _parse_location(location: str) -> Tuple[float, float, float]:
         return _append_columns(df, geodetic_df, columns)
     except ValueError:
         raise QueryObjectValidationError(_("Invalid geodetic string"))
+
+
+def contribution(
+    df: DataFrame, orientation: PostProcessingContributionOrientation
+) -> DataFrame:

Review comment:
       Maybe it makes more things to keep things simple, but is it possible to allow specifying which column(s) to compute on? I could imagine this operator being used to support "percent metrics" in table charts and it’ll need the ability to select specific columns.
   
   Maybe the API can be something like this
   
   ```
   {
     # compute row wise contribution and override the columns
     "columns": ["foo", "bar"],
     "orientation": "row"
   }
   
   {
     # compute column wise computation and insert new columns
     "columns": ["pct_abc"],
     "sources": ["abc"]
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org