Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2021/07/29 20:24:49 UTC

[GitHub] [superset] serenajiang opened a new issue #15956: Time Series viz shows incorrect legend for multiple group by columns

serenajiang opened a new issue #15956:
URL: https://github.com/apache/superset/issues/15956


   If you use multiple group by columns in the time series viz, the legend shows all permutations of the values rather than just the ones that exist in the data. This is a regression.
   
   ### Expected results
   
   The legend should only show values for the series in the chart.
   
   ### Actual results
   
   The legend shows all permutations of column values that are in the chart.
   
   #### Screenshots
   
   The legend incorrectly shows all permutations of 0 and 1, but the only actual series are (0,0) and (1,1).
   
   ![image](https://user-images.githubusercontent.com/14146019/127559792-15ae70ae-981a-4b9f-830b-2411f1e3664a.png)
   
   
   #### How to reproduce the bug
   
   1. Run a query like: `SELECT 0 as a, 0 as b, '2021-07-28' as ds UNION SELECT 1 as a, 1 as b, '2021-07-28' as ds`
   2. Explore chart. Edit dataset to make `ds` temporal.
   3. Switch to time series viz. Use metric count(*), group by a and b. Turn on legend.
   4. View incorrect legend.
   
   ### Environment
   
   (please complete the following information):
   
   - superset version: `superset version` up to date with master as of 2021-07-28
   - python version: `python --version` 3.8.11
   - node.js version: `node -v` v14.16.1
   
   ### Checklist
   
   Make sure to follow these steps before submitting your issue - thank you!
   
   - [ ] I have checked the superset logs for python stacktraces and included it here as text if there are any.
   - [X] I have reproduced the issue with at least the latest released version of superset.
   - [ ] I have checked the issue tracker for the same issue and I haven't found one similar.
   
   ### Additional context
   We just started observing this issue in this week's release, so this regression likely occurred sometime during the last ~10 days.
   
   Unsure whether this was caused by a change in superset-frontend or superset-ui. @zhaoyongjie any thoughts?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] zhaoyongjie commented on issue #15956: Time Series viz shows incorrect legend for multiple group by columns

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on issue #15956:
URL: https://github.com/apache/superset/issues/15956#issuecomment-889683888


   This is a limitation of pandas `pivot_table`:
   https://github.com/pandas-dev/pandas/issues/18030
   
   After filling in other values, `pivot_table` is unable to do the aggregation.
   
   
   <img width="968" alt="image" src="https://user-images.githubusercontent.com/2016594/127615585-733d2f55-3c0c-4043-85e7-8c09b1b12019.png">
   
   
   




[GitHub] [superset] serenajiang edited a comment on issue #15956: Time Series viz shows incorrect legend for multiple group by columns

Posted by GitBox <gi...@apache.org>.
serenajiang edited a comment on issue #15956:
URL: https://github.com/apache/superset/issues/15956#issuecomment-889591762


   @zhaoyongjie I can see an argument that this is by design, but it is inconsistent with previous behavior, is not useful for a lot of use cases, and actually makes charts display *incorrect data* due to the `SERIES LIMIT`.
   
   For example, if someone applied a series limit of 5, they would only see 5 series, but would see up to 25 different categories in the legend. Only the top 5 series are shown, but there might be data for the other groupings listed in the legend that is not shown because it is not in the top 5. To the user, it would appear that those groupings have no data.
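To make the arithmetic in the example above concrete, here is a minimal sketch in plain Python. The column values and the limit of 5 are made up for illustration, and the "top 5" selection is stood in for by the first five pairs rather than Superset's actual ranking logic.

```python
from itertools import product

# Hypothetical group-by columns with 5 distinct values each.
a_values = ["a1", "a2", "a3", "a4", "a5"]
b_values = ["b1", "b2", "b3", "b4", "b5"]

# If the legend lists all permutations, every (a, b) pair appears in it.
legend_entries = list(product(a_values, b_values))

# A series limit of 5 keeps only 5 of them as plotted series.
# Assumption: "top 5" is approximated here by the first five pairs.
series_limit = 5
plotted_series = legend_entries[:series_limit]

# Legend entries with no line in the chart, even if data exists for them.
unplotted = len(legend_entries) - len(plotted_series)
print(len(legend_entries), len(plotted_series), unplotted)
```

Under these assumptions the legend carries 25 entries while only 5 series are drawn, which is the mismatch described above.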




[GitHub] [superset] zhaoyongjie commented on issue #15956: Time Series viz shows incorrect legend for multiple group by columns

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on issue #15956:
URL: https://github.com/apache/superset/issues/15956#issuecomment-889819269


   @serenajiang Thanks for reporting this issue. I filed a PR to fix it.




[GitHub] [superset] zhaoyongjie edited a comment on issue #15956: Time Series viz shows incorrect legend for multiple group by columns

Posted by GitBox <gi...@apache.org>.
zhaoyongjie edited a comment on issue #15956:
URL: https://github.com/apache/superset/issues/15956#issuecomment-889671031


   I'm working on this. With `dropna=True`, we lose the `NaN` values in the metric.
   
   ```
   from datetime import datetime

   import numpy as np
   import pandas as pd

   df = pd.DataFrame({
       "ds": [datetime(2012, 11, 1), datetime(2012, 11, 1)],
       "a": [1, 0],
       "b": [1, 0],
       "c": [np.nan, 9],
   })
   # df:
   #           ds  a  b    c
   # 0 2012-11-01  1  1  NaN
   # 1 2012-11-01  0  0  9.0
   
   df.pivot_table(
       index=["ds"],
       columns=["a", "b"],
       values=["c"],
       aggfunc={"c": np.mean},
       dropna=True,
   )
   # Result: the (a=1, b=1) series, whose only value is NaN, is dropped.
   #               c
   # a             0
   # b             0
   # ds
   # 2012-11-01  9.0
   ```









[GitHub] [superset] serenajiang commented on issue #15956: Time Series viz shows incorrect legend for multiple group by columns

Posted by GitBox <gi...@apache.org>.
serenajiang commented on issue #15956:
URL: https://github.com/apache/superset/issues/15956#issuecomment-889594950


   Besides the inaccuracy issue, the impact on our users is that a lot of charts now show such long legends that the legends block the entire chart. There doesn't seem to be any workaround - users could manually specify the top 5 groupings, but since these would change over time, this would not be sustainable.






[GitHub] [superset] villebro commented on issue #15956: Time Series viz shows incorrect legend for multiple group by columns

Posted by GitBox <gi...@apache.org>.
villebro commented on issue #15956:
URL: https://github.com/apache/superset/issues/15956#issuecomment-889668861


   I agree with @serenajiang that it makes sense not to show the permutations that have no values associated with them. I checked that the NVD3 line chart does not exhibit this. Digging in further, this is caused by `DataFrame.pivot_table` being called with `dropna=False` in the Timeseries viz, whereas the line chart defaults to `dropna=True`:
   
   ![image](https://user-images.githubusercontent.com/33317356/127611922-8f6bc9a5-73a6-4e8d-8b6d-b4686cd7f3d0.png)
   
   This was introduced by this change: https://github.com/apache-superset/superset-ui/pull/1231 (see `drop_missing_columns: false`).




[GitHub] [superset] serenajiang commented on issue #15956: Time Series viz shows incorrect legend for multiple group by columns

Posted by GitBox <gi...@apache.org>.
serenajiang commented on issue #15956:
URL: https://github.com/apache/superset/issues/15956#issuecomment-889591762


   @zhaoyongjie I can see an argument that this is by design, but it is inconsistent with previous behavior, is not useful for a lot of use cases, and actually makes charts display *incorrect data* due to the `SERIES LIMIT`. For example, if someone applied a series limit of 5, they would only see 5 series, but would see up to 25 different categories in the legend. The chart might also be incorrect - only the top 5 series are shown, but there might be data for the other groupings listed in the legend.




[GitHub] [superset] zhaoyongjie commented on issue #15956: Time Series viz shows incorrect legend for multiple group by columns

Posted by GitBox <gi...@apache.org>.
zhaoyongjie commented on issue #15956:
URL: https://github.com/apache/superset/issues/15956#issuecomment-889588923


   @serenajiang 
   I think this is by design (the same logic is also in the NVD3 line chart).
   For the ECharts timeseries and the NVD3 line chart:
   1. the x-axis is a continuous datetime (`__timestamp` in SQL)
   2. the y-axis shows the metrics (the metrics in the metrics control)
   3. the data marks are permutations of the column values (the values of the groupby control)
   4. we should add some `filter` to limit the permutations.
   
   For instance:
   this viz means: sum of sales per day for large deals in California, USA, in 2003
   
   <img width="1250" alt="image" src="https://user-images.githubusercontent.com/2016594/127592295-c3bf3a88-8564-4318-8891-167ffa52edde.png">
   




[GitHub] [superset] zhaoyongjie closed issue #15956: Time Series viz shows incorrect legend for multiple group by columns

Posted by GitBox <gi...@apache.org>.
zhaoyongjie closed issue #15956:
URL: https://github.com/apache/superset/issues/15956


   




[GitHub] [superset] junlincc commented on issue #15956: Time Series viz shows incorrect legend for multiple group by columns

Posted by GitBox <gi...@apache.org>.
junlincc commented on issue #15956:
URL: https://github.com/apache/superset/issues/15956#issuecomment-889909847


   @jinghua-qa please add it to test suite, in virtual dataset category 🙏 
   
   




[GitHub] [superset] zhaoyongjie edited a comment on issue #15956: Time Series viz shows incorrect legend for multiple group by columns

Posted by GitBox <gi...@apache.org>.
zhaoyongjie edited a comment on issue #15956:
URL: https://github.com/apache/superset/issues/15956#issuecomment-889588923


   @serenajiang 
   I think this is by design (the same logic is also in the NVD3 line chart).
   For the ECharts timeseries and the NVD3 line chart:
   1. the x-axis is a continuous datetime (`__timestamp` in SQL)
   2. the y-axis shows the metrics (the metrics in the metrics control)
   3. the data marks are permutations of the column values (the values of the groupby control)
   4. we should add some `filter` to limit the permutations.
   
   For instance (using `cleaned_sales_data`):
   this viz means: sum of sales per day for large deals in California, USA, in 2003
   
   <img width="1250" alt="image" src="https://user-images.githubusercontent.com/2016594/127592295-c3bf3a88-8564-4318-8891-167ffa52edde.png">
   




[GitHub] [superset] villebro commented on issue #15956: Time Series viz shows incorrect legend for multiple group by columns

Posted by GitBox <gi...@apache.org>.
villebro commented on issue #15956:
URL: https://github.com/apache/superset/issues/15956#issuecomment-889672403


   If we want to do data imputation, e.g. replace nulls with 0, we should probably do the imputation before the pivot operation. This would eliminate the need for using `dropna=False`.
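The suggestion above can be sketched as follows, reusing the example DataFrame posted earlier in this thread. It is only an illustration under stated assumptions: filling with 0 is one possible imputation, the helper `pivot_c` is a hypothetical name, and whether this fits Superset's post-processing pipeline is not established here.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Same shape as the example in this thread: the (1, 1) series has only a NaN metric.
df = pd.DataFrame({
    "ds": [datetime(2012, 11, 1), datetime(2012, 11, 1)],
    "a": [1, 0],
    "b": [1, 0],
    "c": [np.nan, 9],
})


def pivot_c(frame):
    """Pivot with dropna=True and return {(a, b): first value of c}."""
    table = frame.pivot_table(
        index="ds",
        columns=["a", "b"],
        values="c",
        aggfunc="mean",
        dropna=True,
    )
    return {(int(a), int(b)): float(table[(a, b)].iloc[0]) for a, b in table.columns}


# Without imputation, the all-NaN (1, 1) series is dropped by the pivot.
without_imputation = pivot_c(df)

# Impute first (assumption: replace nulls in the metric with 0), then pivot.
with_imputation = pivot_c(df.fillna({"c": 0}))

print(without_imputation)
print(with_imputation)
```

Imputing before the pivot keeps the (1, 1) series without introducing the unobserved (0, 1) and (1, 0) permutations, because `dropna=True` only ever sees combinations that exist in the data.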







