You are viewing a plain text version of this content. The canonical link for it is here.

Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2021/06/10 16:11:04 UTC

[GitHub] [superset] gwierzchowski opened a new issue #15092: [SIP] Proposal for exporting data from datasets as jinja keys to be used in other charts

gwierzchowski opened a new issue #15092:
URL: https://github.com/apache/superset/issues/15092

## [SIP] Proposal for exporting data from datasets as jinja keys to be used in other charts

### Motivation

We (at Hitachi-ABB Power Grids R&D) are trying to use Superset to build dashboards that present different measures related to energetic grids' performance and reliability. Definitions of those measures are fixed and regulated by respective norms. Many of them are defined as a division of one value by other, and here we face some problem. Nominators and denominators in our formulas are being calculated from different datasets (based on queries on PostgreSQL).

E.g. Inside BigNumber chart (and other charts like line chart) we want to display number `SUM("DurationMin")/COUNT(DISTINCT "CustomerCode")` where "DurationMin" is taken from other joined tables than "CustomerCode". And we want this calculation to be properly updated when we change selections in different filters. The problem is that we have for instance filter on data like Extent which only applies to dataset related to nominator (DurationMin) and is completely irrelevant to Consumer dataset (in means that there is no connection between tables). Even worse both datasets depend on some time-stamped data but with different granularity. So result is that when we try to apply some filters our value presented in BigNumber is being calculated improperly or we get division by zero.

This generally narrows down to need of having Charts that depend on two (or >1) datasets that are differently and independently filtered.

We tried at least 2 workarounds (described below in Alternatives section) but none worked well or was very hacky and not elegant.

### Proposed Change

We would like to make following changes in Superset in order to implement our dashboards in elegant way.

- Add possibility for datasets to export several values (could be some SQL expression - e.g. aggregation or so performed on dataset data after applying filter) to dictionary accessible by jinja (or other configured template engine). The keys would be strings, and the values could be either numeric, string or array/list in the sense of templating engine.
- Add possibility for dashboards to add plain datasets to them (in addition to charts) and allow user to configure which filters apply to which datasets (as it is for charts now). Added datasets could be placed in some special area of dashboard presented in GUI only in edition mode but hidden (working in background) in run-mode.
- Make necessary changes to ensure that added datasets are calculated first, and then charts when dashboard needs recalculation.
- Make values exported by datasets to be accessible by Charts - i.e. could be used in Chart custom SQL and be processed by Jinja/Templating engine before query is passed to DB.
- Possibly write and register some Jinja functions to reformat values as proper SQL (or other backends) - e.g. Array as `{ val, val2, val3 }` .
- Perform some optimization in case when the the same dataset is added directly to dashboard (to make it export values and be calculated first) and is used by chart in the same dashboard. Optimization should make dataset to be calculated only once.
- Possible other optimization (or maybe explicit option) would be in case when some Chart will use only exported values - i.e. should be only processed by Jinja. And it is in order to avoid sending literals-only queries to database.

With those changes we could have 2 datasets in the dashboard that export `nom` and `denom` respectively and define our Chart to display: `{{nom}} / {{denom}}` (or even make jinja if expression to divide by 1 if denominator is zero).

We think that such changes could highly extend "calculation possibilities" of Superset and also other users will benefit. The good thing about this is that all this could be done without necessity to implement own parsing / expression processing engine. We will use backend DB engine and Jinja / Python engines with already present in Superset possibility to register custom Jinja methods.

We have some both back-end (speaking in Python) and front-end (speaking in Typescript and React) engineers to try implementing those changes and prepare PR, but still need approval for work from company.

### New or Changed Public Interfaces

I see possible to points where interfaces should be extended:

- Exporting values from dataset (new dataset property)
- Adding datasets to dashboards

I think they could be done in backsword compatible way.

### New dependencies

I do not for-see any new dependencies.

### Migration Plan and Compatibility

Currently defined dashboards could assume none plain datasets added, and existing datasets not export any values. This should assure backward compatibility.

### Rejected Alternatives

We considered 2 workarounds:

1. Define dataset as several concatenated queries - like:
```
SELECT
'part1' as "part",
aaa,
bbb,
NULL as "ccc",
ddd
FROM tab1
UNION ALL
SELECT
'part2' as "part",
aaa,
bbb,
ccc,
NULL as "ddd"
FROM tab2
```
where `ccc` can not be derived from `tab1` by any means joins etc. and vice versa with `ddd`.
And then inside Chart try to properly calculate something. It did not worked well when we had filter against one of NULL-able value e.g. `ddd`. When we set same value in filter it caused second part of query to return no records, but for our calculations we need it to not react on that filter at all. Other cons was that in practice the query was quite long having many NULLs at both sides and made it hard to maintain and also less performant (as whole query is redone when anyone filter change).

2. Inside SQL injections in our Charts write one of parts as nested select query: e.g. instead: `SUM("DurationMin")/COUNT(DISTINCT "CustomerCode")` we tried `SUM("DurationMin")/( SELECT .... )`. This worked for some cases, but not always and led to very poor not acceptable performance.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org

[GitHub] [superset] churchill1973 commented on issue #15092: [SIP-69] Proposal for exporting data from datasets as jinja keys to be used in other charts

Posted by GitBox <gi...@apache.org>.

churchill1973 commented on issue #15092:
URL: https://github.com/apache/superset/issues/15092#issuecomment-898204088


   I'm facing, exactly, the very same issue. Furthermore, I was impressed we arrieved to the very same rejected alternative, and then I got stucked in , as you said, when a nullable field is filtered the alternative doesn't work.
   It would be a great advance a feature like join two 2 datasets to get a chart.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org