You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2021/02/11 23:32:00 UTC
[jira] [Created] (CALCITE-4496) Measure columns ("SELECT ... AS MEASURE")

Julian Hyde created CALCITE-4496:
------------------------------------

             Summary: Measure columns ("SELECT ... AS MEASURE")
                 Key: CALCITE-4496
                 URL: https://issues.apache.org/jira/browse/CALCITE-4496
             Project: Calcite
          Issue Type: Bug
            Reporter: Julian Hyde


In multi-dimensional languages such as MDX, DAX, Tableau, you can define calculations in your models that can be re-evaluated in other dimensional contexts. (The models are often called cubes, and the calculations are often called measures.)

In SQL, the model is a view (or a sub-query in the FROM clause) but the columns are just values. Suppose you have a private {{Employees}} table, a {{Departments}} view that rolls {{Employees}} up to department level and has an {{averageSalary}} column. Now suppose you wish to roll up {{averageSalary}} to the region level. The values that went into {{averageSalary}} are not available to you, either directly or indirectly, so the best you can do is to average-the-averages.

In this proposed (and experimental) feature, you can define a special kind of column - a measure - in the SELECT list of a view (or sub-query in a FROM clause), and it remains a calculation. When a query uses a measure column, the calculation is re-evaluated in the context of that query.

To some extent, this breaches the "black box" property of SQL views. Hitherto, a SQL view can be replaced with a table that has the same contents, and all queries that use that view will return the same results. That property no longer holds. But the view remains a useful "hiding" abstraction, and the rows that compose that view cannot be viewed directly.

Like dimensional models, measures in SQL would allow high-level abstractions such as key-performance indicators (KPIs) to be shared and composed. Unlike dimensional models, the models remain relational, namely, it is still possible to enumerate and count the rows in a model.

Consider the following view and query that uses it:
{code:sql}
CREATE VIEW EmpSummary AS
SELECT deptno,
    job,
    AVG(sal) AS avg_sal,
    AVG(sal) AS MEASURE avg_sal_measure,
    COUNT(*) + 1 AS MEASURE count_plus_one_measure
FROM Emp
GROUP BY deptno, job;

SELECT deptno,
    AVG(avg_sal) AS a1,
    AGGREGATE(avg_sal_measure) AS a2,
    AGGREGATE(count_plus_one_measure) AS c1
FROM EmpSummary
GROUP BY deptno;{code}

Note that there is a special aggregate function, {{AGGREGATE}}, that rolls up measures. Columns {{a1}} and {{a2}} will contain different values; the first averages the averages, and the second computes the average from the raw data. Column {{c1}} will return the number of employees in each department plus one, not rolling up the "plus one" for each distinct job in the department.

This is just a brief sketch illustrating the purpose of measures. This feature is experimental, the syntax will no doubt change, and much of the semantics (for example, what expressions are valid as measures, whether measures remain measures they appear in the SELECT clause of an enclosing query, and what is the "context" in which a measure is evaluated) need to be ironed out.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)