Posted to user@kylin.apache.org by we...@corp.netease.com on 2019/10/28 09:31:05 UTC
need help with non-static dimensions
Hi, all. It seems that Kylin treats the dimensions of input data as
static. How can I use Kylin to process input data whose dimensions may
change?
For example, the input data has the fields "t_date, user_id, user_server,
user_os". However, a user's login server may change during the day. If I want
to calculate the DAU per server and OS, I need to dedup like this:
SELECT
    t_date, user_server, user_os, COUNT(DISTINCT user_id) AS user_cnt
FROM
(
    SELECT
        t_date, user_id, user_os,
        -- one user maps to only one server
        MAX(user_server) AS user_server
    FROM Src
    GROUP BY t_date, user_id, user_os
)
GROUP BY t_date, user_server, user_os
This is because a user must not be counted on more than one server.
Suppose the input is:
t_date    user_id  user_server  user_os
20191028  Lily     100          Windows
20191028  Lily     101          Windows
The expected result is:
t_date    user_server  user_os  user_cnt
20191028  100          Windows  0
20191028  101          Windows  1
But the result from Kylin may be:
t_date    user_server  user_os  user_cnt
20191028  100          Windows  1
20191028  101          Windows  1
which is not what I expect.
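To make the difference concrete, here is a minimal sketch that reproduces both results on the two sample rows. It uses Python's built-in sqlite3 module as a stand-in, not Kylin, so it only illustrates the SQL semantics. The "naive" query is my assumption of what a direct aggregation over the raw dimensions amounts to, and the subquery alias "t" is added only for portability.

```python
import sqlite3

# In-memory SQLite table standing in for Src; this is NOT Kylin.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Src (t_date TEXT, user_id TEXT, user_server INTEGER, user_os TEXT)"
)
conn.executemany(
    "INSERT INTO Src VALUES (?, ?, ?, ?)",
    [
        ("20191028", "Lily", 100, "Windows"),
        ("20191028", "Lily", 101, "Windows"),
    ],
)

# Assumed naive aggregation: group directly by the raw dimensions,
# so the same user is counted once per server she logged in to.
naive_sql = """
SELECT t_date, user_server, user_os, COUNT(DISTINCT user_id) AS user_cnt
FROM Src
GROUP BY t_date, user_server, user_os
ORDER BY user_server
"""

# Dedup first: pin each user to a single server (the MAX), then count.
dedup_sql = """
SELECT t_date, user_server, user_os, COUNT(DISTINCT user_id) AS user_cnt
FROM (
    SELECT t_date, user_id, user_os, MAX(user_server) AS user_server
    FROM Src
    GROUP BY t_date, user_id, user_os
) AS t
GROUP BY t_date, user_server, user_os
ORDER BY user_server
"""

naive_rows = conn.execute(naive_sql).fetchall()
dedup_rows = conn.execute(dedup_sql).fetchall()
print("naive:", naive_rows)
print("dedup:", dedup_rows)
```

In this sketch the naive query counts Lily once on server 100 and once on server 101, while the dedup query counts her only on server 101. Server 100 is absent from the dedup output rather than listed with a count of 0.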
How should I deal with input data that has non-static dimensions?