Posted to user@kylin.apache.org by ShaoFeng Shi <sh...@apache.org> on 2017/04/02 14:17:20 UTC
Re: How Kylin Cuboid Scheduler Work With Aggregation Groups ?
Hi Bing,
An aggregation group is a group of dimensions, in other words a sub-cube; it
is NOT a cuboid.
I guess you want to define the exact cuboids/combinations to build. That isn't
supported, because in many cases users cannot list all the combinations they
query. But you can describe them as closely as possible with the agg group /
mandatory / joint settings.
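
To make the numbers in the quoted message easier to read, here is a minimal sketch that decodes a cuboid ID into its dimensions. It assumes that a cuboid ID is a bitmask over the rowkey columns, with the first rowkey column as the highest bit; that assumption matches the IDs listed in the quoted message (for example <day_time, gender> = 512 + 64 = 576). The helper names cuboid_id and cuboid_dims are only illustrative and are not Kylin APIs.

```python
# Rowkey column order copied from the "rowkey_columns" section of the
# cube descriptor in the quoted message.
ROWKEY_COLUMNS = [
    "DAY_TIME", "TS_MINUTE", "TS_HOUR", "GENDER", "AGE",
    "BRAND", "MODEL", "RESOLUTION", "OS_VERSION", "NTT",
]


def cuboid_id(dims):
    """Encode dimension names as a bitmask (assumption: the first
    rowkey column maps to the most significant bit)."""
    n = len(ROWKEY_COLUMNS)
    mask = 0
    for dim in dims:
        mask |= 1 << (n - 1 - ROWKEY_COLUMNS.index(dim))
    return mask


def cuboid_dims(cid):
    """Decode a cuboid ID back into dimension names, in rowkey order."""
    n = len(ROWKEY_COLUMNS)
    return [col for i, col in enumerate(ROWKEY_COLUMNS)
            if cid & (1 << (n - 1 - i))]


if __name__ == "__main__":
    # IDs printed by the scheduler in the quoted message.
    for cid in (1023, 516, 576, 513, 528, 514, 544, 520):
        print(cid, cuboid_dims(cid))
```

Read this way, 1023 is the cuboid that contains all 10 dimensions, and the other seven IDs in the printed list are the <day_time, X> combinations. The single-dimension IDs 512, 256 and 128 do not appear in that printed list.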
2017-03-31 15:49 GMT+08:00 bingli3@iflytek.com <bi...@iflytek.com>:
> Hi all,
> I have a cube; its descriptor is:
>
> {
> "uuid": "bcf11be2-83e4-497e-9e35-a402460a6446",
> "last_modified": 1490860973892,
> "version": "1.6.0",
> "name": "adx_flow_insight",
> "model_name": "adx_operator",
> "description": "",
> "null_string": null,
> "dimensions": [
> {
> "name": "GENDER",
> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
> "column": "GENDER",
> "derived": null
> },
> {
> "name": "AGE",
> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
> "column": "AGE",
> "derived": null
> },
> {
> "name": "BRAND",
> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
> "column": "BRAND",
> "derived": null
> },
> {
> "name": "MODEL",
> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
> "column": "MODEL",
> "derived": null
> },
> {
> "name": "RESOLUTION",
> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
> "column": "RESOLUTION",
> "derived": null
> },
> {
> "name": "OS_VERSION",
> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
> "column": "OS_VERSION",
> "derived": null
> },
> {
> "name": "NTT",
> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
> "column": "NTT",
> "derived": null
> },
> {
> "name": "TS_MINUTE",
> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
> "column": "TS_MINUTE",
> "derived": null
> },
> {
> "name": "TS_HOUR",
> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
> "column": "TS_HOUR",
> "derived": null
> },
> {
> "name": "DAY_TIME",
> "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
> "column": "DAY_TIME",
> "derived": null
> }
> ],
> "measures": [
> {
> "name": "_COUNT_",
> "function": {
> "expression": "COUNT",
> "parameter": {
> "type": "constant",
> "value": "1",
> "next_parameter": null
> },
> "returntype": "bigint"
> },
> "dependent_measure_ref": null
> },
> {
> "name": "REQUEST_PV",
> "function": {
> "expression": "SUM",
> "parameter": {
> "type": "column",
> "value": "REQUEST",
> "next_parameter": null
> },
> "returntype": "bigint"
> },
> "dependent_measure_ref": null
> },
> {
> "name": "IMPRESS_PV",
> "function": {
> "expression": "SUM",
> "parameter": {
> "type": "column",
> "value": "IMPRESS",
> "next_parameter": null
> },
> "returntype": "bigint"
> },
> "dependent_measure_ref": null
> },
> {
> "name": "CLICK_PV",
> "function": {
> "expression": "SUM",
> "parameter": {
> "type": "column",
> "value": "CLICK",
> "next_parameter": null
> },
> "returntype": "bigint"
> },
> "dependent_measure_ref": null
> },
> {
> "name": "FILL_PV",
> "function": {
> "expression": "SUM",
> "parameter": {
> "type": "column",
> "value": "FILL",
> "next_parameter": null
> },
> "returntype": "bigint"
> },
> "dependent_measure_ref": null
> },
> {
> "name": "UV_DID",
> "function": {
> "expression": "COUNT_DISTINCT",
> "parameter": {
> "type": "column",
> "value": "DID",
> "next_parameter": null
> },
> "returntype": "hllc(10)"
> },
> "dependent_measure_ref": null
> }
> ],
> "dictionaries": [],
> "rowkey": {
> "rowkey_columns": [
> {
> "column": "DAY_TIME",
> "encoding": "date",
> "isShardBy": false
> },
> {
> "column": "TS_MINUTE",
> "encoding": "integer:4",
> "isShardBy": false
> },
> {
> "column": "TS_HOUR",
> "encoding": "integer:4",
> "isShardBy": false
> },
> {
> "column": "GENDER",
> "encoding": "dict",
> "isShardBy": false
> },
> {
> "column": "AGE",
> "encoding": "dict",
> "isShardBy": false
> },
> {
> "column": "BRAND",
> "encoding": "dict",
> "isShardBy": false
> },
> {
> "column": "MODEL",
> "encoding": "dict",
> "isShardBy": false
> },
> {
> "column": "RESOLUTION",
> "encoding": "dict",
> "isShardBy": false
> },
> {
> "column": "OS_VERSION",
> "encoding": "dict",
> "isShardBy": false
> },
> {
> "column": "NTT",
> "encoding": "dict",
> "isShardBy": false
> }
> ]
> },
> "hbase_mapping": {
> "column_family": [
> {
> "name": "F1",
> "columns": [
> {
> "qualifier": "M",
> "measure_refs": [
> "_COUNT_",
> "REQUEST_PV",
> "IMPRESS_PV",
> "CLICK_PV",
> "FILL_PV"
> ]
> }
> ]
> },
> {
> "name": "F2",
> "columns": [
> {
> "qualifier": "M",
> "measure_refs": [
> "UV_DID"
> ]
> }
> ]
> }
> ]
> },
> "aggregation_groups": [
> {
> "includes": [
> "DAY_TIME",
> "GENDER"
> ],
> "select_rule": {
> "hierarchy_dims": [],
> "mandatory_dims": [
> "DAY_TIME"
> ],
> "joint_dims": []
> }
> },
> {
> "includes": [
> "DAY_TIME",
> "AGE"
> ],
> "select_rule": {
> "hierarchy_dims": [],
> "mandatory_dims": [
> "DAY_TIME"
> ],
> "joint_dims": []
> }
> },
> {
> "includes": [
> "DAY_TIME",
> "BRAND"
> ],
> "select_rule": {
> "hierarchy_dims": [],
> "mandatory_dims": [
> "DAY_TIME"
> ],
> "joint_dims": []
> }
> },
> {
> "includes": [
> "DAY_TIME",
> "MODEL"
> ],
> "select_rule": {
> "hierarchy_dims": [],
> "mandatory_dims": [
> "DAY_TIME"
> ],
> "joint_dims": []
> }
> },
> {
> "includes": [
> "DAY_TIME",
> "RESOLUTION"
> ],
> "select_rule": {
> "hierarchy_dims": [],
> "mandatory_dims": [
> "RESOLUTION"
> ],
> "joint_dims": []
> }
> },
> {
> "includes": [
> "DAY_TIME",
> "OS_VERSION"
> ],
> "select_rule": {
> "hierarchy_dims": [],
> "mandatory_dims": [
> "DAY_TIME"
> ],
> "joint_dims": []
> }
> },
> {
> "includes": [
> "DAY_TIME",
> "NTT"
> ],
> "select_rule": {
> "hierarchy_dims": [],
> "mandatory_dims": [
> "DAY_TIME"
> ],
> "joint_dims": []
> }
> },
> {
> "includes": [
> "TS_MINUTE"
> ],
> "select_rule": {
> "hierarchy_dims": [],
> "mandatory_dims": [
> "TS_MINUTE"
> ],
> "joint_dims": []
> }
> },
> {
> "includes": [
> "TS_HOUR"
> ],
> "select_rule": {
> "hierarchy_dims": [],
> "mandatory_dims": [
> "TS_HOUR"
> ],
> "joint_dims": []
> }
> }
> ],
> "signature": "DSSmByHn2sATiETlBdjANQ==",
> "notify_list": [],
> "status_need_notify": [
> "ERROR",
> "DISCARDED",
> "SUCCEED"
> ],
> "partition_date_start": 1488326400000,
> "partition_date_end": 3153600000000,
> "auto_merge_time_ranges": [
> 604800000,
> 2419200000
> ],
> "retention_range": 0,
> "engine_type": 2,
> "storage_type": 2,
> "override_kylin_properties": {
> "kylin.job.mr.config.override.mapreduce.job.queuename": "ad"
> }
>
> }
>
> There are 10 dimensions, and the cube uses aggregation groups. I want the
> cube to contain only these 10 combinations:
> <day_time, gender> 576
> <day_time, age> 544
> <day_time, brand> 528
> <day_time, model> 520
> <day_time, resolution> 516
> <day_time, os_version> 514
> <day_time, ntt> 513
> <ts_minute> 256
> <ts_hour> 128
> <day_time> 512
>
> But the Cuboid Scheduler parses it as follows:
>
> 2017-03-31 15:32:47,735 (main) [INFO - org.apache.kylin.cube.CubeManager.
> loadAllCubeInstance(CubeManager.java:908)] Loaded 4 cubes, fail on 0 cubes
> 1023
> 516
> 576
> 513
> 528
> 514
> 544
> 520
> 2017-03-31 15:32:47,742 (Thread-0) [INFO - org.apache.hadoop.hbase.client.
> ConnectionManager$HConnectionImplementation.closeMasterService(
> ConnectionManager.java:2259)] Closing master protocol: MasterService
>
> Question:
> How do aggregation groups work? Can I not put a single dimension in an
> aggregation group?
>
> Thanks for your suggestions.
>
> ------------------------------
> bingli3@iflytek.com
>
--
Best regards,
Shaofeng Shi 史少锋