Posted to user@kylin.apache.org by ShaoFeng Shi <sh...@apache.org> on 2017/04/02 14:17:20 UTC

Re: How Kylin Cuboid Scheduler Work With Aggregation Groups ?

Hi Bing,

An aggregation group is a group of dimensions, in other words a sub-cube;
it is NOT a cuboid.

I guess you want to define the exact cuboids/combinations to build. That
isn't supported, because in many cases users cannot list all the
combinations they use. But you can describe them as closely as possible
with aggregation groups and the mandatory / joint rules.
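
To compare what was requested with what the scheduler logged, here is a minimal Python sketch. It assumes each cuboid ID is a bitmask over the rowkey columns, with the first rowkey column as the highest bit; that assumption is inferred from the IDs listed in the quoted message (for example DAY_TIME = 512), not taken from Kylin's code. The helper names `cuboid_id` and `dims_of` are hypothetical.

```python
# Rowkey column order copied from the cube descriptor in the quoted message.
ROWKEY_ORDER = [
    "DAY_TIME", "TS_MINUTE", "TS_HOUR", "GENDER", "AGE",
    "BRAND", "MODEL", "RESOLUTION", "OS_VERSION", "NTT",
]


def cuboid_id(dims, rowkey_order=ROWKEY_ORDER):
    """Return the bitmask for a set of dimensions.

    Assumption: the first rowkey column maps to the highest bit, which
    matches the IDs listed in the quoted message (e.g. DAY_TIME = 512).
    """
    n = len(rowkey_order)
    mask = 0
    for dim in dims:
        mask |= 1 << (n - 1 - rowkey_order.index(dim))
    return mask


def dims_of(cuboid, rowkey_order=ROWKEY_ORDER):
    """Decode a cuboid ID back into its list of dimensions."""
    n = len(rowkey_order)
    return [d for i, d in enumerate(rowkey_order) if cuboid & (1 << (n - 1 - i))]


# Combinations asked for in the question, and the IDs printed in the quoted log.
wanted = [
    ["DAY_TIME", "GENDER"], ["DAY_TIME", "AGE"], ["DAY_TIME", "BRAND"],
    ["DAY_TIME", "MODEL"], ["DAY_TIME", "RESOLUTION"],
    ["DAY_TIME", "OS_VERSION"], ["DAY_TIME", "NTT"],
    ["TS_MINUTE"], ["TS_HOUR"], ["DAY_TIME"],
]
logged = {1023, 516, 576, 513, 528, 514, 544, 520}

wanted_ids = {cuboid_id(d) for d in wanted}
missing = sorted(wanted_ids - logged)   # requested but absent from the log
extra = sorted(logged - wanted_ids)     # logged but not requested

for cid in missing:
    print(cid, dims_of(cid))
print("extra:", [(cid, len(dims_of(cid))) for cid in extra])
```

Under that assumption, the three requested IDs absent from the log (128, 256, 512) are exactly the single-dimension combinations, and the one extra ID (1023) is the combination of all ten dimensions, i.e. the base cuboid.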

2017-03-31 15:49 GMT+08:00 bingli3@iflytek.com <bi...@iflytek.com>:

>   Hi all,
>       I have a cube; its descriptor is:
>
> {
>   "uuid": "bcf11be2-83e4-497e-9e35-a402460a6446",
>   "last_modified": 1490860973892,
>   "version": "1.6.0",
>   "name": "adx_flow_insight",
>   "model_name": "adx_operator",
>   "description": "",
>   "null_string": null,
>   "dimensions": [
>     {
>       "name": "GENDER",
>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>       "column": "GENDER",
>       "derived": null
>     },
>     {
>       "name": "AGE",
>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>       "column": "AGE",
>       "derived": null
>     },
>     {
>       "name": "BRAND",
>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>       "column": "BRAND",
>       "derived": null
>     },
>     {
>       "name": "MODEL",
>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>       "column": "MODEL",
>       "derived": null
>     },
>     {
>       "name": "RESOLUTION",
>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>       "column": "RESOLUTION",
>       "derived": null
>     },
>     {
>       "name": "OS_VERSION",
>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>       "column": "OS_VERSION",
>       "derived": null
>     },
>     {
>       "name": "NTT",
>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>       "column": "NTT",
>       "derived": null
>     },
>     {
>       "name": "TS_MINUTE",
>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>       "column": "TS_MINUTE",
>       "derived": null
>     },
>     {
>       "name": "TS_HOUR",
>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>       "column": "TS_HOUR",
>       "derived": null
>     },
>     {
>       "name": "DAY_TIME",
>       "table": "FLOW_INSIGHT.VIEW_FLOW_INSIGHT",
>       "column": "DAY_TIME",
>       "derived": null
>     }
>   ],
>   "measures": [
>     {
>       "name": "_COUNT_",
>       "function": {
>         "expression": "COUNT",
>         "parameter": {
>           "type": "constant",
>           "value": "1",
>           "next_parameter": null
>         },
>         "returntype": "bigint"
>       },
>       "dependent_measure_ref": null
>     },
>     {
>       "name": "REQUEST_PV",
>       "function": {
>         "expression": "SUM",
>         "parameter": {
>           "type": "column",
>           "value": "REQUEST",
>           "next_parameter": null
>         },
>         "returntype": "bigint"
>       },
>       "dependent_measure_ref": null
>     },
>     {
>       "name": "IMPRESS_PV",
>       "function": {
>         "expression": "SUM",
>         "parameter": {
>           "type": "column",
>           "value": "IMPRESS",
>           "next_parameter": null
>         },
>         "returntype": "bigint"
>       },
>       "dependent_measure_ref": null
>     },
>     {
>       "name": "CLICK_PV",
>       "function": {
>         "expression": "SUM",
>         "parameter": {
>           "type": "column",
>           "value": "CLICK",
>           "next_parameter": null
>         },
>         "returntype": "bigint"
>       },
>       "dependent_measure_ref": null
>     },
>     {
>       "name": "FILL_PV",
>       "function": {
>         "expression": "SUM",
>         "parameter": {
>           "type": "column",
>           "value": "FILL",
>           "next_parameter": null
>         },
>         "returntype": "bigint"
>       },
>       "dependent_measure_ref": null
>     },
>     {
>       "name": "UV_DID",
>       "function": {
>         "expression": "COUNT_DISTINCT",
>         "parameter": {
>           "type": "column",
>           "value": "DID",
>           "next_parameter": null
>         },
>         "returntype": "hllc(10)"
>       },
>       "dependent_measure_ref": null
>     }
>   ],
>   "dictionaries": [],
>   "rowkey": {
>     "rowkey_columns": [
>       {
>         "column": "DAY_TIME",
>         "encoding": "date",
>         "isShardBy": false
>       },
>       {
>         "column": "TS_MINUTE",
>         "encoding": "integer:4",
>         "isShardBy": false
>       },
>       {
>         "column": "TS_HOUR",
>         "encoding": "integer:4",
>         "isShardBy": false
>       },
>       {
>         "column": "GENDER",
>         "encoding": "dict",
>         "isShardBy": false
>       },
>       {
>         "column": "AGE",
>         "encoding": "dict",
>         "isShardBy": false
>       },
>       {
>         "column": "BRAND",
>         "encoding": "dict",
>         "isShardBy": false
>       },
>       {
>         "column": "MODEL",
>         "encoding": "dict",
>         "isShardBy": false
>       },
>       {
>         "column": "RESOLUTION",
>         "encoding": "dict",
>         "isShardBy": false
>       },
>       {
>         "column": "OS_VERSION",
>         "encoding": "dict",
>         "isShardBy": false
>       },
>       {
>         "column": "NTT",
>         "encoding": "dict",
>         "isShardBy": false
>       }
>     ]
>   },
>   "hbase_mapping": {
>     "column_family": [
>       {
>         "name": "F1",
>         "columns": [
>           {
>             "qualifier": "M",
>             "measure_refs": [
>               "_COUNT_",
>               "REQUEST_PV",
>               "IMPRESS_PV",
>               "CLICK_PV",
>               "FILL_PV"
>             ]
>           }
>         ]
>       },
>       {
>         "name": "F2",
>         "columns": [
>           {
>             "qualifier": "M",
>             "measure_refs": [
>               "UV_DID"
>             ]
>           }
>         ]
>       }
>     ]
>   },
>   "aggregation_groups": [
>     {
>       "includes": [
>         "DAY_TIME",
>         "GENDER"
>       ],
>       "select_rule": {
>         "hierarchy_dims": [],
>         "mandatory_dims": [
>           "DAY_TIME"
>         ],
>         "joint_dims": []
>       }
>     },
>     {
>       "includes": [
>         "DAY_TIME",
>         "AGE"
>       ],
>       "select_rule": {
>         "hierarchy_dims": [],
>         "mandatory_dims": [
>           "DAY_TIME"
>         ],
>         "joint_dims": []
>       }
>     },
>     {
>       "includes": [
>         "DAY_TIME",
>         "BRAND"
>       ],
>       "select_rule": {
>         "hierarchy_dims": [],
>         "mandatory_dims": [
>           "DAY_TIME"
>         ],
>         "joint_dims": []
>       }
>     },
>     {
>       "includes": [
>         "DAY_TIME",
>         "MODEL"
>       ],
>       "select_rule": {
>         "hierarchy_dims": [],
>         "mandatory_dims": [
>           "DAY_TIME"
>         ],
>         "joint_dims": []
>       }
>     },
>     {
>       "includes": [
>         "DAY_TIME",
>         "RESOLUTION"
>       ],
>       "select_rule": {
>         "hierarchy_dims": [],
>         "mandatory_dims": [
>           "RESOLUTION"
>         ],
>         "joint_dims": []
>       }
>     },
>     {
>       "includes": [
>         "DAY_TIME",
>         "OS_VERSION"
>       ],
>       "select_rule": {
>         "hierarchy_dims": [],
>         "mandatory_dims": [
>           "DAY_TIME"
>         ],
>         "joint_dims": []
>       }
>     },
>     {
>       "includes": [
>         "DAY_TIME",
>         "NTT"
>       ],
>       "select_rule": {
>         "hierarchy_dims": [],
>         "mandatory_dims": [
>           "DAY_TIME"
>         ],
>         "joint_dims": []
>       }
>     },
>     {
>       "includes": [
>         "TS_MINUTE"
>       ],
>       "select_rule": {
>         "hierarchy_dims": [],
>         "mandatory_dims": [
>           "TS_MINUTE"
>         ],
>         "joint_dims": []
>       }
>     },
>     {
>       "includes": [
>         "TS_HOUR"
>       ],
>       "select_rule": {
>         "hierarchy_dims": [],
>         "mandatory_dims": [
>           "TS_HOUR"
>         ],
>         "joint_dims": []
>       }
>     }
>   ],
>   "signature": "DSSmByHn2sATiETlBdjANQ==",
>   "notify_list": [],
>   "status_need_notify": [
>     "ERROR",
>     "DISCARDED",
>     "SUCCEED"
>   ],
>   "partition_date_start": 1488326400000,
>   "partition_date_end": 3153600000000,
>   "auto_merge_time_ranges": [
>     604800000,
>     2419200000
>   ],
>   "retention_range": 0,
>   "engine_type": 2,
>   "storage_type": 2,
>   "override_kylin_properties": {
>     "kylin.job.mr.config.override.mapreduce.job.queuename": "ad"
>   }
>
> }
>
>    There are 10 dimensions, and the cube uses aggregation groups. I want
> the cube to contain only these 10 combinations:
>        <day_time, gender> 576
>        <day_time, age> 544
>        <day_time, brand> 528
>        <day_time, model> 520
>        <day_time, resolution> 516
>        <day_time, os_version> 514
>        <day_time, ntt>        513
>        <ts_minute>     256
>        <ts_hour>     128
>        <day_time>   512
>
>      But the Cuboid Scheduler produces the following:
>
> 2017-03-31 15:32:47,735 (main) [INFO - org.apache.kylin.cube.CubeManager.
> loadAllCubeInstance(CubeManager.java:908)] Loaded 4 cubes, fail on 0 cubes
> 1023
> 516
> 576
> 513
> 528
> 514
> 544
> 520
> 2017-03-31 15:32:47,742 (Thread-0) [INFO - org.apache.hadoop.hbase.client.
> ConnectionManager$HConnectionImplementation.closeMasterService(
> ConnectionManager.java:2259)] Closing master protocol: MasterService
>
> Question:
>      How do aggregation groups work? Can't I put a single dimension in
> an aggregation group?
>
> Thanks for your suggestions.
>
> ------------------------------
> bingli3@iflytek.com
>



-- 
Best regards,

Shaofeng Shi 史少锋