Posted to user@kylin.apache.org by Billy Liu <bi...@apache.org> on 2018/04/01 15:23:11 UTC

Re: How to use MR to build UHC dimensions

Hi Fei Yi,

This parameter only takes effect for ultra high cardinality (UHC) columns,
that is, columns defined as "ShardBy" or "Global Dictionary".
Please check whether your cube has either of these definitions.
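
One way to run this check against a saved cube descriptor is sketched below. It is only an illustration, not Kylin's own logic: the helper name find_uhc_columns is hypothetical, and the field names ("rowkey_columns", "isShardBy", "dictionaries", "builder") are assumed to match the cube JSON printed in the log later in this thread.

```python
import json

# Builder class name as it appears in the expected metadata in this thread.
GLOBAL_DICT_BUILDER = "org.apache.kylin.dict.GlobalDictionaryBuilder"


def _is_true(value):
    """Accept both JSON true and the string "true" (the log shows strings)."""
    return str(value).lower() == "true"


def find_uhc_columns(cube_desc):
    """Return columns marked ShardBy or given a global dictionary.

    Hypothetical helper: it only inspects the descriptor JSON and does not
    call any Kylin API.
    """
    rowkey_columns = cube_desc.get("rowkey", {}).get("rowkey_columns", [])
    shard_by = [
        col["column"]
        for col in rowkey_columns
        if _is_true(col.get("isShardBy", False))
    ]
    global_dict = [
        entry["column"]
        for entry in cube_desc.get("dictionaries", [])
        if entry.get("builder") == GLOBAL_DICT_BUILDER
    ]
    return shard_by + global_dict


if __name__ == "__main__":
    # Trimmed-down descriptor with the same shape as the one in the log.
    desc = json.loads("""
    {
      "dictionaries": [],
      "rowkey": {
        "rowkey_columns": [
          {"column": "OD.CALENDAR_DATE", "isShardBy": "false"},
          {"column": "OD.YEAR_MONTH", "isShardBy": "false"}
        ]
      }
    }
    """)
    print(find_uhc_columns(desc))
```

An empty result would mean the descriptor has neither definition, in which case the parameter has nothing to act on.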

With Warm regards

Billy Liu


2018-03-30 16:45 GMT+08:00 Fei Yi <yi...@gmail.com>:
> I am using Kylin 2.3.1 and have set:
> kylin.engine.mr.build-uhc-dict-in-additional-step=true
> kylin.snapshot.max-mb=3000
>
> But the job is still built on the Kylin server; I don't see a separate step
> to build UHC dimensions.
>
>

Re: How to use MR to build UHC dimensions

Posted by Billy Liu <bi...@apache.org>.
From the metadata, we found that the global dictionary was not created
successfully. The expected metadata should look like this:

"dictionaries": [
  {
    "column": "ORDER_ID",
    "builder": "org.apache.kylin.dict.GlobalDictionaryBuilder"
  }
]

If you can reproduce this issue, please file a JIRA. It looks like a
frontend bug.
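
When reproducing the issue, a quick way to confirm that an entry is missing is to compare the saved descriptor with the columns you expect. The sketch below is only an illustration: missing_global_dicts and short_name are hypothetical helpers, and ignoring the table prefix when comparing column names is an assumption made for convenience.

```python
# Builder class name taken from the expected metadata above.
GLOBAL_DICT_BUILDER = "org.apache.kylin.dict.GlobalDictionaryBuilder"


def short_name(column):
    """Drop an optional table prefix and normalise case (assumption)."""
    return column.rsplit(".", 1)[-1].upper()


def missing_global_dicts(cube_desc, expected_columns):
    """Return the expected columns that have no global dictionary entry.

    Hypothetical helper: it reads only the "dictionaries" list of the
    descriptor JSON and does not talk to Kylin.
    """
    present = {
        short_name(entry["column"])
        for entry in cube_desc.get("dictionaries", [])
        if entry.get("builder") == GLOBAL_DICT_BUILDER
    }
    return [col for col in expected_columns if short_name(col) not in present]


if __name__ == "__main__":
    # Same state as the saved cube in this thread: an empty list.
    saved = {"dictionaries": []}
    print(missing_global_dicts(saved, ["ORDER_ID"]))
```

If the column is reported as missing even though the web UI said the dictionary was created, that is the evidence worth attaching to the JIRA.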


With Warm regards

Billy Liu



Re: How to use MR to build UHC dimensions

Posted by Fei Yi <yi...@gmail.com>.
Hi Billy,
I chose a dimension with 60,000,000 records; the measure is
count_distinct(order_id).
When I add the column "order_id" as a global dictionary, the web UI reports
that it was created successfully.
However, the global dictionary column is not displayed in the web UI, and
there are no errors in the log file.

Thanks for your help.

This is the log:

2018-04-02 10:43:22,354 DEBUG [http-bio-7070-exec-6]
controller.CubeController:1010 : Saving cube {
  "name": "GLD_MR_TEST",
  "model_name": "M_ORDER",
  "description": "",
  "dimensions": [
    {
      "name": "CALENDAR_DATE",
      "table": "OD",
      "column": "CALENDAR_DATE",
      "normal": "true"
    },
    {
      "name": "YEAR_MONTH",
      "table": "OD",
      "column": "YEAR_MONTH",
      "normal": "true"
    }
  ],
  "measures": [
    {
      "name": "_COUNT_",
      "function": {
        "expression": "COUNT",
        "returntype": "bigint",
        "parameter": {
          "type": "constant",
          "value": "1"
        },
        "configuration": {}
      }
    },
    {
      "name": "CD",
      "function": {
        "expression": "COUNT_DISTINCT",
        "returntype": "bitmap",
        "parameter": {
          "type": "column",
          "value": "FACT_ORDER_DETAIL.ORDER_ID"
        }
      },
      "showDim": false
    }
  ],
  "dictionaries": [],
  "rowkey": {
    "rowkey_columns": [
      {
        "column": "OD.CALENDAR_DATE",
        "encoding": "dict",
        "isShardBy": "false",
        "encoding_version": 1
      },
      {
        "column": "OD.YEAR_MONTH",
        "encoding": "dict",
        "isShardBy": "false",
        "encoding_version": 1
      }
    ]
  },
  "aggregation_groups": [
    {
      "includes": [
        "OD.CALENDAR_DATE",
        "OD.YEAR_MONTH"
      ],
      "select_rule": {
        "hierarchy_dims": [],
        "mandatory_dims": [
          "OD.CALENDAR_DATE",
          "OD.YEAR_MONTH"
        ],
        "joint_dims": []
      }
    }
  ],
  "mandatory_dimension_set_list": [],
  "partition_date_start": 1514764800000,
  "notify_list": [],
  "hbase_mapping": {
    "column_family": [
      {
        "name": "F1",
        "columns": [
          {
            "qualifier": "M",
            "measure_refs": [
              "_COUNT_"
            ]
          }
        ]
      },
      {
        "name": "F2",
        "columns": [
          {
            "qualifier": "M",
            "measure_refs": [
              "CD"
            ]
          }
        ]
      }
    ]
  },
  "volatile_range": "0",
  "retention_range": "0",
  "status_need_notify": [
    "ERROR",
    "DISCARDED",
    "SUCCEED"
  ],
  "auto_merge_time_ranges": [],
  "engine_type": 2,
  "storage_type": "2",
  "override_kylin_properties": {}
}
2018-04-02 10:43:22,356 DEBUG [http-bio-7070-exec-6]
cachesync.CachedCrudAssist:190 : Saving CubeDesc at
/cube_desc/GLD_MR_TEST.json
2018-04-02 10:43:22,359 DEBUG [pool-6-thread-1] cachesync.Broadcaster:113 :
Servers in the cluster: [localhost:7070]
2018-04-02 10:43:22,359 DEBUG [pool-6-thread-1] cachesync.Broadcaster:123 :
Announcing new broadcast to all: BroadcastEvent{entity=cube_desc,
event=create, cacheKey=GLD_MR_TEST}
2018-04-02 10:43:22,361 DEBUG [http-bio-7070-exec-4]
cachesync.Broadcaster:247 : Broadcasting CREATE, cube_desc, GLD_MR_TEST
2018-04-02 10:43:22,361 INFO  [http-bio-7070-exec-6]
service.CubeService:211 : New cube GLD_MR_TEST has 1 cuboids
2018-04-02 10:43:22,362 INFO  [http-bio-7070-exec-6] cube.CubeManager:219 :
Creating cube 'dw_zyb-->GLD_MR_TEST' from desc 'GLD_MR_TEST'
2018-04-02 10:43:22,362 INFO  [http-bio-7070-exec-6] cube.CubeManager:297 :
Updating cube instance 'GLD_MR_TEST'
2018-04-02 10:43:22,362 DEBUG [http-bio-7070-exec-6]
cachesync.CachedCrudAssist:190 : Saving CubeInstance at
/cube/GLD_MR_TEST.json
2018-04-02 10:43:22,362 DEBUG [http-bio-7070-exec-4]
cachesync.Broadcaster:247 : Broadcasting UPDATE, project_schema, dw_zyb
2018-04-02 10:43:22,364 DEBUG [pool-6-thread-1] cachesync.Broadcaster:113 :
Servers in the cluster: [localhost:7070]
2018-04-02 10:43:22,364 DEBUG [pool-6-thread-1] cachesync.Broadcaster:123 :
Announcing new broadcast to all: BroadcastEvent{entity=cube, event=create,
cacheKey=GLD_MR_TEST}
2018-04-02 10:43:22,365 DEBUG [http-bio-7070-exec-6]
cachesync.CachedCrudAssist:190 : Saving ProjectInstance at
/project/dw_zyb.json
2018-04-02 10:43:22,367 DEBUG [pool-6-thread-1] cachesync.Broadcaster:113 :
Servers in the cluster: [localhost:7070]
2018-04-02 10:43:22,367 DEBUG [pool-6-thread-1] cachesync.Broadcaster:123 :
Announcing new broadcast to all: BroadcastEvent{entity=project,
event=update, cacheKey=dw_zyb}
2018-04-02 10:43:22,376 DEBUG [http-bio-7070-exec-4]
project.ProjectL2Cache:195 : Loading L2 project cache for dw_zyb
2018-04-02 10:43:22,376 WARN  [http-bio-7070-exec-4]
realization.RealizationRegistry:91 : No provider for realization type
INVERTED_INDEX
2018-04-02 10:43:22,378 INFO  [http-bio-7070-exec-4]
service.CacheService:120 : cleaning cache for project dw_zyb (currently
remove all entries)
2018-04-02 10:43:22,378 DEBUG [http-bio-7070-exec-4]
cachesync.Broadcaster:281 : Done broadcasting UPDATE, project_schema, dw_zyb
2018-04-02 10:43:22,378 DEBUG [http-bio-7070-exec-4]
cachesync.Broadcaster:281 : Done broadcasting CREATE, cube_desc, GLD_MR_TEST
2018-04-02 10:43:22,381 DEBUG [http-bio-7070-exec-1]
cachesync.Broadcaster:247 : Broadcasting CREATE, cube, GLD_MR_TEST
2018-04-02 10:43:22,383 DEBUG [http-bio-7070-exec-1]
cachesync.Broadcaster:247 : Broadcasting UPDATE, project_data, dw_zyb
2018-04-02 10:43:22,383 INFO  [http-bio-7070-exec-1]
service.CacheService:120 : cleaning cache for project dw_zyb (currently
remove all entries)
2018-04-02 10:43:22,383 DEBUG [http-bio-7070-exec-1]
cachesync.Broadcaster:281 : Done broadcasting UPDATE, project_data, dw_zyb
2018-04-02 10:43:22,383 DEBUG [http-bio-7070-exec-1]
cachesync.Broadcaster:281 : Done broadcasting CREATE, cube, GLD_MR_TEST
2018-04-02 10:43:22,386 DEBUG [http-bio-7070-exec-1]
cachesync.Broadcaster:247 : Broadcasting UPDATE, project, dw_zyb
2018-04-02 10:43:22,387 DEBUG [http-bio-7070-exec-1]
project.ProjectL2Cache:195 : Loading L2 project cache for dw_zyb
2018-04-02 10:43:22,387 WARN  [http-bio-7070-exec-1]
realization.RealizationRegistry:91 : No provider for realization type
INVERTED_INDEX
2018-04-02 10:43:22,387 DEBUG [http-bio-7070-exec-1]
cachesync.Broadcaster:247 : Broadcasting UPDATE, project_schema, dw_zyb
2018-04-02 10:43:22,402 DEBUG [http-bio-7070-exec-1]
project.ProjectL2Cache:195 : Loading L2 project cache for dw_zyb
2018-04-02 10:43:22,402 WARN  [http-bio-7070-exec-1]
realization.RealizationRegistry:91 : No provider for realization type
INVERTED_INDEX
2018-04-02 10:43:22,404 INFO  [http-bio-7070-exec-1]
service.CacheService:120 : cleaning cache for project dw_zyb (currently
remove all entries)
2018-04-02 10:43:22,404 DEBUG [http-bio-7070-exec-1]
cachesync.Broadcaster:281 : Done broadcasting UPDATE, project_schema, dw_zyb
2018-04-02 10:43:22,405 DEBUG [http-bio-7070-exec-1]
cachesync.Broadcaster:247 : Broadcasting UPDATE, project_data, dw_zyb
2018-04-02 10:43:22,405 INFO  [http-bio-7070-exec-1]
service.CacheService:120 : cleaning cache for project dw_zyb (currently
remove all entries)
2018-04-02 10:43:22,405 DEBUG [http-bio-7070-exec-1]
cachesync.Broadcaster:281 : Done broadcasting UPDATE, project_data, dw_zyb
2018-04-02 10:43:22,405 DEBUG [http-bio-7070-exec-1]
cachesync.Broadcaster:281 : Done broadcasting UPDATE, project, dw_zyb
