Posted to dev@kylin.apache.org by Kamal Bannuru <ka...@gmail.com> on 2019/12/04 06:40:52 UTC
How to estimate cluster resources and storage required by Kylin
Hi Team,
This is Kamal Bannuru. I am a newbie in the Kylin community; please help me with
"*How to estimate the cluster resources and storage required by Kylin*".
Please find more details about the dimension table, fact table, and cube
design below.
*Dimension Table:* dim_audio_songs_name_mapping
--------------------------------------------------
Column Name | DataType | Sample values
--------------------------------------------------
songid | String | s001
songname | String | yyy
artistname | String | XXX
country_code | String | IN
--------------------------------------------------
Dimension table size in HDFS: 10 GB
Number of records: 5 million
*Fact Table:* tb_songs_tranasactions
--------------------------------------------------
Column Name | DataType | Sample Value
--------------------------------------------------
transactionid | bigint | 1001
country_code | String | IN
currency | String | INR
paid_money | String | 1000
songid | String | s001
--------------------------------------------------
Fact table size in HDFS: 20 GB
Number of records: 50 million
Cube engine: MapReduce (MR)
*Cube Design details:*
----------------------------------------------------------------------------------------------------
Column Type      | Column Name          | Join Relation / Expression
----------------------------------------------------------------------------------------------------
Dimension Column | country_code         | tb_songs_tranasactions.country_code = dim_audio_songs_name_mapping.country_code
Dimension Column | songid               | tb_songs_tranasactions.songid = dim_audio_songs_name_mapping.songid
Measure          | COUNT(transactionid) | count(tb_songs_tranasactions.transactionid)
Measure          | SUM(paid_money)      | sum(tb_songs_tranasactions.paid_money)
----------------------------------------------------------------------------------------------------
*Cube size and required computation estimates*
*Storage Estimations:*
1) Roughly how much storage is required, considering the dimension columns,
their cardinalities, and the fact data?
2) How much Hive storage is needed for the intermediate tables, and how large
will the cube be in HBase?
3) Are there any approximate formulas to estimate these sizes?
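To make question 3 concrete, this is the back-of-envelope I have tried so far. It is not an official Kylin formula, just an upper bound: for each cuboid (each non-empty subset of dimensions), the row count can't exceed either the fact-table row count or the product of the member dimensions' cardinalities. The cardinalities below are illustrative guesses, not measured values, and bytes-per-row is a rough placeholder.

```python
from itertools import combinations

def estimate_cube_rows(fact_rows, cardinalities):
    """Upper-bound row count for a full cube: sum over all cuboids,
    each capped by min(fact rows, product of member cardinalities)."""
    dims = list(cardinalities)
    total = 0
    for r in range(1, len(dims) + 1):
        for combo in combinations(dims, r):
            product = 1
            for d in combo:
                product *= cardinalities[d]
            total += min(fact_rows, product)
    return total

# Figures from this mail: 50M fact rows, 2 dimensions.
# Cardinalities are assumptions for illustration only.
cards = {"country_code": 200, "songid": 5_000_000}
rows = estimate_cube_rows(50_000_000, cards)

# Very rough row width: 2 encoded dimension codes + COUNT + SUM,
# 8 bytes each (real widths depend on dictionary/encoding choices).
bytes_per_row = 4 * 8
print(rows, round(rows * bytes_per_row / 2**30, 2), "GiB pre-compression")
```

With these assumed numbers that gives 55,000,200 rows (~1.6 GiB before HBase compression). Is this kind of estimate in the right spirit, or does Kylin's own size estimator work differently?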
*Computation Estimations*
*Cube building:*
How much cluster computation is required for the intermediate Hive jobs when
the cube engine is MR?
*Cube query:*
How much computation is required to query the cube from HBase storage?
Are there any approximate formulas to estimate these resources?
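For the MR build side, the only crude sketch I could come up with is counting map tasks from HDFS splits of the intermediate flat table. Both numbers below are assumptions: the default 128 MiB split size, and the guess that the flat table is roughly the 20 GB fact table plus the 10 GB dimension table (the real size depends on the join and column pruning).

```python
def estimate_mr_mappers(input_bytes, split_bytes=128 * 2**20):
    """Rough map-task count: one mapper per HDFS split (ceiling division).
    128 MiB is the common default block/split size, an assumption here."""
    return -(-input_bytes // split_bytes)

# Assumed flat-table size: 20 GiB fact + 10 GiB dimension = 30 GiB.
print(estimate_mr_mappers(30 * 2**30))
```

That would mean on the order of 240 concurrent-or-queued mappers for the first cubing step. Does that match how people here size their YARN queues for Kylin builds?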
If these questions have already been answered, please share the links, and let
me know if any more details are required.
Thanks for the support.
Regards
Kamal Bannuru.
--
Sent from: http://apache-kylin.74782.x6.nabble.com/