Posted to dev@kylin.apache.org by Kamal Bannuru <ka...@gmail.com> on 2019/12/04 06:40:52 UTC

How to estimate cluster resources and storage required by Kylin

Hi Team,

This is Kamal Bannuru. I am new to the Kylin community and would like help
with "*How to estimate cluster resources and storage required by Kylin*".

Details of the dimension table, the fact table, and the cube design are
given below.

*Dimension Table:* dim_audio_songs_name_mapping
--------------------------------------------------
Column Name   | DataType | Sample Value
--------------------------------------------------
songid        | String   | s001
songname      | String   | yyy
artistname    | String   | XXX
country_code  | String   | IN
--------------------------------------------------
Dimension table size in HDFS : 10 GB
No. of records               : 5 million


*Fact Table:* tb_songs_tranasactions
--------------------------------------------------
Column Name   | DataType | Sample Value
--------------------------------------------------
transactionid | bigint   | 1001
country_code  | String   | IN
currency      | String   | INR
paid_money    | String   | 1000
songid        | String   | s001
--------------------------------------------------

Fact table size in HDFS : 20 GB
No. of records          : 50 million


Cube engine: MR (MapReduce)

*Cube Design details:*
----------------------------------------------------------------------------------------------------
Column Type      | Column Name          | Join Relation / Expression
----------------------------------------------------------------------------------------------------
Dimension Column | country_code         | tb_songs_tranasactions.country_code = dim_audio_songs_name_mapping.country_code
Dimension Column | songid               | tb_songs_tranasactions.songid = dim_audio_songs_name_mapping.songid
Measure (Metric) | COUNT(transactionid) | count(tb_songs_tranasactions.transactionid)
Measure (Metric) | SUM(paid_money)      | sum(tb_songs_tranasactions.paid_money)
----------------------------------------------------------------------------------------------------



*Cube size estimation and required computation calculations*
*Storage Estimations:*

1) How much storage is required, given the dimension columns, their
cardinalities, and the fact data?
2) How much Hive storage is required for the intermediate tables, and how
much HBase storage is required for the cube itself?
3) Are there any approximate formulas to estimate these sizes?
				
*Computation Estimations*
*Cube building:*
How much cluster compute is required for the intermediate Hive jobs when
the cube engine is MR?
				
*Cube query:*
How much compute is required to serve cube queries from HBase storage?

Are there any approximate formulas to estimate these resources?

If these questions have already been answered elsewhere, please share the
links. Let me know if any more details are required.

Thanks for the support.

Regards
Kamal Bannuru.





--
Sent from: http://apache-kylin.74782.x6.nabble.com/