Posted to user@kylin.apache.org by "Kumar, Manoj H" <ma...@jpmorgan.com> on 2018/02/02 06:14:37 UTC

optimal parameters

Hi Folks, I need your input on optimizing the Kylin cube build process. We have approximately 450 million records in one partition and 80-90 dimensions to be picked up from the tables. Could you please advise on this? What would be the optimal way to run the jobs? We have a Cloudera cluster of 16 nodes, each with 8 cores.

This process has been running for 60 minutes.

2018-02-01 23:54:16,257 INFO  [pool-9-thread-1] threadpool.DefaultScheduler:116 : CubingJob{id=2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd, name=BUILD CUBE - Deposits - 20170929000000_20170930000000 - GMT+08:00 2018-02-02 12:37:11, state=READY} scheduled
2018-02-01 23:54:16,258 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.AbstractExecutable:111 : Executing AbstractExecutable (BUILD CUBE - Deposits - 20170929000000_20170930000000 - GMT+08:00 2018-02-02 12:37:11)
2018-02-01 23:54:16,263 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.ExecutableManager:425 : job id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd from READY to RUNNING
2018-02-01 23:54:16,271 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.AbstractExecutable:111 : Executing AbstractExecutable (Extract Fact Table Distinct Columns)
2018-02-01 23:54:16,275 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] execution.ExecutableManager:425 : job id:2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-02 from READY to RUNNING
2018-02-01 23:54:16,358 INFO  [pool-9-thread-1] threadpool.DefaultScheduler:123 : Job Fetcher: 0 should running, 1 actual running, 0 stopped, 1 ready, 86 already succeed, 47 error, 0 discarded, 0 others
2018-02-01 23:54:16,371 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.MapReduceExecutable:115 : parameters of the MapReduceExecutable:  -conf /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/conf/kylin_job_conf.xml -cubename Deposits -output hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns -segmentid da273eda-45ea-4c72-816c-709c8a61df16 -statisticsenabled true -statisticsoutput hdfs://sfpdev/tenants/rft/rcmo/kylin/ns_rft_rcmo_creg_poc-kylin_metadata/kylin-2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd/Deposits/fact_distinct_columns/statistics -statisticssamplingpercent 100 -jobname Kylin_Fact_Distinct_Columns_Deposits_Step -cubingJobId 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd
2018-02-01 23:54:16,424 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] steps.FactDistinctColumnsJob:106 : Starting: Kylin_Fact_Distinct_Columns_Deposits_Step
2018-02-01 23:54:16,775 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:386 : Trying to connect to metastore with URI thrift://bdtpisr3n1.svr.us.jpmchase.net:9083
2018-02-01 23:54:16,784 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:431 : Opened a connection to metastore, current connections: 3
2018-02-01 23:54:16,784 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hive.metastore:483 : Connected to metastore.
2018-02-01 23:54:17,345 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.KylinConfigBase:162 : Kylin Config was updated with kylin.metadata.url : /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
2018-02-01 23:54:17,347 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] persistence.ResourceStore:79 : Using metadata url /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta for resource store
2018-02-01 23:54:17,354 DEBUG [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.AbstractHadoopJob:547 : Dump resources to /apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta took 9 ms
2018-02-01 23:54:17,354 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] common.AbstractHadoopJob:505 : HDFS meta dir is: file:///apps/rft/rcmo/apps/kylin/kylin_namespace/apache-kylin-2.1.0-KYLIN-2846-cdh57/bin/../tomcat/temp/kylin_job_meta8814952902761392543/meta
2018-02-01 23:54:17,470 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] hdfs.DFSClient:1086 : Created token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo_nd@NAEAST.AD.JPMORGANCHASE.COM, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921 on ha-hdfs:sfpdev
2018-02-01 23:54:17,471 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] security.TokenCache:144 : Got dt for hdfs://sfpdev; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sfpdev, Ident: (token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo_nd@NAEAST.AD.JPMORGANCHASE.COM, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921)
2018-02-01 23:54:17,478 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] client.ConfiguredRMFailoverProxyProvider:100 : Failing over to rm76
2018-02-01 23:54:18,864 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapred.FileInputFormat:249 : Total input paths to process : 482
2018-02-01 23:54:19,518 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:202 : number of splits:482
2018-02-01 23:54:19,566 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:291 : Submitting tokens for job: job_1516848187601_12793
2018-02-01 23:54:19,566 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.JobSubmitter:293 : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:sfpdev, Ident: (token for a_rcmo_nd: HDFS_DELEGATION_TOKEN owner=a_rcmo_nd@NAEAST.AD.JPMORGANCHASE.COM, renewer=yarn, realUser=, issueDate=1517547257468, maxDate=1518152057468, sequenceNumber=917925, masterKeyId=921)
2018-02-01 23:54:19,821 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] impl.YarnClientImpl:260 : Submitted application application_1516848187601_12793
2018-02-01 23:54:19,825 INFO  [Job 2b8baabe-0d16-4ad8-9c4a-449b24cb0fcd-761] mapreduce.Job:1311 : The url to track the job: http://bdtpisr3n2.svr.us.jpmchase.net:8088/proxy/applicatio



Please also advise on the Spark parameters.

kylin.engine.mr.reduce-input-mb=400
#kylin.engine.mr.max-reducer-number=300
kylin.engine.mr.mapper-input-rows=500000
#kylin.engine.mr.build-dict-in-reducer=true
kylin.engine.mr.uhc-reducer-count=2
#### CUBE | DICTIONARY ###
kylin.cube.algorithm=inmem
## A smaller threshold prefers layer, a larger threshold prefers in-mem
#kylin.cube.algorithm.layer-or-inmem-threshold=7
kylin.cube.aggrgroup.max-combination=61440
kylin.snapshot.max-mb=1500



kylin.engine.spark.rdd-partition-cut-mb=800
kylin.engine.spark.min-partition=1
## Max partition numbers of rdd
kylin.engine.spark.max-partition=500
kylin.engine.spark-conf.spark.yarn.queue=XXXX
kylin.engine.spark-conf.spark.executor.memory=8G
kylin.engine.spark-conf.spark.executor.cores=6
kylin.engine.spark-conf.spark.executor.instances=10
kylin.engine.spark-conf.spark.eventLog.enabled=true
kylin.engine.spark-conf.spark.eventLog.dir=XXXX
kylin.engine.spark-conf.spark.history.fs.logDirectory=XXXX
kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
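
As a rough sanity check on the Spark settings above, the sketch below tallies what the executor settings request against the cluster described earlier (16 nodes with 8 cores each). It assumes every core on every node is available to YARN and ignores the driver, executor memory overhead, and other tenants on the queue, so the figures are only an upper-bound illustration and not a tuning recommendation. The function name spark_request_summary is made up for this example.

```python
# Cluster size as described in this post: 16 nodes, 8 cores each.
# Assumption: all of these cores are actually available to YARN.
NODES = 16
CORES_PER_NODE = 8

# Values copied from the kylin.engine.spark-conf.* settings above.
EXECUTOR_INSTANCES = 10
EXECUTOR_CORES = 6
EXECUTOR_MEMORY_GB = 8


def spark_request_summary(nodes, cores_per_node, instances, cores, memory_gb):
    """Compare the cores and heap requested by Spark executors with the cluster size."""
    cluster_cores = nodes * cores_per_node
    requested_cores = instances * cores
    return {
        "cluster_cores": cluster_cores,
        "requested_cores": requested_cores,
        # Fraction of the cluster's cores the executors would occupy.
        "core_share": requested_cores / cluster_cores,
        # Executor heap only; YARN memory overhead is not included.
        "requested_heap_gb": instances * memory_gb,
    }


summary = spark_request_summary(
    NODES, CORES_PER_NODE, EXECUTOR_INSTANCES, EXECUTOR_CORES, EXECUTOR_MEMORY_GB
)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these assumptions the executors would ask for 60 of the 128 cores, which is slightly less than half of the cluster.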

Regards,
Manoj

