Posted to dev@kylin.apache.org by Na Zhai <na...@kyligence.io> on 2019/03/13 09:24:02 UTC

Re: Hadoop 3, Kylin 2.6.1, building a cube with Spark: at step 3 the Spark job stays in state RUNNING at 10% progress, with no error messages

Hi, 奥威软件

Did you use apache-kylin-2.6.1-bin-hadoop3.tar.gz<https://www.apache.org/dyn/closer.cgi/kylin/apache-kylin-2.6.1/apache-kylin-2.6.1-bin-hadoop3.tar.gz>? If not, I recommend using that binary package. If you are already using it, check the YARN logs and the Spark job history server; the hang may be caused by an unreasonable Spark configuration. I cannot see your picture, so please add it as an attachment. I also cannot see your attachment. Have you uploaded it?
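As a concrete way to do that check, the aggregated YARN logs for the application can be pulled with the standard Hadoop CLI. This is a sketch only: the application ID below is the one from the build log in this thread (substitute your own), and it assumes YARN log aggregation is enabled on the cluster.

```shell
# Application ID taken from the build log in this thread; replace with yours.
APP_ID=application_1552042715496_0010

if command -v yarn >/dev/null 2>&1; then
  # Dump the aggregated container logs for the whole application.
  yarn logs -applicationId "$APP_ID" > "${APP_ID}.log"

  # Executor-side problems often never reach kylin.log; search for them here.
  grep -nE "ERROR|Exception|Lost executor" "${APP_ID}.log" | head -n 20

  # Live status and progress as YARN sees it.
  yarn application -status "$APP_ID"
else
  echo "yarn CLI not found; run this on a cluster node"
fi
```

The Spark history server UI (backed by the `spark.history.fs.logDirectory` path visible in your spark-submit command) shows the same information per stage and task, which is often the fastest way to see where a 10%-stuck job is waiting.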

Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10

________________________________
From: 奥威软件 <35...@qq.com>
Sent: Friday, March 8, 2019 7:42:35 PM
To: dev
Subject: Hadoop 3, Kylin 2.6.1, building a cube with Spark: at step 3 the Spark job stays in state RUNNING at 10% progress, with no error messages

Hi,
Environment:
hadoop: 3.1.2
kylin: 2.6.1

The build is stuck at step 3 (#3 Step: Extract Fact Table Distinct Columns).

[inline screenshot of the stuck build step (not displayed in the plain-text archive)]


There are no error messages in kylin.log.
The execution log follows:
2019-03-08 11:15:08,660 INFO  [Scheduler 190936888 Job edba142b-b9ec-2379-89f6-2a34511d0425-177] spark.SparkExecutable:262 : cmd: export HADOOP_CONF_DIR=/home/apache/hbase/conf && /home/apache/kylin/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry  --conf spark.executor.instances=40  --conf spark.yarn.archive=hdfs://master:9820/kylin/spark/spark-libs.jar  --conf spark.network.timeout=600  --conf spark.yarn.queue=default  --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history  --conf spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec  --conf spark.dynamicAllocation.enabled=true  --conf spark.master=yarn  --conf spark.storage.memoryFraction=0.5  --conf spark.dynamicAllocation.executorIdleTimeout=300  --conf spark.hadoop.yarn.timeline-service.enabled=false  --conf spark.executor.memory=4G  --conf spark.eventLog.enabled=true  --conf spark.eventLog.dir=hdfs:///kylin/spark-history  --conf spark.dynamicAllocation.minExecutors=1  --conf spark.executor.cores=2  --conf spark.yarn.executor.memoryOverhead=1024  --conf spark.hadoop.dfs.replication=2  --conf spark.dynamicAllocation.maxExecutors=1000  --conf spark.executor.memoryOverhead=1024  --conf spark.driver.memory=2G  --conf spark.driver.memoryOverhead=256  --conf spark.submit.deployMode=cluster  --conf spark.shuffle.service.enabled=true --jars /home/apache/kylin/lib/kylin-job-2.6.1.jar /home/apache/kylin/lib/kylin-job-2.6.1.jar -className org.apache.kylin.engine.spark.SparkFactDistinct -counterOutput hdfs://kylincluster/kylin/kylin_metadata/kylin-edba142b-b9ec-2379-89f6-2a34511d0425/spark_cube_low/counter -statisticssamplingpercent 100 -cubename spark_cube_low -hiveTable default.kylin_intermediate_spark_cube_low_63876e8b_97c8_074d_0356_3f779a6ca1b6 -output hdfs://kylincluster/kylin/kylin_metadata/kylin-edba142b-b9ec-2379-89f6-2a34511d0425/spark_cube_low/fact_distinct_columns -input 
hdfs://kylincluster/kylin/kylin_metadata/kylin-edba142b-b9ec-2379-89f6-2a34511d0425/kylin_intermediate_spark_cube_low_63876e8b_97c8_074d_0356_3f779a6ca1b6 -segmentId 63876e8b-97c8-074d-0356-3f779a6ca1b6 -metaUrl kylin_metadata@hdfs,path=hdfs://kylincluster/kylin/kylin_metadata/kylin-edba142b-b9ec-2379-89f6-2a34511d0425/spark_cube_low/metadata
2019-03-08 11:15:09,423 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:09 WARN  SparkConf:66 - The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead.
2019-03-08 11:15:10,478 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:10 INFO  Client:54 - Requesting a new application from cluster with 2 NodeManagers
2019-03-08 11:15:10,541 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:10 INFO  Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (8000 MB per container)
2019-03-08 11:15:10,542 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:10 INFO  Client:54 - Will allocate AM container, with 2304 MB memory including 256 MB overhead
2019-03-08 11:15:10,542 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:10 INFO  Client:54 - Setting up container launch context for our AM
2019-03-08 11:15:10,544 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:10 INFO  Client:54 - Setting up the launch environment for our AM container
2019-03-08 11:15:10,552 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:10 INFO  Client:54 - Preparing resources for our AM container
2019-03-08 11:15:11,553 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:11 INFO  Client:54 - Uploading resource hdfs://master:9820/kylin/spark/spark-libs.jar -> hdfs://kylincluster/user/root/.sparkStaging/application_1552042715496_0010/spark-libs.jar
2019-03-08 11:15:13,978 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:13 INFO  Client:54 - Uploading resource file:/home/apache/kylin/lib/kylin-job-2.6.1.jar -> hdfs://kylincluster/user/root/.sparkStaging/application_1552042715496_0010/kylin-job-2.6.1.jar
2019-03-08 11:15:14,485 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14 WARN  Client:66 - Same name resource file:///home/apache/kylin/lib/kylin-job-2.6.1.jar added multiple times to distributed cache
2019-03-08 11:15:14,594 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14 INFO  Client:54 - Uploading resource file:/tmp/spark-1e734412-567c-4c59-a6c6-9c58c83355db/__spark_conf__6590808330343993974.zip -> hdfs://kylincluster/user/root/.sparkStaging/application_1552042715496_0010/__spark_conf__.zip
2019-03-08 11:15:14,796 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14 INFO  SecurityManager:54 - Changing view acls to: root
2019-03-08 11:15:14,797 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14 INFO  SecurityManager:54 - Changing modify acls to: root
2019-03-08 11:15:14,798 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14 INFO  SecurityManager:54 - Changing view acls groups to:
2019-03-08 11:15:14,798 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14 INFO  SecurityManager:54 - Changing modify acls groups to:
2019-03-08 11:15:14,800 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2019-03-08 11:15:14,811 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:14 INFO  Client:54 - Submitting application application_1552042715496_0010 to ResourceManager
2019-03-08 11:15:15,038 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:15 INFO  YarnClientImpl:273 - Submitted application application_1552042715496_0010
2019-03-08 11:15:16,042 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:16 INFO  Client:54 - Application report for application_1552042715496_0010 (state: ACCEPTED)
2019-03-08 11:15:16,046 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:16 INFO  Client:54 -
2019-03-08 11:15:16,046 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : client token: N/A
2019-03-08 11:15:16,046 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : diagnostics: AM container is launched, waiting for AM container to Register with RM
2019-03-08 11:15:16,046 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : ApplicationMaster host: N/A
2019-03-08 11:15:16,046 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : ApplicationMaster RPC port: -1
2019-03-08 11:15:16,046 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : queue: default
2019-03-08 11:15:16,047 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : start time: 1552043714819
2019-03-08 11:15:16,047 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : final status: UNDEFINED
2019-03-08 11:15:16,047 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : tracking URL: http://master:8088/proxy/application_1552042715496_0010/
2019-03-08 11:15:16,050 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : user: root
2019-03-08 11:15:17,048 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:17 INFO  Client:54 - Application report for application_1552042715496_0010 (state: ACCEPTED)
2019-03-08 11:15:18,050 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:18 INFO  Client:54 - Application report for application_1552042715496_0010 (state: ACCEPTED)
2019-03-08 11:15:19,052 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:19 INFO  Client:54 - Application report for application_1552042715496_0010 (state: ACCEPTED)
2019-03-08 11:15:20,054 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:20 INFO  Client:54 - Application report for application_1552042715496_0010 (state: ACCEPTED)
2019-03-08 11:15:21,056 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:21 INFO  Client:54 - Application report for application_1552042715496_0010 (state: ACCEPTED)
2019-03-08 11:15:22,058 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:22 INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:22 INFO  Client:54 -
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : client token: N/A
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : diagnostics: N/A
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : ApplicationMaster host: 192.168.0.110
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : ApplicationMaster RPC port: 0
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : queue: default
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : start time: 1552043714819
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : final status: UNDEFINED
2019-03-08 11:15:22,059 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : tracking URL: http://master:8088/proxy/application_1552042715496_0010/
2019-03-08 11:15:22,063 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : user: root
2019-03-08 11:15:23,061 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:23 INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:24,062 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:24 INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:24,647 INFO  [FetcherRunner 1227064704-51] threadpool.DefaultFetcherRunner:94 : Job Fetcher: 1 should running, 1 actual running, 1 stopped, 0 ready, 22 already succeed, 29 error, 0 discarded, 0 others
2019-03-08 11:15:25,065 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:25 INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:26,067 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:26 INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:27,069 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:27 INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:28,071 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:28 INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:29,073 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:29 INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:30,075 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:30 INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:31,077 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:31 INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:32,079 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:32 INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:33,081 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:33 INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:33,487 DEBUG [http-nio-7070-exec-4] common.KylinConfig:328 : KYLIN_CONF property was not set, will seek KYLIN_HOME env variable
2019-03-08 11:15:33,487 INFO  [http-nio-7070-exec-4] common.KylinConfig:334 : Use KYLIN_HOME=/home/apache/kylin
2019-03-08 11:15:34,083 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:34 INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING)
2019-03-08 11:15:34,242 DEBUG [http-nio-7070-exec-3] badquery.BadQueryHistoryManager:65 : Loaded 0 Bad Query(s)
2019-03-08 11:15:35,085 INFO  [pool-16-thread-1] spark.SparkExecutable:38 : 2019-03-08 11:15:35 INFO  Client:54 - Application report for application_15520427

Spark looks normal: INFO  Client:54 - Application report for application_1552042715496_0010 (state: RUNNING),
but after running for one to two hours the progress is still stuck at 10%.
The Spark job UI shows the following:
[three inline screenshots of the Spark job UI (not displayed in the plain-text archive)]
Building this same cube with MapReduce takes only a few minutes.

hadoop-root-resourcemanager-master.log shows that YARN containers are in constant use, with the count fluctuating between 1 and 4.
The ResourceManager log follows:
2019-03-08 11:32:34,829 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Allocation proposal accepted
2019-03-08 11:32:37,590 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e89_1552042715496_0010_01_000409 Container Transitioned from ACQUIRED to RELEASED
2019-03-08 11:32:37,590 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root IP=192.168.0.110 OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1552042715496_0010 CONTAINERID=container_e89_1552042715496_0010_01_000409 RESOURCE=<memory:6144, vCores:1> QUEUENAME=default
2019-03-08 11:32:37,591 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e89_1552042715496_0010_01_000410 Container Transitioned from ALLOCATED to ACQUIRED
2019-03-08 11:32:37,833 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1552042715496_0010_000001 container=null queue=default clusterResource=<memory:32000, vCores:24> type=NODE_LOCAL requestedPartition=
2019-03-08 11:32:37,833 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e89_1552042715496_0010_01_000411 Container Transitioned from NEW to ALLOCATED
2019-03-08 11:32:37,833 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Assigned container container_e89_1552042715496_0010_01_000411 of capacity <memory:6144, vCores:1> on host master:33725, which has 2 containers, <memory:12288, vCores:2> used and <memory:3712, vCores:10> available after allocation
2019-03-08 11:32:37,833 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1552042715496_0010 CONTAINERID=container_e89_1552042715496_0010_01_000411 RESOURCE=<memory:6144, vCores:1> QUEUENAME=default
2019-03-08 11:32:37,833 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.512 absoluteUsedCapacity=0.512 used=<memory:16384, vCores:3> cluster=<memory:32000, vCores:24>
2019-03-08 11:32:37,833 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Allocation proposal accepted
2019-03-08 11:32:40,595 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e89_1552042715496_0010_01_000410 Container Transitioned from ACQUIRED to RELEASED
2019-03-08 11:32:40,595 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root IP=192.168.0.110 OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1552042715496_0010 CONTAINERID=container_e89_1552042715496_0010_01_000410 RESOURCE=<memory:6144, vCores:1> QUEUENAME=default
2019-03-08 11:32:40,596 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e89_1552042715496_0010_01_000411 Container Transitioned from ALLOCATED to ACQUIRED
2019-03-08 11:32:40,836 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator: assignedContainer application attempt=appattempt_1552042715496_0010_000001 container=null queue=default clusterResource=<memory:32000, vCores:24> type=NODE_LOCAL requestedPartition=
2019-03-08 11:32:40,836 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e89_1552042715496_0010_01_000412 Container Transitioned from NEW to ALLOCATED
2019-03-08 11:32:40,836 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode: Assigned container container_e89_1552042715496_0010_01_000412 of capacity <memory:6144, vCores:1> on host master:33725, which has 2 containers, <memory:12288, vCores:2> used and <memory:3712, vCores:10> available after allocation
2019-03-08 11:32:40,836 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1552042715496_0010 CONTAINERID=container_e89_1552042715496_0010_01_000412 RESOURCE=<memory:6144, vCores:1> QUEUENAME=default
2019-03-08 11:32:40,836 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.512 absoluteUsedCapacity=0.512 used=<memory:16384, vCores:3> cluster=<memory:32000, vCores:24>
2019-03-08 11:32:40,836 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Allocation proposal accepted
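In the ResourceManager log above, containers of <memory:6144, vCores:1> are allocated and then released in a tight loop, which is consistent with dynamic allocation repeatedly requesting executors that never do useful work. One way to test that theory is to pin the executor count by overriding the Spark engine settings in kylin.properties. This is a sketch only: the values are illustrative, not a tuning recommendation for this cluster; the `kylin.engine.spark-conf.*` keys are the standard Kylin mechanism that passes each setting through to spark-submit as `--conf`.

```
# kylin.properties sketch: each kylin.engine.spark-conf.X=Y line becomes
# "--conf X=Y" on the spark-submit command line. Values are illustrative.
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=false
kylin.engine.spark-conf.spark.shuffle.service.enabled=false
kylin.engine.spark-conf.spark.executor.instances=4
kylin.engine.spark-conf.spark.executor.memory=3G
kylin.engine.spark-conf.spark.executor.memoryOverhead=512
kylin.engine.spark-conf.spark.executor.cores=2
```

With dynamic allocation off and a fixed, modest executor footprint, the container churn should stop; if the job then progresses past 10%, the original dynamic-allocation settings were the problem.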



The diagnosis package is attached.
Please help take a look at how to resolve this. Thanks!

Best regards