
Posted to dev@kylin.apache.org by "wenyefbl@163.com" <we...@163.com> on 2016/01/08 06:00:26 UTC

How to improve the performance of a job!

I have five machines (8 cores, 32 GB memory) running a cluster built with HDP 2.3. The Kylin version is apache-kylin-1.3-HBase-1.1-SNAPSHOT-bin and HBase is version 1.1.1. The Hive table currently holds 30,000,000 rows, but after the job has been running for an hour its progress is only about 10%, and when I look at the MR tasks the job does not appear to have reached the MR stage. Is there any way to improve the performance of the job?
This is my configuration:
1. kylin.properties
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

## Config for Kylin Engine ##

# List of web servers in use, this enables one web server instance to sync up with other servers.
kylin.rest.servers=192.168.1.40:7070

#set display timezone on UI,format like[GMT+N or GMT-N]
kylin.rest.timezone=GMT-8
kylin.query.cache.enabled=true
# The metadata store in hbase
kylin.metadata.url=kylin_metadata@hbase

# The storage for final cube file in hbase
kylin.storage.url=hbase
kylin.job.yarn.app.rest.check.status.url=http://192.168.1.40:8088/ws/v1/cluster/apps/${job_id}?
kylin.job.yarn.app.rest.check.interval.seconds=20
kylin.query.security.enabled=false
# Temp folder in hdfs, make sure user has the right access to the hdfs directory
kylin.hdfs.working.dir=/kylin

# FileSystem of the cluster serving HBase, in the format hdfs://hbase-cluster:8020
# Leave empty if HBase runs on the same cluster as Hive and MapReduce
kylin.hbase.cluster.fs=hdfs://mycluster/apps/hbase/data
kylin.route.hive.enabled=true
kylin.route.hive.url=jdbc:hive2://192.168.1.50:10000

kylin.job.mapreduce.default.reduce.input.mb=500

kylin.server.mode=all

# If true, the job engine will not assume that the Hadoop CLI resides on the same server as itself;
# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password
# It should not be set to "true" unless you're NOT running Kylin.sh on a Hadoop client machine
# (in which case the Kylin instance has to ssh to another real Hadoop client machine to execute hbase, hive and hadoop commands)
kylin.job.run.as.remote.cmd=false

# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.hostname=

# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.username=

# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.password=

# Used by test cases to prepare synthetic data for sample cube
kylin.job.remote.cli.working.dir=/tmp/kylin

# Max count of concurrent jobs running
kylin.job.concurrent.max.limit=10

# Time interval to check hadoop job status
kylin.job.yarn.app.rest.check.interval.seconds=10

# Hive database name for putting the intermediate flat tables
#kylin.job.hive.database.for.intermediatetable=kylin

#default compression codec for htable,snappy,lzo,gzip,lz4
kylin.hbase.default.compression.codec=snappy

# The cut size for HBase regions, in GB.
# E.g. for a cube whose capacity is marked as "SMALL", regions are split every 10 GB by default
kylin.hbase.region.cut.small=10
kylin.hbase.region.cut.medium=20
kylin.hbase.region.cut.large=100

# HBase min and max region count
kylin.hbase.region.count.min=1
kylin.hbase.region.count.max=500

## Config for Restful APP ##
# database connection settings:
ldap.server=
ldap.username=
ldap.password=
ldap.user.searchBase=
ldap.user.searchPattern=
ldap.user.groupSearchBase=
ldap.service.searchBase=OU=
ldap.service.searchPattern=
ldap.service.groupSearchBase=
acl.adminRole=
acl.defaultRole=
ganglia.group=
ganglia.port=8664

## Config for mail service

# If true, will send email notification;
mail.enabled=false
mail.host=
mail.username=
mail.password=
mail.sender=

###########################config info for web#######################

#help info ,format{name|displayName|link} ,optional
kylin.web.help.length=4
kylin.web.help.0=start|Getting Started|
kylin.web.help.1=odbc|ODBC Driver|
kylin.web.help.2=tableau|Tableau Guide|
kylin.web.help.3=onboard|Cube Design Tutorial|
#hadoop url link ,optional
kylin.web.hadoop=
#job diagnostic url link ,optional
kylin.web.diagnostic=
#contact mail on web page ,optional
kylin.web.contact_mail=

###########################config info for front#######################

#env DEV|QA|PROD
deploy.env=PROD

###########################config info for sandbox#######################
kylin.sandbox=true


###########################config info for kylin monitor#######################
# hive jdbc url
kylin.monitor.hive.jdbc.connection.url=jdbc:hive2://192.168.1.12:10000

# Directories to parse query logs from, comma-separated; $KYLIN_HOME/tomcat/logs/ is also read by default
kylin.monitor.ext.log.base.dir = /tmp/kylin_log1,/tmp/kylin_log2

# Creates an external Hive table for querying the parsed result CSV file
# Defaults to kylin_query_log if not configured here
kylin.monitor.query.log.parse.result.table = kylin_query_log
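The region settings in this file (kylin.hbase.region.cut.* and kylin.hbase.region.count.min/max) describe a simple rule: divide the estimated cube size by the cut size for the cube's capacity class, then clamp the result to the configured minimum and maximum region count. The sketch below illustrates that rule in Python, assuming ceiling division and a plain clamp; Kylin's actual split logic may differ in detail, and estimate_region_count is a hypothetical helper, not a Kylin API.

```python
import math

# Values copied from the kylin.properties above (GB per region, by cube capacity).
REGION_CUT_GB = {"SMALL": 10, "MEDIUM": 20, "LARGE": 100}
REGION_COUNT_MIN = 1    # kylin.hbase.region.count.min
REGION_COUNT_MAX = 500  # kylin.hbase.region.count.max


def estimate_region_count(cube_size_gb: float, capacity: str) -> int:
    """Hypothetical estimate: one region per 'cut' GB, clamped to [min, max]."""
    cut = REGION_CUT_GB[capacity.upper()]
    regions = math.ceil(cube_size_gb / cut)
    return max(REGION_COUNT_MIN, min(REGION_COUNT_MAX, regions))


if __name__ == "__main__":
    # A hypothetical 45 GB cube: a smaller capacity class gives more regions.
    for cap in ("SMALL", "MEDIUM", "LARGE"):
        print(cap, estimate_region_count(45, cap))
```

For a hypothetical 45 GB cube this gives 5, 3 and 1 regions for SMALL, MEDIUM and LARGE, which is why marking a cube as SMALL splits its output more finely.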


2. kylin_job_conf.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<configuration>

    <property>
        <name>mapreduce.job.split.metainfo.maxsize</name>
        <value>-1</value>
        <description>The maximum permissible size of the split metainfo file.
            The JobTracker won't attempt to read split metainfo files bigger than
            the configured value. No limits if set to -1.
        </description>
    </property>

    <property>
        <name>mapred.compress.map.output</name>
        <value>true</value>
        <description>Compress map outputs</description>
    </property>

    <property>
        <name>mapred.map.output.compression.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
        <description>The compression codec to use for map outputs
        </description>
    </property>

    <property>
        <name>mapred.output.compress</name>
        <value>true</value>
        <description>Compress the output of a MapReduce job</description>
    </property>

    <property>
        <name>mapred.output.compression.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
        <description>The compression codec to use for job outputs
        </description>
    </property>

    <property>
        <name>mapred.output.compression.type</name>
        <value>BLOCK</value>
        <description>The compression type to use for job outputs</description>
    </property>

    <property>
        <name>mapreduce.job.max.split.locations</name>
        <value>2000</value>
        <description>No description</description>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Block replication</description>
    </property>


    <property>
        <name>hive.merge.mapfiles</name>
        <value>true</value>
        <description>Enable hive file merge on mapper only job</description>
    </property>
    <property>
        <name>hive.merge.mapredfiles</name>
        <value>true</value>
        <description>Enable hive file merge on map-reduce job</description>
    </property>
    <property>
        <name>hive.merge.size.per.task</name>
        <value>268435456</value>
        <description>Size for the merged file: 256M</description>
    </property>

    <property>
        <name>hive.support.concurrency</name>
        <value>false</value>
        <description>Hive concurrency lock</description>
    </property>
</configuration>


3. Debug log:
[pool-7-thread-8]:[2016-01-08 12:48:18,488][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-7-thread-8]:[2016-01-08 12:48:38,500][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:48:38,501][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:48:38,501][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:48:38,504][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-7-thread-8]:[2016-01-08 12:48:58,523][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:48:58,523][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:48:58,523][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:48:58,526][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-6-thread-1]:[2016-01-08 12:49:08,108][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)] - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
[pool-7-thread-8]:[2016-01-08 12:49:18,538][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:49:18,538][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:49:18,539][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:49:18,542][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-7-thread-8]:[2016-01-08 12:49:38,555][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:49:38,555][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:49:38,556][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:49:38,558][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-7-thread-8]:[2016-01-08 12:49:58,571][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:49:58,571][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:49:58,572][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:49:58,574][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-6-thread-1]:[2016-01-08 12:50:08,118][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)] - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
[pool-7-thread-8]:[2016-01-08 12:50:18,594][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:50:18,594][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:50:18,595][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:50:18,597][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-7-thread-8]:[2016-01-08 12:50:38,609][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:50:38,610][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:50:38,610][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:50:38,613][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-7-thread-8]:[2016-01-08 12:50:58,627][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:50:58,627][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:50:58,627][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:50:58,630][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-6-thread-1]:[2016-01-08 12:51:08,111][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)] - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
[http-bio-7070-exec-2]:[2016-01-08 12:51:11,522][DEBUG][org.apache.kylin.rest.filter.KylinApiFilter.logRequest(KylinApiFilter.java:120)] - REQUEST: REQUESTER=ADMIN;REQ_TIME=GMT-08:00 2016-01-07 20:51:11;URI=/kylin/api/jobs;METHOD=GET;QUERY_STRING=limit=15&offset=0&projectName=learn_kylin;PAYLOAD=;RESP_STATUS=200;
[http-bio-7070-exec-2]:[2016-01-08 12:51:13,793][DEBUG][org.apache.kylin.rest.filter.KylinApiFilter.logRequest(KylinApiFilter.java:120)] - REQUEST: REQUESTER=ADMIN;REQ_TIME=GMT-08:00 2016-01-07 20:51:13;URI=/kylin/api/jobs;METHOD=GET;QUERY_STRING=limit=15&offset=0&projectName=learn_kylin;PAYLOAD=;RESP_STATUS=200;
[http-bio-7070-exec-2]:[2016-01-08 12:51:14,431][DEBUG][org.apache.kylin.rest.filter.KylinApiFilter.logRequest(KylinApiFilter.java:120)] - REQUEST: REQUESTER=ADMIN;REQ_TIME=GMT-08:00 2016-01-07 20:51:14;URI=/kylin/api/jobs;METHOD=GET;QUERY_STRING=limit=15&offset=0&projectName=learn_kylin;PAYLOAD=;RESP_STATUS=200;
[pool-7-thread-8]:[2016-01-08 12:51:18,643][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:51:18,643][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:51:18,644][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:51:18,647][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-7-thread-8]:[2016-01-08 12:51:38,658][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:51:38,659][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:51:38,659][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:51:38,662][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-7-thread-8]:[2016-01-08 12:51:58,674][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:51:58,674][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:51:58,675][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:51:58,677][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-6-thread-1]:[2016-01-08 12:52:08,118][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)] - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
[pool-7-thread-8]:[2016-01-08 12:52:18,696][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:52:18,697][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:52:18,697][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:52:18,700][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-7-thread-8]:[2016-01-08 12:52:38,712][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:52:38,713][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:52:38,713][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:52:38,716][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-7-thread-8]:[2016-01-08 12:52:58,728][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:52:58,728][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:52:58,729][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:52:58,731][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-6-thread-1]:[2016-01-08 12:53:08,104][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)] - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
[pool-7-thread-8]:[2016-01-08 12:53:18,744][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:53:18,744][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:53:18,745][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:53:18,747][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
[pool-7-thread-8]:[2016-01-08 12:53:38,760][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
[pool-7-thread-8]:[2016-01-08 12:53:38,760][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.

[pool-7-thread-8]:[2016-01-08 12:53:38,761][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
[pool-7-thread-8]:[2016-01-08 12:53:38,764][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
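Each poll in the log above is a request to the YARN ResourceManager REST endpoint configured in kylin.job.yarn.app.rest.check.status.url, and RUNNING-UNDEFINED appears to be the application's state and finalStatus joined by a dash. The Python sketch below performs the same check by hand, assuming the standard ResourceManager /ws/v1/cluster/apps/{appid} response shape ({"app": {...}}) and that the YARN application id is the MapReduce job id with the job_ prefix replaced by application_. The helper names are hypothetical; this is not Kylin's own code.

```python
import json
import urllib.request

RM_BASE = "http://192.168.1.40:8088"  # ResourceManager address taken from kylin.properties


def app_id_from_job_id(job_id: str) -> str:
    """job_1452156670999_0116 -> application_1452156670999_0116."""
    prefix = "job_"
    if not job_id.startswith(prefix):
        raise ValueError(f"not a MapReduce job id: {job_id}")
    return "application_" + job_id[len(prefix):]


def parse_app_status(body: str) -> str:
    """Format a /ws/v1/cluster/apps/{appid} JSON body as 'STATE-FINALSTATUS (progress%)'."""
    app = json.loads(body)["app"]
    return f"{app['state']}-{app['finalStatus']} ({app.get('progress', 0.0):.1f}%)"


def fetch_app_status(job_id: str, timeout: float = 10.0) -> str:
    """Query the ResourceManager once for the application behind an MR job id."""
    url = f"{RM_BASE}/ws/v1/cluster/apps/{app_id_from_job_id(job_id)}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_app_status(resp.read().decode("utf-8"))
```

Calling fetch_app_status("job_1452156670999_0116") while a step looks stuck shows the same state the log reports, plus the progress percentage that YARN tracks for the application.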



wenyefbl@163.com

Re: Re: How to improve the performance of a job!

Posted by yu feng <ol...@gmail.com>.

The property expects a long value (a number of bytes), so you should set it to 67108864 (64 MB) rather than 64MB.
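Because Hive rejects unit suffixes for this long-typed property, the value has to be written as a plain byte count (64 × 1024 × 1024 = 67108864). The Python sketch below rewrites such values in a kylin_job_conf.xml-style file, assuming binary units (1 KB = 1024 bytes) and that you list the byte-valued properties yourself; LONG_BYTE_PROPERTIES, to_bytes and normalise_job_conf are hypothetical names, not part of Kylin or Hadoop.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical list of properties that must hold a plain byte count.
LONG_BYTE_PROPERTIES = {
    "mapreduce.input.fileinputformat.split.maxsize",
    "mapred.max.split.size",
}

# Binary units: 1 KB = 1024 bytes.
_UNITS = {"": 1, "B": 1, "K": 1024, "KB": 1024, "M": 1024**2, "MB": 1024**2,
          "G": 1024**3, "GB": 1024**3}


def to_bytes(value: str) -> int:
    """Convert '64MB', '64m' or '67108864' to an integer byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", value)
    if not match or match.group(2).upper() not in _UNITS:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


def normalise_job_conf(xml_text: str) -> str:
    """Rewrite byte-valued <property> entries so their <value> is a plain long."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value_el = prop.find("value")
        if name in LONG_BYTE_PROPERTIES and value_el is not None and value_el.text:
            value_el.text = str(to_bytes(value_el.text))
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree's default parser drops XML comments such as the license header, so review the output before replacing the original file.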

2016-01-08 14:17 GMT+08:00 wenyefbl@163.com <we...@163.com>:

> I modified the mapreduce.input.fileinputformat.split.maxsize parameter
> as you suggested, and now the job fails with this error:
>
> Query returned non-zero code: 1, cause: 'SET mapreduce.input.fileinputformat.split.maxsize=64MB' FAILED because mapreduce.input.fileinputformat.split.maxsize expects LONG type value.
>
>         at org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:90)
>         at org.apache.kylin.job.common.ShellExecutable.doWork(ShellExecutable.java:52)
>         at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>         at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:51)
>         at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
>         at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:130)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
> My kylin_job_conf.xml:
>     <property>
>         <name>mapreduce.input.fileinputformat.split.maxsize</name>
>         <value>64MB</value>
>         <description>Hive concurrency lock</description>
>     </property>
>
>
>
> wenyefbl@163.com
>
> From: yu feng
> Sent: 2016-01-08 13:21
> To: dev
> Subject: Re: How to improve the performance of a job!
> In our experience, you can try the following:
> 1. Use a newer version of Hive to speed up the first step.
> 2. Start more mappers and reducers for every MR job. You can reduce the
> value of 'kylin.job.mapreduce.default.reduce.input.mb' in kylin.properties,
> which is the input size for each reducer in the NDCuboid calculation steps;
> a smaller value means more reducers.
> 3. Set the property 'mapreduce.input.fileinputformat.split.maxsize'
> ('mapred.max.split.size' in earlier Hadoop versions) in kylin_job_conf.xml.
> It is the maximum split size for a mapper; we set it to less than the
> block size of the Hadoop cluster, for example 64MB.
> 4. Try setting the cube size to SMALL when creating the cube, which
> increases the number of reducers when generating HFiles.
>
> Hope this helps.
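Suggestions 2 and 3 both come down to simple arithmetic. On the mapper side, Hadoop's FileInputFormat computes the split size as max(minSize, min(maxSize, blockSize)) and lets the final split run up to 10% over (its SPLIT_SLOP of 1.1), so a maximum split size below the block size produces more, smaller splits. On the reducer side, the sketch below assumes the reducer count is roughly the step's input size divided by kylin.job.mapreduce.default.reduce.input.mb; Kylin's real calculation may apply extra ratios or caps. The function names, the 1 GB file and the 128 MB block size (the Hadoop 2 default for dfs.blocksize) are illustrative assumptions.

```python
import math

SPLIT_SLOP = 1.1  # FileInputFormat lets the last split be up to 10% larger
MB = 1024 * 1024


def count_splits(file_size: int, block_size: int, max_split: int, min_split: int = 1) -> int:
    """Number of input splits (mappers) for one file, following FileInputFormat's rule."""
    split_size = max(min_split, min(max_split, block_size))
    splits, remaining = 0, file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining != 0:
        splits += 1
    return splits


def estimate_reducers(input_mb: float, reduce_input_mb: float) -> int:
    """Assumed rule: one reducer per `reduce_input_mb` MB of input, at least one."""
    return max(1, math.ceil(input_mb / reduce_input_mb))


if __name__ == "__main__":
    one_gb = 1024 * MB
    print(count_splits(one_gb, block_size=128 * MB, max_split=2**63 - 1))  # default max
    print(count_splits(one_gb, block_size=128 * MB, max_split=64 * MB))
    print(estimate_reducers(10 * 1024, 500))
    print(estimate_reducers(10 * 1024, 100))
```

With these assumptions, lowering the maximum split size to 64 MB doubles the mapper count for a 1 GB file from 8 to 16, and lowering reduce.input.mb from 500 to 100 raises the estimate for 10 GB of cuboid input from 21 to 103 reducers.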
>
> 2016-01-08 13:00 GMT+08:00 wenyefbl@163.com <we...@163.com>:
>
> > <?xml version="1.0"?>
> > <!--
> > Licensed under the Apache License, Version 2.0 (the "License");
> > you may not use this file except in compliance with the License.
> > You may obtain a copy of the License at
> >
> > http://www.apache.org/licenses/LICENSE-2.0
> >
> > Unless required by applicable law or agreed to in writing, software
> > distributed under the License is distributed on an "AS IS" BASIS,
> > WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > See the License for the specific language governing permissions and
> > limitations under the License. See accompanying LICENSE file.
> > -->
> >
> > <configuration>
> >
> >     <property>
> >         <name>mapreduce.job.split.metainfo.maxsize</name>
> >         <value>-1</value>
> >         <description>The maximum permissible size of the split metainfo
> > file.
> >             The JobTracker won't attempt to read split metainfo files
> > bigger than
> >             the configured value. No limits if set to -1.
> >         </description>
> >     </property>
> >
> >     <property>
> >         <name>mapred.compress.map.output</name>
> >         <value>true</value>
> >         <description>Compress map outputs</description>
> >     </property>
> >
> >     <property>
> >         <name>mapred.map.output.compression.codec</name>
> >         <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> >         <description>The compression codec to use for map outputs
> >         </description>
> >     </property>
> >
> >     <property>
> >         <name>mapred.output.compress</name>
> >         <value>true</value>
> >         <description>Compress the output of a MapReduce job</description>
> >     </property>
> >
> >     <property>
> >         <name>mapred.output.compression.codec</name>
> >         <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> >         <description>The compression codec to use for job outputs
> >         </description>
> >     </property>
> >
> >     <property>
> >         <name>mapred.output.compression.type</name>
> >         <value>BLOCK</value>
> >         <description>The compression type to use for job
> > outputs</description>
> >     </property>
> >
> >     <property>
> >         <name>mapreduce.job.max.split.locations</name>
> >         <value>2000</value>
> >         <description>No description</description>
> >     </property>
> >
> >     <property>
> >         <name>dfs.replication</name>
> >         <value>1</value>
> >         <description>Block replication</description>
> >     </property>
> >
> >
> >     <property>
> >         <name>hive.merge.mapfiles</name>
> >         <value>true</value>
> >         <description>Enable hive file merge on mapper only
> > job</description>
> >     </property>
> >     <property>
> >         <name>hive.merge.mapredfiles</name>
> >         <value>true</value>
> >         <description>Enable hive file merge on map-reduce
> job</description>
> >     </property>
> >     <property>
> >         <name>hive.merge.size.per.task</name>
> >         <value>268435456</value>
> >         <description>Size for the merged file: 256M</description>
> >     </property>
> >
> >     <property>
> >         <name>hive.support.concurrency</name>
> >         <value>false</value>
> >         <description>Hive concurrency lock</description>
> >     </property>
> > </configuration>
> >
> >
> > 3.debug：
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:48:18,488][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:48:38,500][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:48:38,501][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:48:38,501][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:48:38,504][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:48:58,523][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:48:58,523][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:48:58,523][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:48:58,526][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-6-thread-1]:[2016-01-08
> >
> 12:49:08,108][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> > - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:49:18,538][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:49:18,538][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:49:18,539][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:49:18,542][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:49:38,555][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:49:38,555][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:49:38,556][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:49:38,558][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:49:58,571][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:49:58,571][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:49:58,572][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:49:58,574][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-6-thread-1]:[2016-01-08
> >
> 12:50:08,118][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> > - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:50:18,594][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:50:18,594][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:50:18,595][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:50:18,597][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:50:38,609][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:50:38,610][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:50:38,610][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:50:38,613][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:50:58,627][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:50:58,627][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:50:58,627][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:50:58,630][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-6-thread-1]:[2016-01-08
> >
> 12:51:08,111][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> > - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> > [http-bio-7070-exec-2]:[2016-01-08
> >
> 12:51:11,522][DEBUG][org.apache.kylin.rest.filter.KylinApiFilter.logRequest(KylinApiFilter.java:120)]
> > - REQUEST: REQUESTER=ADMIN;REQ_TIME=GMT-08:00 2016-01-07
> >
> 20:51:11;URI=/kylin/api/jobs;METHOD=GET;QUERY_STRING=limit=15&offset=0&projectName=learn_kylin;PAYLOAD=;RESP_STATUS=200;
> > [http-bio-7070-exec-2]:[2016-01-08
> >
> 12:51:13,793][DEBUG][org.apache.kylin.rest.filter.KylinApiFilter.logRequest(KylinApiFilter.java:120)]
> > - REQUEST: REQUESTER=ADMIN;REQ_TIME=GMT-08:00 2016-01-07
> >
> 20:51:13;URI=/kylin/api/jobs;METHOD=GET;QUERY_STRING=limit=15&offset=0&projectName=learn_kylin;PAYLOAD=;RESP_STATUS=200;
> > [http-bio-7070-exec-2]:[2016-01-08
> >
> 12:51:14,431][DEBUG][org.apache.kylin.rest.filter.KylinApiFilter.logRequest(KylinApiFilter.java:120)]
> > - REQUEST: REQUESTER=ADMIN;REQ_TIME=GMT-08:00 2016-01-07
> >
> 20:51:14;URI=/kylin/api/jobs;METHOD=GET;QUERY_STRING=limit=15&offset=0&projectName=learn_kylin;PAYLOAD=;RESP_STATUS=200;
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:51:18,643][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:51:18,643][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:51:18,644][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:51:18,647][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:51:38,658][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:51:38,659][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:51:38,659][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:51:38,662][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:51:58,674][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:51:58,674][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:51:58,675][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:51:58,677][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-6-thread-1]:[2016-01-08
> >
> 12:52:08,118][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> > - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:52:18,696][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:52:18,697][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:52:18,697][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:52:18,700][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:52:38,712][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:52:38,713][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:52:38,713][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:52:38,716][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:52:58,728][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:52:58,728][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:52:58,729][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:52:58,731][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-6-thread-1]:[2016-01-08
> >
> 12:53:08,104][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> > - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:53:18,744][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:53:18,744][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:53:18,745][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:53:18,747][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:53:38,760][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> > - Going to buffer response body of large or unknown size. Using
> > getResponseBodyAsStream instead is recommended.
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:53:38,760][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> > - Job job_1452156670999_0116 get status check result.
> >
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:53:38,761][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> > - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> > [pool-7-thread-8]:[2016-01-08
> >
> 12:53:38,764][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> > - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> > (Store kylin_metadata@hbase)
> >
> >
> >
> > wenyefbl@163.com
> >
>
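The `RUNNING-UNDEFINED` lines in the original post's log appear to come from Kylin polling the YARN ResourceManager through `kylin.job.yarn.app.rest.check.status.url` and printing the application's `state` and `finalStatus`. The same ResourceManager REST endpoint (`/ws/v1/cluster/apps/{appid}`) also returns a `progress` field, so you can check directly whether the MR job is moving at all. Below is a minimal Java sketch that does this. It assumes that the `${job_id}` placeholder is filled with the YARN application ID (the MR job ID with `job_` replaced by `application_`), and it uses the ResourceManager address from the posted config. The class name is hypothetical, and the regex-based field extraction is a shortcut; a real client should use a JSON parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: query the YARN ResourceManager REST API for an MR job that Kylin is polling. */
final class YarnAppStatus {

    private YarnAppStatus() {}

    /** MRv2 job IDs and YARN application IDs share a suffix: job_X_Y corresponds to application_X_Y. */
    static String toApplicationId(String mrJobId) {
        if (!mrJobId.startsWith("job_")) {
            throw new IllegalArgumentException("not an MR job id: " + mrJobId);
        }
        return "application_" + mrJobId.substring("job_".length());
    }

    /** Assumption: ${job_id} in kylin.job.yarn.app.rest.check.status.url takes the application ID. */
    static String statusUrl(String template, String mrJobId) {
        return template.replace("${job_id}", toApplicationId(mrJobId)); // literal, not regex, replace
    }

    /** Crude extraction of the first occurrence of a scalar field; use a JSON library in production. */
    static String jsonField(String json, String field) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*\"?([^\",}]*)");
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    /** Plain HTTP GET returning the response body as UTF-8 text. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Accept", "application/json");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // ResourceManager address taken from the posted kylin.properties
        String template = "http://192.168.1.40:8088/ws/v1/cluster/apps/${job_id}?";
        String mrJobId = args.length > 0 ? args[0] : "job_1452156670999_0116";
        String json = fetch(statusUrl(template, mrJobId));
        // The state-finalStatus pair Kylin appears to log, plus YARN's progress percentage
        System.out.println(jsonField(json, "state") + "-" + jsonField(json, "finalStatus")
                + ", progress " + jsonField(json, "progress") + "%");
    }
}
```

Pass the MR job ID as the first argument. If `progress` stays flat while `state` is `RUNNING`, that suggests the time is being spent inside the MapReduce job itself rather than in Kylin's scheduling.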

Re: Re: How to improve the performance of job！

Posted by "wenyefbl@163.com" <we...@163.com>.

I set the mapreduce.input.fileinputformat.split.maxsize parameter as you suggested, and now the job fails with this error:

Query returned non-zero code: 1, cause: 'SET mapreduce.input.fileinputformat.split.maxsize=64MB' FAILED because mapreduce.input.fileinputformat.split.maxsize expects LONG type value.

        at org.apache.kylin.common.util.CliCommandExecutor.execute(CliCommandExecutor.java:90)
        at org.apache.kylin.job.common.ShellExecutable.doWork(ShellExecutable.java:52)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:51)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:130)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

My kylin_job_conf.xml:
    <property>
        <name>mapreduce.input.fileinputformat.split.maxsize</name>
        <value>64MB</value>
        <description>Hive concurrency lock</description>
    </property>
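As the error shows, Kylin passed this property from kylin_job_conf.xml to Hive as a `SET` statement, and Hive type-checks `mapreduce.input.fileinputformat.split.maxsize` as a LONG. The value therefore has to be a plain byte count (64MB = 64 × 1024 × 1024 = 67108864 bytes) rather than a string such as `64MB`. A small Java sketch of that conversion follows; the class and method names are made up for illustration, and binary units (1 KB = 1024 bytes) are assumed.

```java
import java.util.Locale;

/** Hypothetical helper: turns sizes such as "64MB" into the plain byte counts Hive's SET accepts. */
final class SplitSizeValue {

    private SplitSizeValue() {}

    /** Parses "64MB", "128m", "1GB" or a bare number of bytes, assuming binary units. */
    static long toBytes(String size) {
        String s = size.trim().toUpperCase(Locale.ROOT);
        if (s.endsWith("B")) {
            s = s.substring(0, s.length() - 1); // "64MB" -> "64M", "64B" -> "64"
        }
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size: " + size);
        }
        long multiplier;
        switch (s.charAt(s.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            case 'T': multiplier = 1L << 40; break;
            default:  multiplier = 1L;       break;
        }
        if (multiplier != 1L) {
            s = s.substring(0, s.length() - 1); // drop the unit letter
        }
        return Math.multiplyExact(Long.parseLong(s.trim()), multiplier);
    }

    /** Renders a kylin_job_conf.xml property whose value is the byte count. */
    static String propertyXml(String name, String size) {
        return "<property>\n"
                + "    <name>" + name + "</name>\n"
                + "    <value>" + toBytes(size) + "</value>\n"
                + "</property>";
    }

    public static void main(String[] args) {
        System.out.println(propertyXml("mapreduce.input.fileinputformat.split.maxsize", "64MB"));
    }
}
```

Running `main` prints the `<property>` element with `<value>67108864</value>`, which is the form kylin_job_conf.xml needs.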



wenyefbl@163.com
 
From: yu feng
Sent: 2016-01-08 13:21
To: dev
Subject: Re: How to improve the performance of job!
Based on our experience, you can try the following:
1. Use a newer version of Hive to speed up the first step (creating the
intermediate flat table).
2. Start more mappers and reducers for every MR job. You can reduce the
value of 'kylin.job.mapreduce.default.reduce.input.mb' in kylin.properties,
which is the input size for each reducer in the NDCuboid calculation steps;
a smaller value means more reducers.
3. Set the property
'mapreduce.input.fileinputformat.split.maxsize' ('mapred.max.split.size' in
older Hadoop versions) in kylin_job_conf.xml. It is the maximum split size
of a mapper; we set it below the HDFS block size of the cluster, for
example 64MB.
4. Try setting the cube size to SMALL when creating the cube, which
increases the number of reducers when generating HFiles.
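Tips 2 to 4 all change how many tasks a step gets, and a rough calculation shows by how much. In the Java sketch below, `splitSize` and `countSplits` follow the rule used by Hadoop's `FileInputFormat` for a single splittable file (split size = max(minSize, min(maxSize, blockSize)), with a 10% slop allowed on the last split). `estimateReducers` and `estimateRegions` are simplified assumptions about how Kylin derives the NDCuboid reducer count from `kylin.job.mapreduce.default.reduce.input.mb` and the HBase region count from `kylin.hbase.region.cut.*`, assuming (as tip 4 implies) one HFile reducer per region; they are not Kylin's exact code. The class name, the 128 MB block size, and all input sizes are hypothetical.

```java
/** Back-of-the-envelope task counts for a Kylin cube build (illustrative only). */
final class TaskCountEstimate {

    private static final long MB = 1L << 20;
    private static final double SPLIT_SLOP = 1.1; // same 10% slop as Hadoop's FileInputFormat

    private TaskCountEstimate() {}

    /** Split size as FileInputFormat computes it: max(minSize, min(maxSize, blockSize)). */
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Number of input splits (= map tasks) for one splittable file. */
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the last, possibly slightly oversized, split
        }
        return splits;
    }

    /** Simplified assumption: one reducer per reduceInputMb of cuboid input, at least one. */
    static int estimateReducers(long inputMb, long reduceInputMb) {
        return (int) Math.max(1L, (inputMb + reduceInputMb - 1) / reduceInputMb);
    }

    /** Simplified assumption: one region per cutGb of cube data, clamped to [minRegions, maxRegions]. */
    static int estimateRegions(double cubeSizeGb, double cutGb, int minRegions, int maxRegions) {
        int regions = (int) Math.ceil(cubeSizeGb / cutGb);
        return Math.min(maxRegions, Math.max(minRegions, regions));
    }

    public static void main(String[] args) {
        long block = 128 * MB;       // hypothetical HDFS block size
        long fileLength = 1024 * MB; // hypothetical 1 GB input file

        long defaultSplit = splitSize(block, 1, Long.MAX_VALUE);
        long tunedSplit = splitSize(block, 1, 64 * MB); // split.maxsize = 67108864
        System.out.println("mappers, default split: " + countSplits(fileLength, defaultSplit)); // 8
        System.out.println("mappers, 64 MB split:   " + countSplits(fileLength, tunedSplit));   // 16

        // kylin.job.mapreduce.default.reduce.input.mb: 500 (as posted) vs. a smaller 100
        System.out.println("reducers @500 MB: " + estimateReducers(3000, 500)); // 6
        System.out.println("reducers @100 MB: " + estimateReducers(3000, 100)); // 30

        // kylin.hbase.region.cut.small=10 vs. medium=20, with region.count.min=1 and max=500
        System.out.println("regions SMALL:  " + estimateRegions(45, 10, 1, 500)); // 5
        System.out.println("regions MEDIUM: " + estimateRegions(45, 20, 1, 500)); // 3
    }
}
```

The pattern is the same in each case: halving the per-task size (split size, reducer input, or region cut) roughly doubles the number of parallel tasks, as long as the cluster has free containers to run them.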
 
Hope this helps.
 
2016-01-08 13:00 GMT+08:00 wenyefbl@163.com <we...@163.com>:
 
> I have five machines (8 cores, 32 GB RAM each) running an HDP 2.3
> cluster, with Kylin apache-kylin-1.3-HBase-1.1-SNAPSHOT-bin and HBase
> 1.1.1. The Hive table now holds 30,000,000 rows, but the job has been
> running for an hour and its progress is only about 10%; looking at the MR
> tasks, the job does not seem to be running on MapReduce. Is there any way
> to improve the performance of the job?
> This is my configuration:
> 1.kylin.properties
> #
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # contributor license agreements.  See the NOTICE file distributed with
> # this work for additional information regarding copyright ownership.
> # The ASF licenses this file to You under the Apache License, Version 2.0
> # (the "License"); you may not use this file except in compliance with
> # the License.  You may obtain a copy of the License at
> #
> #    http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
> #
>
> ## Config for Kylin Engine ##
>
> # List of web servers in use, this enables one web server instance to sync up with other servers.
> kylin.rest.servers=192.168.1.40:7070
>
> #set display timezone on UI,format like[GMT+N or GMT-N]
> kylin.rest.timezone=GMT-8
> kylin.query.cache.enabled=true
> # The metadata store in hbase
> kylin.metadata.url=kylin_metadata@hbase
>
> # The storage for final cube file in hbase
> kylin.storage.url=hbase
> kylin.job.yarn.app.rest.check.status.url=http://192.168.1.40:8088/ws/v1/cluster/apps/${job_id}?
> kylin.job.yarn.app.rest.check.interval.seconds=20
> kylin.query.security.enabled=false
> # Temp folder in hdfs, make sure user has the right access to the hdfs directory
> kylin.hdfs.working.dir=/kylin
>
> # HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020
> # leave empty if hbase running on same cluster with hive and mapreduce
> kylin.hbase.cluster.fs=hdfs://mycluster/apps/hbase/data
> kylin.route.hive.enabled=true
> kylin.route.hive.url=jdbc:hive2://192.168.1.50:10000
>
> kylin.job.mapreduce.default.reduce.input.mb=500
>
> kylin.server.mode=all
>
> # If true, job engine will not assume that hadoop CLI reside on the same server as it self
> # you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password
> # It should not be set to "true" unless you're NOT running Kylin.sh on a hadoop client machine
> # (Thus kylin instance has to ssh to another real hadoop client machine to execute hbase,hive,hadoop commands)
> kylin.job.run.as.remote.cmd=false
>
> # Only necessary when kylin.job.run.as.remote.cmd=true
> kylin.job.remote.cli.hostname=
>
> # Only necessary when kylin.job.run.as.remote.cmd=true
> kylin.job.remote.cli.username=
>
> # Only necessary when kylin.job.run.as.remote.cmd=true
> kylin.job.remote.cli.password=
>
> # Used by test cases to prepare synthetic data for sample cube
> kylin.job.remote.cli.working.dir=/tmp/kylin
>
> # Max count of concurrent jobs running
> kylin.job.concurrent.max.limit=10
>
> # Time interval to check hadoop job status
> kylin.job.yarn.app.rest.check.interval.seconds=10
>
> # Hive database name for putting the intermediate flat tables
> #kylin.job.hive.database.for.intermediatetable=kylin
>
> #default compression codec for htable,snappy,lzo,gzip,lz4
> kylin.hbase.default.compression.codec=snappy
>
> # The cut size for hbase region, in GB.
> # E.g, for cube whose capacity be marked as "SMALL", split region per 10GB by default
> kylin.hbase.region.cut.small=10
> kylin.hbase.region.cut.medium=20
> kylin.hbase.region.cut.large=100
>
> # HBase min and max region count
> kylin.hbase.region.count.min=1
> kylin.hbase.region.count.max=500
>
> ## Config for Restful APP ##
> # database connection settings:
> ldap.server=
> ldap.username=
> ldap.password=
> ldap.user.searchBase=
> ldap.user.searchPattern=
> ldap.user.groupSearchBase=
> ldap.service.searchBase=OU=
> ldap.service.searchPattern=
> ldap.service.groupSearchBase=
> acl.adminRole=
> acl.defaultRole=
> ganglia.group=
> ganglia.port=8664
>
> ## Config for mail service
>
> # If true, will send email notification;
> mail.enabled=false
> mail.host=
> mail.username=
> mail.password=
> mail.sender=
>
> ###########################config info for web#######################
>
> #help info ,format{name|displayName|link} ,optional
> kylin.web.help.length=4
> kylin.web.help.0=start|Getting Started|
> kylin.web.help.1=odbc|ODBC Driver|
> kylin.web.help.2=tableau|Tableau Guide|
> kylin.web.help.3=onboard|Cube Design Tutorial|
> #hadoop url link ,optional
> kylin.web.hadoop=
> #job diagnostic url link ,optional
> kylin.web.diagnostic=
> #contact mail on web page ,optional
> kylin.web.contact_mail=
>
> ###########################config info for front#######################
>
> #env DEV|QA|PROD
> deploy.env=PROD
>
> ###########################config info for sandbox#######################
> kylin.sandbox=true
>
>
> ###########################config info for kylin monitor#######################
> # hive jdbc url
> kylin.monitor.hive.jdbc.connection.url=jdbc:hive2://192.168.1.12:10000
>
> #config where to parse query log,split with comma ,will also read $KYLIN_HOME/tomcat/logs/ by default
> kylin.monitor.ext.log.base.dir = /tmp/kylin_log1,/tmp/kylin_log2
>
> #will create external hive table to query result csv file
> #will set to kylin_query_log by default if not config here
> kylin.monitor.query.log.parse.result.table = kylin_query_log
>
>
> 2.kylin_job_conf.xml
> <?xml version="1.0"?>
> <!--
> Licensed under the Apache License, Version 2.0 (the "License");
> you may not use this file except in compliance with the License.
> You may obtain a copy of the License at
>
> http://www.apache.org/licenses/LICENSE-2.0
>
> Unless required by applicable law or agreed to in writing, software
> distributed under the License is distributed on an "AS IS" BASIS,
> WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> See the License for the specific language governing permissions and
> limitations under the License. See accompanying LICENSE file.
> -->
>
> <configuration>
>
>     <property>
>         <name>mapreduce.job.split.metainfo.maxsize</name>
>         <value>-1</value>
>         <description>The maximum permissible size of the split metainfo
> file.
>             The JobTracker won't attempt to read split metainfo files
> bigger than
>             the configured value. No limits if set to -1.
>         </description>
>     </property>
>
>     <property>
>         <name>mapred.compress.map.output</name>
>         <value>true</value>
>         <description>Compress map outputs</description>
>     </property>
>
>     <property>
>         <name>mapred.map.output.compression.codec</name>
>         <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>         <description>The compression codec to use for map outputs
>         </description>
>     </property>
>
>     <property>
>         <name>mapred.output.compress</name>
>         <value>true</value>
>         <description>Compress the output of a MapReduce job</description>
>     </property>
>
>     <property>
>         <name>mapred.output.compression.codec</name>
>         <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>         <description>The compression codec to use for job outputs
>         </description>
>     </property>
>
>     <property>
>         <name>mapred.output.compression.type</name>
>         <value>BLOCK</value>
>         <description>The compression type to use for job
> outputs</description>
>     </property>
>
>     <property>
>         <name>mapreduce.job.max.split.locations</name>
>         <value>2000</value>
>         <description>No description</description>
>     </property>
>
>     <property>
>         <name>dfs.replication</name>
>         <value>1</value>
>         <description>Block replication</description>
>     </property>
>
>
>     <property>
>         <name>hive.merge.mapfiles</name>
>         <value>true</value>
>         <description>Enable hive file merge on mapper only
> job</description>
>     </property>
>     <property>
>         <name>hive.merge.mapredfiles</name>
>         <value>true</value>
>         <description>Enable hive file merge on map-reduce job</description>
>     </property>
>     <property>
>         <name>hive.merge.size.per.task</name>
>         <value>268435456</value>
>         <description>Size for the merged file: 256M</description>
>     </property>
>
>     <property>
>         <name>hive.support.concurrency</name>
>         <value>false</value>
>         <description>Hive concurrency lock</description>
>     </property>
> </configuration>
>
>
> 3.debug：
> [pool-7-thread-8]:[2016-01-08
> 12:48:18,488][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08
> 12:48:38,500][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:48:38,501][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:48:38,501][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:48:38,504][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08
> 12:48:58,523][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:48:58,523][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:48:58,523][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:48:58,526][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-6-thread-1]:[2016-01-08
> 12:49:08,108][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> [pool-7-thread-8]:[2016-01-08
> 12:49:18,538][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:49:18,538][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:49:18,539][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:49:18,542][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08
> 12:49:38,555][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:49:38,555][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:49:38,556][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:49:38,558][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08
> 12:49:58,571][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:49:58,571][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:49:58,572][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:49:58,574][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-6-thread-1]:[2016-01-08
> 12:50:08,118][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> [pool-7-thread-8]:[2016-01-08
> 12:50:18,594][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:50:18,594][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:50:18,595][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:50:18,597][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08
> 12:50:38,609][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:50:38,610][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:50:38,610][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:50:38,613][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08
> 12:50:58,627][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:50:58,627][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:50:58,627][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:50:58,630][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-6-thread-1]:[2016-01-08
> 12:51:08,111][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> [http-bio-7070-exec-2]:[2016-01-08
> 12:51:11,522][DEBUG][org.apache.kylin.rest.filter.KylinApiFilter.logRequest(KylinApiFilter.java:120)]
> - REQUEST: REQUESTER=ADMIN;REQ_TIME=GMT-08:00 2016-01-07
> 20:51:11;URI=/kylin/api/jobs;METHOD=GET;QUERY_STRING=limit=15&offset=0&projectName=learn_kylin;PAYLOAD=;RESP_STATUS=200;
> [http-bio-7070-exec-2]:[2016-01-08
> 12:51:13,793][DEBUG][org.apache.kylin.rest.filter.KylinApiFilter.logRequest(KylinApiFilter.java:120)]
> - REQUEST: REQUESTER=ADMIN;REQ_TIME=GMT-08:00 2016-01-07
> 20:51:13;URI=/kylin/api/jobs;METHOD=GET;QUERY_STRING=limit=15&offset=0&projectName=learn_kylin;PAYLOAD=;RESP_STATUS=200;
> [http-bio-7070-exec-2]:[2016-01-08
> 12:51:14,431][DEBUG][org.apache.kylin.rest.filter.KylinApiFilter.logRequest(KylinApiFilter.java:120)]
> - REQUEST: REQUESTER=ADMIN;REQ_TIME=GMT-08:00 2016-01-07
> 20:51:14;URI=/kylin/api/jobs;METHOD=GET;QUERY_STRING=limit=15&offset=0&projectName=learn_kylin;PAYLOAD=;RESP_STATUS=200;
> [pool-7-thread-8]:[2016-01-08
> 12:51:18,643][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:51:18,643][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:51:18,644][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:51:18,647][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08
> 12:51:38,658][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:51:38,659][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:51:38,659][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:51:38,662][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08
> 12:51:58,674][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:51:58,674][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:51:58,675][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:51:58,677][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-6-thread-1]:[2016-01-08
> 12:52:08,118][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> [pool-7-thread-8]:[2016-01-08
> 12:52:18,696][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:52:18,697][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:52:18,697][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:52:18,700][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08
> 12:52:38,712][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:52:38,713][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:52:38,713][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:52:38,716][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08
> 12:52:58,728][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:52:58,728][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:52:58,729][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:52:58,731][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-6-thread-1]:[2016-01-08
> 12:53:08,104][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> [pool-7-thread-8]:[2016-01-08
> 12:53:18,744][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:53:18,744][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:53:18,745][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:53:18,747][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08
> 12:53:38,760][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:53:38,760][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:53:38,761][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:53:38,764][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
>
>
>
> wenyefbl@163.com
>

Re: How to improve the performance of job！

Posted by yu feng <ol...@gmail.com>.

Based on our experience, you can try the following:
1. Use a newer Hive version to speed up the first step (creating the
intermediate flat table).
2. Start more mappers and reducers for every MR job. You can reduce the
value of 'kylin.job.mapreduce.default.reduce.input.mb' in kylin.properties,
which is the input size per reducer in the NDCuboid calculation steps; a
smaller value means more reducers.
3. Set the property 'mapreduce.input.fileinputformat.split.maxsize'
('mapred.max.split.size' in older Hadoop versions) in kylin_job_conf.xml.
It is the maximum split size per mapper; we set it below the block size of
the Hadoop cluster, for example 64MB.
4. Try setting the cube size to SMALL when creating the cube, which
increases the number of reducers when generating HFiles.
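The arithmetic behind tips 2, 3 and 4 can be sketched in Python. This is a simplified sketch under stated assumptions, not Kylin's or Hadoop's actual code. `count_splits` follows the split loop of Hadoop's FileInputFormat for one splittable file (split size = max(minSize, min(maxSize, blockSize)), with the last split allowed to be up to 1.1 times the split size). `estimate_reducers` and `estimate_regions` are hypothetical simplifications: they assume one reducer per `kylin.job.mapreduce.default.reduce.input.mb` of input, and one HBase region (and HFile reducer) per `kylin.hbase.region.cut.*` gigabytes, clamped to `kylin.hbase.region.count.min`/`max`. Kylin's real planning relies on its own size estimates. The 3 GB input, 128 MB block size and 50 GB cube are made-up numbers; 500, 10/20/100 and 1/500 are the values from the poster's kylin.properties.

```python
import math

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # FileInputFormat lets the last split be up to 10% larger


def split_size(block_size, min_size=1, max_size=2**63 - 1):
    # Same rule as FileInputFormat.computeSplitSize: max(min, min(max, block)).
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, block_size, max_size=2**63 - 1, min_size=1):
    # Map tasks for ONE splittable file, following FileInputFormat.getSplits.
    size = split_size(block_size, min_size, max_size)
    remaining = file_size
    splits = 0
    while remaining / size > SPLIT_SLOP:
        splits += 1
        remaining -= size
    if remaining > 0:
        splits += 1
    return splits


def estimate_reducers(input_mb, reduce_input_mb):
    # Hypothetical simplification of tip 2: one reducer per reduce_input_mb.
    return max(1, math.ceil(input_mb / reduce_input_mb))


def estimate_regions(cube_gb, cut_gb, min_count=1, max_count=500):
    # Hypothetical simplification of tip 4: one region (and one HFile reducer)
    # per cut_gb of cube data, clamped to the min/max region count.
    return min(max_count, max(min_count, math.ceil(cube_gb / cut_gb)))


if __name__ == "__main__":
    # Tip 3: a made-up 3 GB flat-table file on a cluster with 128 MB blocks.
    print(count_splits(3072 * MB, 128 * MB))           # 24 mappers
    print(count_splits(3072 * MB, 128 * MB, 64 * MB))  # 48 mappers with 64MB max split
    # Tip 2: the poster's setting (500) versus a lower value.
    print(estimate_reducers(3072, 500))  # 7
    print(estimate_reducers(3072, 100))  # 31
    # Tip 4: a made-up 50 GB cube with the poster's region cut sizes.
    for capacity, cut in (("SMALL", 10), ("MEDIUM", 20), ("LARGE", 100)):
        print(capacity, estimate_regions(50, cut))  # SMALL 5, MEDIUM 3, LARGE 1
```

Under these assumptions, halving the maximum split size doubles the number of map tasks, and lowering the reducer input size from 500 MB to 100 MB raises the reducer count roughly fivefold. More tasks only shorten the build if YARN has free containers to run them in parallel.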

Hope this is helpful to you.

2016-01-08 13:00 GMT+08:00 wenyefbl@163.com <we...@163.com>:

> I have five machines (8 core, 32g MEM), using HDP 2.3 building cluster
> environment, version of the kyling Kyline
> apache-kylin-1.3-HBase-1.1-SNAPSHOT-bin, HBase for Version 1.1.1, hive
> table data is now 30000000 ，but now job running the one hour, job schedule
> is about 10%, view the task of MR found that job is not running to MR Do
> you have any way to improve the performance of the job：
> this is my configure：
> 1.kylin.properties
> #
> # Licensed to the Apache Software Foundation (ASF) under one or more
> # contributor license agreements.  See the NOTICE file distributed with
> # this work for additional information regarding copyright ownership.
> # The ASF licenses this file to You under the Apache License, Version 2.0
> # (the "License"); you may not use this file except in compliance with
> # the License.  You may obtain a copy of the License at
> #
> #    http://www.apache.org/licenses/LICENSE-2.0
> #
> # Unless required by applicable law or agreed to in writing, software
> # distributed under the License is distributed on an "AS IS" BASIS,
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> # See the License for the specific language governing permissions and
> # limitations under the License.
> #
>
> ## Config for Kylin Engine ##
>
> # List of web servers in use, this enables one web server instance to sync
> up with other servers.
> kylin.rest.servers=192.168.1.40:7070
>
> #set display timezone on UI,format like[GMT+N or GMT-N]
> kylin.rest.timezone=GMT-8
> kylin.query.cache.enabled=true
> # The metadata store in hbase
> kylin.metadata.url=kylin_metadata@hbase
>
> # The storage for final cube file in hbase
> kylin.storage.url=hbase
> kylin.job.yarn.app.rest.check.status.url=
> http://192.168.1.40:8088/ws/v1/cluster/apps/${job_id}?
> kylin.job.yarn.app.rest.check.interval.seconds=20
> kylin.query.security.enabled=false
> # Temp folder in hdfs, make sure user has the right access to the hdfs
> directory
> kylin.hdfs.working.dir=/kylin
>
> # HBase Cluster FileSystem, which serving hbase, format as
> hdfs://hbase-cluster:8020
> # leave empty if hbase running on same cluster with hive and mapreduce
> kylin.hbase.cluster.fs=hdfs://mycluster/apps/hbase/data
> kylin.route.hive.enabled=true
> kylin.route.hive.url=jdbc:hive2://192.168.1.50:10000
>
> kylin.job.mapreduce.default.reduce.input.mb=500
>
> kylin.server.mode=all
>
> # If true, job engine will not assume that hadoop CLI reside on the same
> server as it self
> # you will have to specify kylin.job.remote.cli.hostname,
> kylin.job.remote.cli.username and kylin.job.remote.cli.password
> # It should not be set to "true" unless you're NOT running Kylin.sh on a
> hadoop client machine
> # (Thus kylin instance has to ssh to another real hadoop client machine to
> execute hbase,hive,hadoop commands)
> kylin.job.run.as.remote.cmd=false
>
> # Only necessary when kylin.job.run.as.remote.cmd=true
> kylin.job.remote.cli.hostname=
>
> # Only necessary when kylin.job.run.as.remote.cmd=true
> kylin.job.remote.cli.username=
>
> # Only necessary when kylin.job.run.as.remote.cmd=true
> kylin.job.remote.cli.password=
>
> # Used by test cases to prepare synthetic data for sample cube
> kylin.job.remote.cli.working.dir=/tmp/kylin
>
> # Max count of concurrent jobs running
> kylin.job.concurrent.max.limit=10
>
> # Time interval to check hadoop job status
> kylin.job.yarn.app.rest.check.interval.seconds=10
>
> # Hive database name for putting the intermediate flat tables
> #kylin.job.hive.database.for.intermediatetable=kylin
>
> #default compression codec for htable,snappy,lzo,gzip,lz4
> kylin.hbase.default.compression.codec=snappy
>
> # The cut size for hbase region, in GB.
> # E.g, for cube whose capacity be marked as "SMALL", split region per 10GB
> by default
> kylin.hbase.region.cut.small=10
> kylin.hbase.region.cut.medium=20
> kylin.hbase.region.cut.large=100
>
> # HBase min and max region count
> kylin.hbase.region.count.min=1
> kylin.hbase.region.count.max=500
>
> ## Config for Restful APP ##
> # database connection settings:
> ldap.server=
> ldap.username=
> ldap.password=
> ldap.user.searchBase=
> ldap.user.searchPattern=
> ldap.user.groupSearchBase=
> ldap.service.searchBase=OU=
> ldap.service.searchPattern=
> ldap.service.groupSearchBase=
> acl.adminRole=
> acl.defaultRole=
> ganglia.group=
> ganglia.port=8664
>
> ## Config for mail service
>
> # If true, will send email notification;
> mail.enabled=false
> mail.host=
> mail.username=
> mail.password=
> mail.sender=
>
> ###########################config info for web#######################
>
> #help info ,format{name|displayName|link} ,optional
> kylin.web.help.length=4
> kylin.web.help.0=start|Getting Started|
> kylin.web.help.1=odbc|ODBC Driver|
> kylin.web.help.2=tableau|Tableau Guide|
> kylin.web.help.3=onboard|Cube Design Tutorial|
> #hadoop url link ,optional
> kylin.web.hadoop=
> #job diagnostic url link ,optional
> kylin.web.diagnostic=
> #contact mail on web page ,optional
> kylin.web.contact_mail=
>
> ###########################config info for front#######################
>
> #env DEV|QA|PROD
> deploy.env=PROD
>
> ###########################config info for sandbox#######################
> kylin.sandbox=true
>
>
> ###########################config info for kylin
> monitor#######################
> # hive jdbc url
> kylin.monitor.hive.jdbc.connection.url=jdbc:hive2://192.168.1.12:10000
>
> #config where to parse query log,split with comma ,will also read
> $KYLIN_HOME/tomcat/logs/ by default
> kylin.monitor.ext.log.base.dir = /tmp/kylin_log1,/tmp/kylin_log2
>
> #will create external hive table to query result csv file
> #will set to kylin_query_log by default if not config here
> kylin.monitor.query.log.parse.result.table = kylin_query_log
>
>
> 2.kylin_job_conf.xml
> <?xml version="1.0"?>
> <!--
> Licensed under the Apache License, Version 2.0 (the "License");
> you may not use this file except in compliance with the License.
> You may obtain a copy of the License at
>
> http://www.apache.org/licenses/LICENSE-2.0
>
> Unless required by applicable law or agreed to in writing, software
> distributed under the License is distributed on an "AS IS" BASIS,
> WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> See the License for the specific language governing permissions and
> limitations under the License. See accompanying LICENSE file.
> -->
>
> <configuration>
>
>     <property>
>         <name>mapreduce.job.split.metainfo.maxsize</name>
>         <value>-1</value>
>         <description>The maximum permissible size of the split metainfo
> file.
>             The JobTracker won't attempt to read split metainfo files
> bigger than
>             the configured value. No limits if set to -1.
>         </description>
>     </property>
>
>     <property>
>         <name>mapred.compress.map.output</name>
>         <value>true</value>
>         <description>Compress map outputs</description>
>     </property>
>
>     <property>
>         <name>mapred.map.output.compression.codec</name>
>         <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>         <description>The compression codec to use for map outputs
>         </description>
>     </property>
>
>     <property>
>         <name>mapred.output.compress</name>
>         <value>true</value>
>         <description>Compress the output of a MapReduce job</description>
>     </property>
>
>     <property>
>         <name>mapred.output.compression.codec</name>
>         <value>org.apache.hadoop.io.compress.SnappyCodec</value>
>         <description>The compression codec to use for job outputs
>         </description>
>     </property>
>
>     <property>
>         <name>mapred.output.compression.type</name>
>         <value>BLOCK</value>
>         <description>The compression type to use for job
> outputs</description>
>     </property>
>
>     <property>
>         <name>mapreduce.job.max.split.locations</name>
>         <value>2000</value>
>         <description>No description</description>
>     </property>
>
>     <property>
>         <name>dfs.replication</name>
>         <value>1</value>
>         <description>Block replication</description>
>     </property>
>
>
>     <property>
>         <name>hive.merge.mapfiles</name>
>         <value>true</value>
>         <description>Enable hive file merge on mapper only
> job</description>
>     </property>
>     <property>
>         <name>hive.merge.mapredfiles</name>
>         <value>true</value>
>         <description>Enable hive file merge on map-reduce job</description>
>     </property>
>     <property>
>         <name>hive.merge.size.per.task</name>
>         <value>268435456</value>
>         <description>Size for the merged file: 256M</description>
>     </property>
>
>     <property>
>         <name>hive.support.concurrency</name>
>         <value>false</value>
>         <description>Hive concurrency lock</description>
>     </property>
> </configuration>
>
>
> 3.debug：
> [pool-7-thread-8]:[2016-01-08
> 12:48:18,488][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08
> 12:48:38,500][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:48:38,501][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:48:38,501][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:48:38,504][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08
> 12:48:58,523][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)]
> - Going to buffer response body of large or unknown size. Using
> getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08
> 12:48:58,523][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)]
> - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08
> 12:48:58,523][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:48:58,526][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)]
> - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01
> (Store kylin_metadata@hbase)
> [pool-6-thread-1]:[2016-01-08
> 12:49:08,108][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)]
> - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> [pool-7-thread-8]:[2016-01-08 12:49:18,538][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08 12:49:18,538][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08 12:49:18,539][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08 12:49:18,542][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08 12:49:38,555][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08 12:49:38,555][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08 12:49:38,556][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08 12:49:38,558][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08 12:49:58,571][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08 12:49:58,571][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08 12:49:58,572][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08 12:49:58,574][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
> [pool-6-thread-1]:[2016-01-08 12:50:08,118][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)] - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> [pool-7-thread-8]:[2016-01-08 12:50:18,594][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08 12:50:18,594][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08 12:50:18,595][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08 12:50:18,597][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08 12:50:38,609][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08 12:50:38,610][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08 12:50:38,610][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08 12:50:38,613][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08 12:50:58,627][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08 12:50:58,627][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08 12:50:58,627][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08 12:50:58,630][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
> [pool-6-thread-1]:[2016-01-08 12:51:08,111][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)] - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> [http-bio-7070-exec-2]:[2016-01-08 12:51:11,522][DEBUG][org.apache.kylin.rest.filter.KylinApiFilter.logRequest(KylinApiFilter.java:120)] - REQUEST: REQUESTER=ADMIN;REQ_TIME=GMT-08:00 2016-01-07 20:51:11;URI=/kylin/api/jobs;METHOD=GET;QUERY_STRING=limit=15&offset=0&projectName=learn_kylin;PAYLOAD=;RESP_STATUS=200;
> [http-bio-7070-exec-2]:[2016-01-08 12:51:13,793][DEBUG][org.apache.kylin.rest.filter.KylinApiFilter.logRequest(KylinApiFilter.java:120)] - REQUEST: REQUESTER=ADMIN;REQ_TIME=GMT-08:00 2016-01-07 20:51:13;URI=/kylin/api/jobs;METHOD=GET;QUERY_STRING=limit=15&offset=0&projectName=learn_kylin;PAYLOAD=;RESP_STATUS=200;
> [http-bio-7070-exec-2]:[2016-01-08 12:51:14,431][DEBUG][org.apache.kylin.rest.filter.KylinApiFilter.logRequest(KylinApiFilter.java:120)] - REQUEST: REQUESTER=ADMIN;REQ_TIME=GMT-08:00 2016-01-07 20:51:14;URI=/kylin/api/jobs;METHOD=GET;QUERY_STRING=limit=15&offset=0&projectName=learn_kylin;PAYLOAD=;RESP_STATUS=200;
> [pool-7-thread-8]:[2016-01-08 12:51:18,643][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08 12:51:18,643][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08 12:51:18,644][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08 12:51:18,647][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08 12:51:38,658][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08 12:51:38,659][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08 12:51:38,659][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08 12:51:38,662][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08 12:51:58,674][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08 12:51:58,674][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08 12:51:58,675][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08 12:51:58,677][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
> [pool-6-thread-1]:[2016-01-08 12:52:08,118][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)] - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> [pool-7-thread-8]:[2016-01-08 12:52:18,696][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08 12:52:18,697][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08 12:52:18,697][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08 12:52:18,700][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08 12:52:38,712][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08 12:52:38,713][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08 12:52:38,713][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08 12:52:38,716][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08 12:52:58,728][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08 12:52:58,728][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08 12:52:58,729][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08 12:52:58,731][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
> [pool-6-thread-1]:[2016-01-08 12:53:08,104][INFO][org.apache.kylin.job.impl.threadpool.DefaultScheduler$FetcherRunner.run(DefaultScheduler.java:112)] - Job Fetcher: 3 running, 3 actual running, 0 ready, 28 others
> [pool-7-thread-8]:[2016-01-08 12:53:18,744][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08 12:53:18,744][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08 12:53:18,745][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08 12:53:18,747][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
> [pool-7-thread-8]:[2016-01-08 12:53:38,760][WARN][org.apache.commons.httpclient.HttpMethodBase.getResponseBody(HttpMethodBase.java:682)] - Going to buffer response body of large or unknown size. Using getResponseBodyAsStream instead is recommended.
> [pool-7-thread-8]:[2016-01-08 12:53:38,760][DEBUG][org.apache.kylin.job.tools.HadoopStatusGetter.getHttpResponse(HadoopStatusGetter.java:110)] - Job job_1452156670999_0116 get status check result.
>
> [pool-7-thread-8]:[2016-01-08 12:53:38,761][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08 12:53:38,764][DEBUG][org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:200)] - Saving resource /execute_output/12226fd3-4750-4a55-8fa0-bd24b039c834-01 (Store kylin_metadata@hbase)
>
>
>
> wenyefbl@163.com
>
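The log above repeats one cycle: roughly every 20 seconds (matching kylin.job.yarn.app.rest.check.interval.seconds=20 in the configuration) Kylin asks the ResourceManager about job_1452156670999_0116 and gets back RUNNING-UNDEFINED, which appears to be the YARN application state followed by its final status, the latter staying UNDEFINED until the application finishes. So across the whole excerpt the MapReduce job never changes state. To check this quickly on a longer log, the Python sketch below pulls out every "State of Hadoop job" line and reports how many consecutive checks returned the same state and over what time span. It is not part of Kylin; it assumes only the log layout visible here ([thread]:[date time][LEVEL][class] - message, optionally quoted with "> " and wrapped by a mail client), and the names unwrap, status_checks, summarize and SAMPLE are hypothetical.

```python
import re
from datetime import datetime

# One Kylin status-check entry, e.g.
# "[2016-01-08 12:48:38,501][DEBUG][...checkStatus(HadoopStatusChecker.java:57)] - State of Hadoop job: job_..:RUNNING-UNDEFINED"
# \s+ between date and time tolerates a mail client having wrapped the line at that point.
CHECK_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2},\d{3})\]\[\w+\]\[[^\]]*\]"
    r"\s*-\s*State of Hadoop job: (job_\d+_\d+):(\w+)-(\w+)"
)

# Three consecutive status checks copied from the quoted log, still wrapped.
SAMPLE = """> [pool-7-thread-8]:[2016-01-08
> 12:48:38,501][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:48:58,523][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
> [pool-7-thread-8]:[2016-01-08
> 12:49:18,539][DEBUG][org.apache.kylin.job.tools.HadoopStatusChecker.checkStatus(HadoopStatusChecker.java:57)]
> - State of Hadoop job: job_1452156670999_0116:RUNNING-UNDEFINED
"""


def unwrap(log_text: str) -> str:
    """Strip '> ' quote markers and join all lines so wrapped entries become contiguous."""
    return " ".join(re.sub(r"^>\s?", "", line) for line in log_text.splitlines())


def status_checks(log_text: str):
    """Yield (timestamp, job_id, state, final_status) for every status check found."""
    for m in CHECK_RE.finditer(unwrap(log_text)):
        ts = datetime.strptime(f"{m.group(1)} {m.group(2)}", "%Y-%m-%d %H:%M:%S,%f")
        yield ts, m.group(3), m.group(4), m.group(5)


def summarize(log_text: str) -> dict:
    """For each job: its latest state, how many consecutive checks reported it,
    the time between the first and last of those checks, and the mean gap between checks."""
    seen = {}
    for ts, job, state, final in status_checks(log_text):
        key = f"{state}-{final}"
        info = seen.get(job)
        if info is None or info["state"] != key:
            # First sighting or a state change: restart the clock for this job.
            seen[job] = {"state": key, "since": ts, "last": ts, "checks": 1}
        else:
            info["last"] = ts
            info["checks"] += 1
    result = {}
    for job, info in seen.items():
        elapsed = (info["last"] - info["since"]).total_seconds()
        mean_gap = round(elapsed / (info["checks"] - 1), 1) if info["checks"] > 1 else None
        result[job] = {
            "state": info["state"],
            "checks": info["checks"],
            "seconds_in_state": elapsed,
            "mean_poll_seconds": mean_gap,
        }
    return result


if __name__ == "__main__":
    for job, info in summarize(SAMPLE).items():
        print(job, info)
```

On SAMPLE it reports 3 checks of RUNNING-UNDEFINED spanning 40.038 seconds, with a mean gap of 20.0 seconds between checks. Applied to the whole excerpt quoted above, the same logic would find 16 checks between 12:48:38 and 12:53:38, all RUNNING-UNDEFINED. Because Kylin is only polling at this stage, a state that never advances suggests inspecting the application itself in the YARN ResourceManager UI (port 8088 in the configuration above).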