Posted to issues@kylin.apache.org by "xue lin (Jira)" <ji...@apache.org> on 2021/04/26 02:23:00 UTC
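[jira] [Created] (KYLIN-4990) When building the global dictionary table with Hive, make MR read files from the table's actual location instead of the HDFS root directory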
xue lin created KYLIN-4990:
------------------------------
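Summary: When building the global dictionary table with Hive, make MR read files from the table's actual location instead of the HDFS root directory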
Key: KYLIN-4990
URL: https://issues.apache.org/jira/browse/KYLIN-4990
Project: Kylin
Issue Type: Bug
Components: Job Engine
Affects Versions: v3.1.1
Reporter: xue lin
Attachments: s3-hive-全局字典表.png
I followed the documents below to build a Hive global dictionary table for a case that involves bitmap:
[http://kylin.apache.org/cn/docs/howto/howto_use_hive_mr_dict.html]
[https://cwiki.apache.org/confluence/display/KYLIN/Introduction+to+Hive+Global+Dictionary]
https://issues.apache.org/jira/browse/KYLIN-4616
Ideally, I would like to keep all the tables on S3, with the following configuration:
-----------------------
# kylin_hive_conf.xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>s3://etl-script-product/hive-kylin-dict</value>
  <description>location of default database for the warehouse</description>
</property>
-----------------------
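See the attachment for how the tables are stored on S3.
However, when Kylin reaches the step "Build Hive Global Dict - parallel part build", it fails with the following error: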
---------------------------
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-10-50-69-202.eu-west-1.compute.internal:8020/kylin_intermediate_cube_fact_remain_dc1531fe_0197_4ab1_a2d5_fe6d6629bb09_distinct_value/dict_column=VIEW_FACT_REMAIN_ID
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:271)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:358)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:303)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
at org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:198)
at org.apache.kylin.engine.mr.steps.BuildGlobalHiveDictPartBuildJob.run(BuildGlobalHiveDictPartBuildJob.java:109)
at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:155)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
---------------------------
Changing the hive.metastore.warehouse.dir parameter as follows works around the problem:
-----------------------
# kylin_hive_conf.xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/</value>
  <description>location of default database for the warehouse</description>
</property>
-----------------------
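The error path and the workaround together suggest that the MR input path is formed as if the warehouse directory were "/" on the default filesystem. The sketch below only illustrates that mismatch with plain string handling. It is not Kylin code: the class and the helpers partitionPath and qualify are hypothetical, and the assumption that Kylin builds the path this way is inferred from the stack trace, not confirmed from the source.

```java
public class DictInputPathSketch {

    // Hypothetical helper: directory of one partition of a table stored
    // under the given warehouse location.
    public static String partitionPath(String warehouseDir, String table, String partition) {
        String base = warehouseDir.endsWith("/") ? warehouseDir : warehouseDir + "/";
        return base + table + "/" + partition;
    }

    // Hypothetical helper: a path without a scheme is resolved against the
    // default filesystem; a path that already has a scheme is left alone.
    public static String qualify(String defaultFs, String path) {
        return path.contains("://") ? path : defaultFs + path;
    }

    public static void main(String[] args) {
        // Values copied from the report above.
        String defaultFs = "hdfs://ip-10-50-69-202.eu-west-1.compute.internal:8020";
        String table = "kylin_intermediate_cube_fact_remain_dc1531fe_0197_4ab1_a2d5_fe6d6629bb09_distinct_value";
        String partition = "dict_column=VIEW_FACT_REMAIN_ID";

        // Where the data would live with the S3 warehouse directory.
        String onS3 = qualify(defaultFs,
                partitionPath("s3://etl-script-product/hive-kylin-dict", table, partition));

        // Where the data would live with the warehouse directory set to "/".
        String onRoot = qualify(defaultFs, partitionPath("/", table, partition));

        System.out.println("S3 warehouse dir : " + onS3);
        System.out.println("\"/\" warehouse dir: " + onRoot);
    }
}
```

Under these assumptions, only the "/" case yields the hdfs:// path that appears in the InvalidInputException, which would explain why the workaround succeeds while the S3 configuration fails.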
Is there a parameter that changes the path the MR job reads from during "Build Hive Global Dict - parallel part build"?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)