Posted to issues@kylin.apache.org by "Xiaoxiang Yu (Jira)" <ji...@apache.org> on 2021/04/26 09:29:00 UTC
[jira] [Comment Edited] (KYLIN-4990) When building the global dictionary table with Hive, let MR read files from a specified location instead of the HDFS root directory
[ https://issues.apache.org/jira/browse/KYLIN-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331931#comment-17331931 ]
Xiaoxiang Yu edited comment on KYLIN-4990 at 4/26/21, 9:28 AM:
---------------------------------------------------------------
Hello [~linlin994395], this is quite a complex situation. Could you contact me via WeChat so that we can discuss it directly? My WeChat ID is "hit-lacus".
was (Author: xxyu):
Hello [~linlin994395], this is quite a complex situation. Could you contact me via WeChat so that we can discuss it directly?
> When building the global dictionary table with Hive, let MR read files from a specified location instead of the HDFS root directory
> ------------------------------------
>
> Key: KYLIN-4990
> URL: https://issues.apache.org/jira/browse/KYLIN-4990
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v3.1.1
> Reporter: xue lin
> Priority: Major
> Attachments: s3-hive-全局字典表.png
>
>
> I followed the documents below to build a Hive global dictionary table for a case that involves bitmap:
> [http://kylin.apache.org/cn/docs/howto/howto_use_hive_mr_dict.html]
> [https://cwiki.apache.org/confluence/display/KYLIN/Introduction+to+Hive+Global+Dictionary]
> https://issues.apache.org/jira/browse/KYLIN-4616
> Ideally, I would like to keep all of the tables on S3. With the following configuration:
> -----------------------
> # kylin_hive_conf.xml
> <property>
> <name>hive.metastore.warehouse.dir</name>
> <value>s3://etl-script-product/hive-kylin-dict</value>
> <description>location of default database for the warehouse</description>
> </property>
> -----------------------
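> See the attachment for how the tables are stored on S3.
> However, when Kylin reaches the step "Build Hive Global Dict - parallel part build", it fails with the following error: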
> ---------------------------
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-10-50-69-202.eu-west-1.compute.internal:8020/kylin_intermediate_cube_fact_remain_dc1531fe_0197_4ab1_a2d5_fe6d6629bb09_distinct_value/dict_column=VIEW_FACT_REMAIN_ID
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:271)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:358)
> at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:303)
> at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
> at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
> at org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:198)
> at org.apache.kylin.engine.mr.steps.BuildGlobalHiveDictPartBuildJob.run(BuildGlobalHiveDictPartBuildJob.java:109)
> at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:155)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
> at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
> at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
> at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> ---------------------------
> Changing the hive.metastore.warehouse.dir parameter as follows works around the problem:
> -----------------------
> # kylin_hive_conf.xml
> <property>
> <name>hive.metastore.warehouse.dir</name>
> <value>/</value>
> <description>location of default database for the warehouse</description>
> </property>
> -----------------------
> Is there a parameter that changes the path the MR job reads files from during "Build Hive Global Dict - parallel part build"?
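
The failing input path in the stack trace sits directly under the HDFS root, even though the warehouse directory points at S3. One reading that fits both the error and the workaround, offered as an assumption and not confirmed anywhere in this thread, is that the job passes an absolute path with no scheme, which Hadoop then resolves against the cluster's default filesystem (fs.defaultFS). The sketch below only imitates that resolution rule in Python; the function name qualify and the short host and table names are hypothetical, and none of this is Kylin or Hadoop code.

```python
from urllib.parse import urlparse


def qualify(path: str, default_fs: str) -> str:
    """Imitate how a scheme-less absolute path is resolved against a default filesystem.

    Hypothetical helper for illustration only; it is not part of Hadoop or Kylin.
    """
    # A path that already carries a scheme (s3://, hdfs://, ...) is left untouched.
    if urlparse(path).scheme:
        return path
    # This sketch only covers absolute paths; relative paths resolve differently.
    if not path.startswith("/"):
        raise ValueError("only absolute paths are handled in this sketch")
    # No scheme: the default filesystem URI is used as the prefix.
    return default_fs.rstrip("/") + path


if __name__ == "__main__":
    default_fs = "hdfs://namenode:8020"  # stand-in for the cluster's fs.defaultFS
    # Scheme-less path: ends up under the HDFS root, as in the reported error.
    print(qualify("/some_distinct_value_table/dict_column=COL", default_fs))
    # Fully qualified S3 path: stays on S3.
    print(qualify("s3://etl-script-product/hive-kylin-dict/some_table", default_fs))
```

Under this reading, setting the warehouse directory to "/" works only because Hive then writes the table to the same root location the job looks in.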
--
This message was sent by Atlassian Jira
(v8.3.4#803005)