Posted to issues@kylin.apache.org by "xue lin (Jira)" <ji...@apache.org> on 2021/04/26 02:23:00 UTC
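[jira] [Created] (KYLIN-4990) When building the global dictionary table with Hive, make MR read files from the table's actual location instead of the HDFS root directory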

xue lin created KYLIN-4990:
------------------------------

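             Summary: When building the global dictionary table with Hive, make MR read files from the table's actual location instead of the HDFS root directory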
                 Key: KYLIN-4990
                 URL: https://issues.apache.org/jira/browse/KYLIN-4990
             Project: Kylin
          Issue Type: Bug
          Components: Job Engine
    Affects Versions: v3.1.1
            Reporter: xue lin
         Attachments: s3-hive-全局字典表.png

I followed the documents below to build the Hive global dictionary table for a case that involves bitmap:

[http://kylin.apache.org/cn/docs/howto/howto_use_hive_mr_dict.html]

[https://cwiki.apache.org/confluence/display/KYLIN/Introduction+to+Hive+Global+Dictionary]

https://issues.apache.org/jira/browse/KYLIN-4616

Ideally, I would like to keep all of the tables on S3, using the following configuration:

-----------------------

# kylin_hive_conf.xml

<property>
 <name>hive.metastore.warehouse.dir</name>
 <value>s3://etl-script-product/hive-kylin-dict</value>
 <description>location of default database for the warehouse</description>
</property>

-----------------------

See the attachment for how the tables are stored on S3.

However, when Kylin reaches the step "Build Hive Global Dict - parallel part build", it fails with the following error:

---------------------------

org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-10-50-69-202.eu-west-1.compute.internal:8020/kylin_intermediate_cube_fact_remain_dc1531fe_0197_4ab1_a2d5_fe6d6629bb09_distinct_value/dict_column=VIEW_FACT_REMAIN_ID
 at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
 at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:271)
 at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:358)
 at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:303)
 at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
 at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:198)
 at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
 at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
 at org.apache.kylin.engine.mr.common.AbstractHadoopJob.waitForCompletion(AbstractHadoopJob.java:198)
 at org.apache.kylin.engine.mr.steps.BuildGlobalHiveDictPartBuildJob.run(BuildGlobalHiveDictPartBuildJob.java:109)
 at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:155)
 at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
 at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
 at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
 at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)

---------------------------

Changing the hive.metastore.warehouse.dir parameter as follows works around the problem:

-----------------------

# kylin_hive_conf.xml

<property>
 <name>hive.metastore.warehouse.dir</name>
 <value>/</value>
 <description>location of default database for the warehouse</description>
</property>

-----------------------
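One possible reading of the behaviour above, sketched in Python rather than taken from Kylin's source: the exception shows a path with the table directory directly under the root of the default filesystem, so the MR input path appears to be built without the warehouse directory and then qualified against fs.defaultFS. The helpers qualify and table_location, the namenode address, and the shortened table name are all hypothetical; only the S3 warehouse path and the partition name come from this report.

```python
from urllib.parse import urlparse

# Hypothetical values for illustration; only the S3 warehouse path and the
# partition name are copied from the report.
DEFAULT_FS = "hdfs://namenode.example:8020"  # stands in for fs.defaultFS
TABLE = "kylin_intermediate_example_distinct_value"
PARTITION = "dict_column=VIEW_FACT_REMAIN_ID"


def qualify(path: str, default_fs: str) -> str:
    """Prefix the default filesystem when the path has no scheme of its own."""
    if urlparse(path).scheme:
        return path
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


def table_location(warehouse_dir: str, table: str) -> str:
    """Assumed storage location of a table placed under the warehouse dir."""
    return qualify(warehouse_dir.rstrip("/") + "/" + table, DEFAULT_FS)


# Path shape seen in the exception: table directory directly under the root.
mr_input = qualify("/" + TABLE + "/" + PARTITION, DEFAULT_FS)

for warehouse_dir in ("s3://etl-script-product/hive-kylin-dict", "/"):
    location = table_location(warehouse_dir, TABLE)
    inside = mr_input.startswith(location + "/")
    print(warehouse_dir, "->", location, "| MR input inside table dir:", inside)
```

Under these assumptions the S3 setting leaves the MR input path outside the table directory, while "/" makes the two coincide, which would match both the error and the workaround.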

Is there a parameter that changes the path MR reads files from during "Build Hive Global Dict - parallel part build"?


