Posted to issues@kylin.apache.org by "Xiaoxiang Yu (Jira)" <ji...@apache.org> on 2019/12/31 09:29:00 UTC
[jira] [Commented] (KYLIN-4299) Issue with building real-time segment cache into HBase when using S3 as working dir

    [ https://issues.apache.org/jira/browse/KYLIN-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006013#comment-17006013 ] 

Xiaoxiang Yu commented on KYLIN-4299:
-------------------------------------

As a workaround, modify $KYLIN_HOME/conf/kylin_hive_conf.xml, /etc/hive/conf/hive-site.xml, and /etc/hadoop/conf/core-site.xml so that fs.defaultFS points to the S3 bucket, as follows. I will fix this properly.



{code:xml}
  <property>
    <name>fs.defaultFS</name>
    <value>s3://xiaoxiang-yu</value>
    <!--<value>hdfs://ip-172-31-6-58.cn-northwest-1.compute.internal:8020</value>-->
  </property>
{code}
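
For context, here is a rough sketch of why pointing fs.defaultFS at the bucket avoids the "Wrong FS" error. It is a simplified model written for illustration, not Hadoop's actual FileSystem.checkPath implementation (which also handles default ports and other cases). The class name, bucket name, and namenode host are hypothetical. The idea is that a filesystem instance rejects any path whose scheme and authority differ from its own URI, so a default filesystem of hdfs://... rejects s3://... paths.

```java
import java.net.URI;
import java.util.Objects;

/** Simplified model (NOT Hadoop's code) of the scheme/authority check behind "Wrong FS". */
public class WrongFsSketch {

    /** Returns true when the path's scheme and authority match the filesystem's URI. */
    public static boolean belongsTo(URI fsUri, URI path) {
        // A path without a scheme is resolved against whichever filesystem receives it.
        if (path.getScheme() == null) {
            return true;
        }
        return path.getScheme().equalsIgnoreCase(fsUri.getScheme())
                && Objects.equals(path.getAuthority(), fsUri.getAuthority());
    }

    /** Throws an error in the style of the reported one when the path is on another filesystem. */
    public static void checkPath(URI fsUri, URI path) {
        if (!belongsTo(fsUri, path)) {
            throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        // Hypothetical bucket and namenode names, for illustration only.
        URI segment = URI.create("s3://example-bucket/kylin/stream/cube/1/1");
        URI hdfsDefault = URI.create("hdfs://namenode.example:8020");
        URI s3Default = URI.create("s3://example-bucket");

        System.out.println(belongsTo(hdfsDefault, segment)); // prints false
        System.out.println(belongsTo(s3Default, segment));   // prints true
    }
}
```

With the default filesystem set to the bucket, the check passes for the segment cache paths. A more targeted fix would presumably resolve the filesystem from the path itself (for example via Hadoop's Path.getFileSystem(Configuration)) rather than relying on the default, but that is an assumption about the eventual patch, not a description of it.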


> Issue with building real-time segment cache into HBase when using S3 as working dir
> -----------------------------------------------------------------------------------
>
>                 Key: KYLIN-4299
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4299
>             Project: Kylin
>          Issue Type: Bug
>          Components: Real-time Streaming
>    Affects Versions: v3.0.0-alpha2
>            Reporter: Andras Istvan Nagy
>            Priority: Major
>             Fix For: v3.1.0
>
>
> We have an issue with using S3 as working dir for Kylin when using real-time streaming. The reason why we would like to do this is to have no state in HDFS, so the actual runtime environment running Kylin becomes stateless. 
> We already have HBase data on S3, but there is persistent data also in {{kylin.env.hdfs-working-dir}} (cube dictionaries), so we need to have that in S3 as well to have a setup where it's possible to fail over to a new cluster without having to rebuild all cubes.
> We are using the real-time streaming feature in Kylin, which persists segment caches hourly; an MR job then merges those hourly segments into HBase. In these MR jobs, we get the following exception:
> {code:java}
> Error: java.lang.IllegalArgumentException: Wrong FS: s3://kylin-XXXXX/kylin-dev/hdfs-rootdir/kylin_metadata/stream/tops_jaywalks/20191206010000_20191206020000/1/1, expected: hdfs://ip-24-0-3-243.us-west-2.compute.internal:8020
>     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:669)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:214)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:897)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:964)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:961)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:971)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1551)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1577)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1625)
>     at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1808)
>     at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1807)
>     at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1785)
>     at org.apache.hadoop.fs.FileSystem$6.<init>(FileSystem.java:1887)
>     at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1885)
>     at org.apache.kylin.engine.mr.streaming.ColumnarFilesReader.checkPath(ColumnarFilesReader.java:46)
>     at org.apache.kylin.engine.mr.streaming.ColumnarFilesReader.<init>(ColumnarFilesReader.java:41)
>     at org.apache.kylin.engine.mr.streaming.DictsReader.<init>(DictsReader.java:43)
>     at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictReader.init(ColumnarSplitDictReader.java:65)
>     at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictReader.<init>(ColumnarSplitDictReader.java:52)
>     at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictInputFormat.createRecordReader(ColumnarSplitDictInputFormat.java:32)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:524)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
>     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:173)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}


