Posted to issues@kylin.apache.org by "Andras Istvan Nagy (Jira)" <ji...@apache.org> on 2019/12/13 19:08:00 UTC
[jira] [Created] (KYLIN-4299) Issue with building real-time segment cache into HBase when using S3 as working dir
Andras Istvan Nagy created KYLIN-4299:
-----------------------------------------
Summary: Issue with building real-time segment cache into HBase when using S3 as working dir
Key: KYLIN-4299
URL: https://issues.apache.org/jira/browse/KYLIN-4299
Project: Kylin
Issue Type: Bug
Components: Real-time Streaming
Affects Versions: v3.0.0-alpha2
Reporter: Andras Istvan Nagy
We run into a problem when using S3 as the Kylin working dir together with real-time streaming. We want this setup so that no state is kept in HDFS, which makes the runtime environment running Kylin stateless.
Our HBase data is already on S3, but {{kylin.env.hdfs-working-dir}} also holds persistent data (cube dictionaries), so it has to be on S3 as well if we want to fail over to a new cluster without rebuilding all cubes.
The real-time streaming feature in Kylin persists segment caches hourly, and an MR job then merges those hourly segments into HBase. In these MR jobs, we get the following exception:
{code:java}
Error: java.lang.IllegalArgumentException: Wrong FS: s3://kylin-XXXXX/kylin-dev/hdfs-rootdir/kylin_metadata/stream/tops_jaywalks/20191206010000_20191206020000/1/1, expected: hdfs://ip-24-0-3-243.us-west-2.compute.internal:8020
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:669)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:214)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:897)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:964)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:961)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:971)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1551)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1577)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1625)
    at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1808)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1807)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1785)
    at org.apache.hadoop.fs.FileSystem$6.<init>(FileSystem.java:1887)
    at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1885)
    at org.apache.kylin.engine.mr.streaming.ColumnarFilesReader.checkPath(ColumnarFilesReader.java:46)
    at org.apache.kylin.engine.mr.streaming.ColumnarFilesReader.<init>(ColumnarFilesReader.java:41)
    at org.apache.kylin.engine.mr.streaming.DictsReader.<init>(DictsReader.java:43)
    at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictReader.init(ColumnarSplitDictReader.java:65)
    at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictReader.<init>(ColumnarSplitDictReader.java:52)
    at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictInputFormat.createRecordReader(ColumnarSplitDictInputFormat.java:32)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:524)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
    at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:173)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
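For readers unfamiliar with this error: the trace shows {{DistributedFileSystem}} (the HDFS client) being asked to list an {{s3://}} path, and {{FileSystem.checkPath}} rejecting it because the path's scheme does not match the file system it was called on. This commonly happens when code obtains the cluster default file system (for example via Hadoop's {{FileSystem.get(Configuration)}}) instead of resolving it from the path itself (for example via {{Path.getFileSystem(Configuration)}}). Whether the Kylin streaming readers do exactly that is an assumption; the report does not confirm it. The sketch below is a simplified, stdlib-only model of the check, not Hadoop's actual code, and every class, method, and URI in it is hypothetical:

```java
import java.net.URI;

/**
 * Simplified model of the scheme/authority check behind a "Wrong FS" error.
 * This is NOT Hadoop's implementation; all names and URIs are hypothetical.
 */
public class WrongFsSketch {

    /** Stand-in for a file system handle bound to one URI (scheme + authority). */
    static final class FsHandle {
        final URI fsUri;

        FsHandle(URI fsUri) {
            this.fsUri = fsUri;
        }

        /** Rejects fully qualified paths that belong to a different file system. */
        void checkPath(URI path) {
            if (path.getScheme() == null) {
                return; // unqualified path: treated as belonging to this file system
            }
            boolean sameScheme = path.getScheme().equalsIgnoreCase(fsUri.getScheme());
            boolean sameAuthority = path.getAuthority() == null
                    || path.getAuthority().equals(fsUri.getAuthority());
            if (!sameScheme || !sameAuthority) {
                throw new IllegalArgumentException(
                        "Wrong FS: " + path + ", expected: " + fsUri);
            }
        }
    }

    /** Analogue of asking for the cluster default file system. */
    static FsHandle defaultFs(URI defaultFsUri) {
        return new FsHandle(defaultFsUri);
    }

    /** Analogue of resolving the file system from the path's own scheme and authority. */
    static FsHandle fsForPath(URI path) {
        return new FsHandle(URI.create(path.getScheme() + "://" + path.getAuthority()));
    }

    /** Returns true if the handle accepts the path, false if it raises "Wrong FS". */
    static boolean accepts(FsHandle fs, URI path) {
        try {
            fs.checkPath(path);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for the default FS and a segment path on S3.
        URI defaultFsUri = URI.create("hdfs://namenode.example:8020");
        URI segment = URI.create("s3://example-bucket/kylin/stream/cube/segment");

        // Default (HDFS) handle is handed an s3:// path: rejected.
        System.out.println(accepts(defaultFs(defaultFsUri), segment)); // prints "false"
        // Handle resolved from the path itself: accepted.
        System.out.println(accepts(fsForPath(segment), segment));      // prints "true"
    }
}
```

If the readers in the trace do obtain their file system from the default configuration, resolving it from the segment path instead would be the analogous change, but that would need to be verified against the Kylin source.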