Posted to dev@hive.apache.org by "Nishant Bangarwa (JIRA)" <ji...@apache.org> on 2019/04/18 12:25:00 UTC
[jira] [Created] (HIVE-21628) Use druid-s3-extensions when using S3 as druid deep storage
Nishant Bangarwa created HIVE-21628:
---------------------------------------
Summary: Use druid-s3-extensions when using S3 as druid deep storage
Key: HIVE-21628
URL: https://issues.apache.org/jira/browse/HIVE-21628
Project: Hive
Issue Type: Task
Reporter: Nishant Bangarwa
Currently DruidStorageHandler always uses druid-hdfs-extensions for both S3 and HDFS.
The HDFS extension pushes the segment to an intermediate directory and then renames it to the final path. This causes two problems on S3:
1) The rename results in an additional copy of the data, which the druid-s3 extension avoids by pushing directly to the final location.
2) The rename may fail when the pushed file is not yet visible, due to S3's eventually consistent model. Refer to the exception below -
{code}
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: No such file or directory: s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at org.apache.hadoop.hive.druid.io.DruidRecordWriter.pushSegments(DruidRecordWriter.java:184)
... 22 more
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: No such file or directory: s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
at org.apache.hive.druid.com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.mergeAndPush(AppenderatorImpl.java:665)
at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$push$0(AppenderatorImpl.java:528)
at org.apache.hive.druid.com.google.common.util.concurrent.Futures$1.apply(Futures.java:713)
at org.apache.hive.druid.com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:861)
... 3 more
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
at org.apache.hadoop.fs.FileSystem.getFileLinkStatus(FileSystem.java:2727)
at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:1560)
at org.apache.hadoop.fs.HadoopFsWrapper.rename(HadoopFsWrapper.java:53)
at org.apache.hive.druid.io.druid.storage.hdfs.HdfsDataSegmentPusher.copyFilesWithChecks(HdfsDataSegmentPusher.java:168)
at org.apache.hive.druid.io.druid.storage.hdfs.HdfsDataSegmentPusher.push(HdfsDataSegmentPusher.java:149)
at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$mergeAndPush$3(AppenderatorImpl.java:647)
at org.apache.hive.druid.io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63)
at org.apache.hive.druid.io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81)
at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.mergeAndPush(AppenderatorImpl.java:638)
... 6 more
{code}
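For reference, switching Druid to native S3 deep storage typically means loading the S3 extension and pointing the segment pusher at a bucket rather than a filesystem path. A minimal configuration sketch (property names are from Druid's deep-storage configuration; the bucket name and key prefix are illustrative):

{code}
druid.extensions.loadList=["druid-s3-extensions"]
druid.storage.type=s3
druid.storage.bucket=my-bucket
druid.storage.baseKey=druid/segments
{code}

With druid.storage.type=s3, segments are uploaded to the bucket directly instead of being staged and renamed through the Hadoop FileSystem API.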
This task is to add the ability to switch to druid-s3-extensions when the S3A file scheme is used for the druid storage directory.
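The selection described above could be driven by the scheme of the configured storage directory. The following is a minimal sketch of that dispatch, assuming hypothetical names (DeepStorageExtensionSelector and extensionFor are illustrative, not part of Hive's DruidStorageHandler); the extension names druid-s3-extensions and druid-hdfs-storage are Druid's actual extension identifiers:

{code:java}
import java.net.URI;

// Hypothetical sketch: pick the Druid extension to use based on the
// URI scheme of the deep-storage directory. Class and method names
// are illustrative only.
public class DeepStorageExtensionSelector {
  public static String extensionFor(String storageDirectory) {
    String scheme = URI.create(storageDirectory).getScheme();
    if (scheme == null) {
      // Bare paths are treated as HDFS, matching the current default.
      return "druid-hdfs-storage";
    }
    switch (scheme.toLowerCase()) {
      case "s3a":
      case "s3n":
      case "s3":
        // Push directly via the S3 extension; avoids the staged rename.
        return "druid-s3-extensions";
      default:
        return "druid-hdfs-storage";
    }
  }

  public static void main(String[] args) {
    System.out.println(extensionFor("s3a://bucket/druid/segments"));
    System.out.println(extensionFor("hdfs://namenode:8020/druid/segments"));
  }
}
{code}

Keying off the scheme keeps existing HDFS deployments unchanged while routing s3a:// storage directories to the rename-free S3 pusher.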
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)