Posted to dev@hive.apache.org by "Nishant Bangarwa (JIRA)" <ji...@apache.org> on 2019/04/18 12:25:00 UTC

[jira] [Created] (HIVE-21628) Use druid-s3-extensions when using S3 as druid deep storage

Nishant Bangarwa created HIVE-21628:
---------------------------------------

             Summary: Use druid-s3-extensions when using S3 as druid deep storage
                 Key: HIVE-21628
                 URL: https://issues.apache.org/jira/browse/HIVE-21628
             Project: Hive
          Issue Type: Task
            Reporter: Nishant Bangarwa


Currently DruidStorageHandler always uses druid-hdfs-extensions, for S3 as well as HDFS.
The HDFS extension pushes the segment to an intermediate directory and then renames it to the final path. This has two drawbacks on S3:
1) The rename causes an additional copy of the data, which the druid-s3 extension avoids.
2) The rename may fail when the pushed file is not yet visible, due to S3's eventually consistent model. Refer to the exception below -

{code} 
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: No such file or directory: s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
	at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
	at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
	at org.apache.hive.druid.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
	at org.apache.hadoop.hive.druid.io.DruidRecordWriter.pushSegments(DruidRecordWriter.java:184)
	... 22 more
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: No such file or directory: s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
	at org.apache.hive.druid.com.google.common.base.Throwables.propagate(Throwables.java:160)
	at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.mergeAndPush(AppenderatorImpl.java:665)
	at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$push$0(AppenderatorImpl.java:528)
	at org.apache.hive.druid.com.google.common.util.concurrent.Futures$1.apply(Futures.java:713)
	at org.apache.hive.druid.com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:861)
	... 3 more
Caused by: java.io.FileNotFoundException: No such file or directory: s3a://edws-nishant-test/druid/druid-1555443464-ggdf/data/workingDirectory/.staging-hive_20190417170114_a7fb3dcd-623b-46ca-bb87-9aac2fb50c6c/intermediateSegmentDir/default.cmv_basetable_d_7/11b3ceeb8d2843508336aac3347687cb/0_index.zip
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
	at org.apache.hadoop.fs.FileSystem.getFileLinkStatus(FileSystem.java:2727)
	at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:1560)
	at org.apache.hadoop.fs.HadoopFsWrapper.rename(HadoopFsWrapper.java:53)
	at org.apache.hive.druid.io.druid.storage.hdfs.HdfsDataSegmentPusher.copyFilesWithChecks(HdfsDataSegmentPusher.java:168)
	at org.apache.hive.druid.io.druid.storage.hdfs.HdfsDataSegmentPusher.push(HdfsDataSegmentPusher.java:149)
	at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.lambda$mergeAndPush$3(AppenderatorImpl.java:647)
	at org.apache.hive.druid.io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63)
	at org.apache.hive.druid.io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81)
	at org.apache.hive.druid.io.druid.segment.realtime.appenderator.AppenderatorImpl.mergeAndPush(AppenderatorImpl.java:638)
	... 6 more
{code}   

This task is to add the ability to switch to druid-s3-extensions when the S3A file scheme is used for the Druid storage directory.
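A minimal sketch of the intended behavior: pick the deep-storage extension from the scheme of the configured storage directory. The class and method names below (DeepStorageExtensionSelector, selectExtension) are illustrative only, not the actual DruidStorageHandler API.

```java
import java.net.URI;

public class DeepStorageExtensionSelector {
    // Hypothetical helper: returns the Druid extension to load for the
    // given deep-storage URI (sketch, not the actual Hive/Druid code).
    public static String selectExtension(URI storageDir) {
        String scheme = storageDir.getScheme();
        if ("s3a".equalsIgnoreCase(scheme) || "s3".equalsIgnoreCase(scheme)) {
            // Native S3 pusher uploads the segment directly, with no
            // intermediate rename, avoiding the extra copy and the
            // eventual-consistency race shown in the stack trace above.
            return "druid-s3-extensions";
        }
        // Default: keep the HDFS pusher for hdfs:// and other schemes.
        return "druid-hdfs-storage";
    }

    public static void main(String[] args) {
        System.out.println(selectExtension(URI.create("s3a://bucket/druid/segments")));
        System.out.println(selectExtension(URI.create("hdfs://nn:8020/druid/segments")));
    }
}
```

With this kind of dispatch, existing HDFS deployments keep their current pusher while S3A-backed storage directories transparently get the direct-upload path.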



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)