You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/29 01:28:30 UTC

[GitHub] [druid] zachjsh opened a new pull request #11316: Fix Index hadoop failing with index.zip is not a valid DFS filename

zachjsh opened a new pull request #11316:
URL: https://github.com/apache/druid/pull/11316


   This fixes a bug which caused hadoop based ingestion tasks to fail whenever the deep storage directory has a character that is not permitted in hdfs. This caused by the deep storage segment file path being returned without the scheme (s3, gcs, etc). Hadoop client uses the scheme to determine the FileSystem class implementation to use when performing file operations. If no scheme is set, it default to hdfs file. Fixed the issue by returning the pull uri of the segment file, which includes the scheme.
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #11316: Fix Index hadoop failing with index.zip is not a valid DFS filename

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #11316:
URL: https://github.com/apache/druid/pull/11316#discussion_r643326054



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/HadoopIndexTask.java
##########
@@ -550,13 +545,8 @@ private void renameSegmentIndexFilesJob(
       String dataSegmentAndIndexZipFilePathListStr
   )
   {
-    final ClassLoader oldLoader = Thread.currentThread().getContextClassLoader();
+    final ClassLoader loader = Thread.currentThread().getContextClassLoader();
     try {
-      ClassLoader loader = HadoopTask.buildClassLoader(
-          getHadoopDependencyCoordinates(),
-          taskConfig.getDefaultHadoopCoordinates()
-      );
-
       Object renameSegmentIndexFilesRunner = getForeignClassloaderObject(

Review comment:
       It already builds a hadoopy classloader on line 320, which is the current classloader in this context: https://github.com/apache/druid/pull/11316/files#diff-2dcdadad4e69316f4771828f0d6b63e83f9735a56cf21881e81e4aa72296b11bR320
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] rohangarg commented on a change in pull request #11316: Fix Index hadoop failing with index.zip is not a valid DFS filename

Posted by GitBox <gi...@apache.org>.
rohangarg commented on a change in pull request #11316:
URL: https://github.com/apache/druid/pull/11316#discussion_r641938894



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/HadoopIndexTask.java
##########
@@ -550,13 +545,8 @@ private void renameSegmentIndexFilesJob(
       String dataSegmentAndIndexZipFilePathListStr
   )
   {
-    final ClassLoader oldLoader = Thread.currentThread().getContextClassLoader();
+    final ClassLoader loader = Thread.currentThread().getContextClassLoader();
     try {
-      ClassLoader loader = HadoopTask.buildClassLoader(
-          getHadoopDependencyCoordinates(),
-          taskConfig.getDefaultHadoopCoordinates()
-      );
-
       Object renameSegmentIndexFilesRunner = getForeignClassloaderObject(

Review comment:
       why do we not want to use the custom hadoop classloader? also, does this make sense since we're now using the current classloader itself?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] rohangarg commented on a change in pull request #11316: Fix Index hadoop failing with index.zip is not a valid DFS filename

Posted by GitBox <gi...@apache.org>.
rohangarg commented on a change in pull request #11316:
URL: https://github.com/apache/druid/pull/11316#discussion_r643331088



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/HadoopIndexTask.java
##########
@@ -550,13 +545,8 @@ private void renameSegmentIndexFilesJob(
       String dataSegmentAndIndexZipFilePathListStr
   )
   {
-    final ClassLoader oldLoader = Thread.currentThread().getContextClassLoader();
+    final ClassLoader loader = Thread.currentThread().getContextClassLoader();
     try {
-      ClassLoader loader = HadoopTask.buildClassLoader(
-          getHadoopDependencyCoordinates(),
-          taskConfig.getDefaultHadoopCoordinates()
-      );
-
       Object renameSegmentIndexFilesRunner = getForeignClassloaderObject(

Review comment:
       ok, I see. but then we should remove the `getForeignClassloaderObject` reference right? since the current class loader is the hadoop classloader. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh merged pull request #11316: Fix Index hadoop failing with index.zip is not a valid DFS filename

Posted by GitBox <gi...@apache.org>.
zachjsh merged pull request #11316:
URL: https://github.com/apache/druid/pull/11316


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org