You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/04/07 00:29:43 UTC

[GitHub] [druid] zachjsh opened a new pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

zachjsh opened a new pull request #11075:
URL: https://github.com/apache/druid/pull/11075


   ### Description
   
   WIP
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] capistrant edited a comment on pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
capistrant edited a comment on pull request #11075:
URL: https://github.com/apache/druid/pull/11075#issuecomment-825128559


   > Hm, we don't have any existing unit tests for HadoopIndexTask, so I think it'd be fine to ignore the coverage failure from that.
   > 
   > Can you run ITHadoopIndexTest locally and check if that passes?
   
   I have a suspicion that this PR may have had an impact on jdk11 execution of Indexing Modules Test. @zachjsh were you able to pass tests locally with jdk11 by chance? If so, I could be wrong here
   
   Edit: It seems to be powermock trying to access java internals from what I can tell


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
jon-wei commented on pull request #11075:
URL: https://github.com/apache/druid/pull/11075#issuecomment-825225567


   @capistrant Thanks, I've reverted that patch


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11075:
URL: https://github.com/apache/druid/pull/11075#discussion_r616977741



##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/FileSystemHelper.java
##########
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexer;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+
+import java.io.IOException;
+import java.net.URI;
+
+public class FileSystemHelper
+{
+  public static FileSystem get(URI uri, Configuration conf) throws IOException

Review comment:
       Added comment




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
zachjsh commented on pull request #11075:
URL: https://github.com/apache/druid/pull/11075#issuecomment-823903095


   > Hm, we don't have any existing unit tests for HadoopIndexTask, so I think it'd be fine to ignore the coverage failure from that.
   > 
   > Can you run ITHadoopIndexTest locally and check if that passes?
   
   Ran IT, everything passed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11075:
URL: https://github.com/apache/druid/pull/11075#discussion_r612138093



##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/FileSystemHelper.java
##########
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexer;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+
+import java.io.IOException;
+import java.net.URI;
+
+public class FileSystemHelper
+{
+  public static FileSystem get(URI uri, Configuration conf) throws IOException

Review comment:
       I needed it for the test that I'm using it in. I wasnt able to mock the raw Filesystem.get routine, kept running into assist issues.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #11075:
URL: https://github.com/apache/druid/pull/11075#discussion_r610964106



##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/JobHelper.java
##########
@@ -386,29 +386,13 @@ public static void writeJobIdToFile(String hadoopJobIdFileName, String hadoopJob
     }
   }
 
-  public static boolean runSingleJob(Jobby job, HadoopDruidIndexerConfig config)
+  public static boolean runSingleJob(Jobby job)
   {
     boolean succeeded = job.run();
-
-    if (!config.getSchema().getTuningConfig().isLeaveIntermediate()) {

Review comment:
       Hm, would removing the cleanup here prevent HadoopDruidDetermineConfigurationJob from doing any necessary cleanup?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11075:
URL: https://github.com/apache/druid/pull/11075#discussion_r611831329



##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/JobHelper.java
##########
@@ -386,29 +386,13 @@ public static void writeJobIdToFile(String hadoopJobIdFileName, String hadoopJob
     }
   }
 
-  public static boolean runSingleJob(Jobby job, HadoopDruidIndexerConfig config)
+  public static boolean runSingleJob(Jobby job)
   {
     boolean succeeded = job.run();
-
-    if (!config.getSchema().getTuningConfig().isLeaveIntermediate()) {

Review comment:
       Good catch! Fixed by adding a subsequent call to do cleanup in HadoopDruidDetermineConfigurationJob. Also found another place  in CliInternalHadoopIndexer where we were maybe not doing necessary cleanup, and fixed in similar way.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #11075:
URL: https://github.com/apache/druid/pull/11075#discussion_r612124990



##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/JobHelper.java
##########
@@ -482,20 +475,16 @@ public long push() throws IOException
         .withSize(size.get())
         .withBinaryVersion(SegmentUtils.getVersionFromDir(mergedBase));
 
-    if (!renameIndexFiles(outputFS, tmpPath, finalIndexZipFilePath)) {
-      throw new IOE(
-          "Unable to rename [%s] to [%s]",
-          tmpPath.toUri().toString(),
-          finalIndexZipFilePath.toUri().toString()
-      );
-    }
-
-    return finalSegment;
+    return new DataSegmentAndIndexZipFilePath(
+        finalSegment,
+        tmpPath.toUri().getPath(),
+        finalIndexZipFilePath.toUri().getPath()
+    );
   }
 
   public static void writeSegmentDescriptor(

Review comment:
       this method also deletes and creates a file, I think the descriptor creation should also be moved into the main task (it could be handled in `renameIndexFile` where you have access to a FileSystem). The mappers/reducers would produce a segment file at a temp location and the main task would handle the rename->create descriptor->publish flow




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
zachjsh commented on pull request #11075:
URL: https://github.com/apache/druid/pull/11075#issuecomment-831624436


   > > Hm, we don't have any existing unit tests for HadoopIndexTask, so I think it'd be fine to ignore the coverage failure from that.
   > > Can you run ITHadoopIndexTest locally and check if that passes?
   > 
   > I have a suspicion that this PR may have had an impact on jdk11 execution of Indexing Modules Test. @zachjsh were you able to pass tests locally with jdk11 by chance? If so, I could be wrong here
   > 
   > Edit: It seems to be powermock trying to access java internals from what I can tell
   
   thanks @capistrant! I believe the recent updates I've added fix the issue. I ran both the `indexing-hadoop` and `indexing-service` module unit tests with both java8 and java11 locally, and ran ITHadoopIndexTest locally with both java8 and java11.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11075:
URL: https://github.com/apache/druid/pull/11075#discussion_r612151354



##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/JobHelper.java
##########
@@ -482,20 +475,16 @@ public long push() throws IOException
         .withSize(size.get())
         .withBinaryVersion(SegmentUtils.getVersionFromDir(mergedBase));
 
-    if (!renameIndexFiles(outputFS, tmpPath, finalIndexZipFilePath)) {
-      throw new IOE(
-          "Unable to rename [%s] to [%s]",
-          tmpPath.toUri().toString(),
-          finalIndexZipFilePath.toUri().toString()
-      );
-    }
-
-    return finalSegment;
+    return new DataSegmentAndIndexZipFilePath(
+        finalSegment,
+        tmpPath.toUri().getPath(),
+        finalIndexZipFilePath.toUri().getPath()
+    );
   }
 
   public static void writeSegmentDescriptor(

Review comment:
       We seem to be storing the information about the segments that we publish in the descriptor file, and then read the data written to this this file / directory in main task in order to know the list of segments that were produced. If we dont create this file in the sub task / job, how will we know what segments were created?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11075:
URL: https://github.com/apache/druid/pull/11075#discussion_r615152908



##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/JobHelper.java
##########
@@ -482,20 +475,16 @@ public long push() throws IOException
         .withSize(size.get())
         .withBinaryVersion(SegmentUtils.getVersionFromDir(mergedBase));
 
-    if (!renameIndexFiles(outputFS, tmpPath, finalIndexZipFilePath)) {
-      throw new IOE(
-          "Unable to rename [%s] to [%s]",
-          tmpPath.toUri().toString(),
-          finalIndexZipFilePath.toUri().toString()
-      );
-    }
-
-    return finalSegment;
+    return new DataSegmentAndIndexZipFilePath(
+        finalSegment,
+        tmpPath.toUri().getPath(),
+        finalIndexZipFilePath.toUri().getPath()
+    );
   }
 
   public static void writeSegmentDescriptor(

Review comment:
       Changed it so that the segment descriptor for a segment is not overwritten if it already exists, as we discussed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh edited a comment on pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
zachjsh edited a comment on pull request #11075:
URL: https://github.com/apache/druid/pull/11075#issuecomment-831624436


   > > Hm, we don't have any existing unit tests for HadoopIndexTask, so I think it'd be fine to ignore the coverage failure from that.
   > > Can you run ITHadoopIndexTest locally and check if that passes?
   > 
   > I have a suspicion that this PR may have had an impact on jdk11 execution of Indexing Modules Test. @zachjsh were you able to pass tests locally with jdk11 by chance? If so, I could be wrong here
   > 
   > Edit: It seems to be powermock trying to access java internals from what I can tell
   
   thanks @capistrant! I believe the recent updates I've added fix the issue. I ran both the `indexing-hadoop` and `indexing-service` module unit tests with both java8 and java11 locally, and ran ITHadoopIndexTest locally with both java8 and java11. New pr can be found here https://github.com/apache/druid/pull/11194


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #11075:
URL: https://github.com/apache/druid/pull/11075#discussion_r612075889



##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/FileSystemHelper.java
##########
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexer;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+
+import java.io.IOException;
+import java.net.URI;
+
+public class FileSystemHelper
+{
+  public static FileSystem get(URI uri, Configuration conf) throws IOException

Review comment:
       This class seems unnecessary

##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/JobHelper.java
##########
@@ -482,20 +475,16 @@ public long push() throws IOException
         .withSize(size.get())
         .withBinaryVersion(SegmentUtils.getVersionFromDir(mergedBase));
 
-    if (!renameIndexFiles(outputFS, tmpPath, finalIndexZipFilePath)) {
-      throw new IOE(
-          "Unable to rename [%s] to [%s]",
-          tmpPath.toUri().toString(),
-          finalIndexZipFilePath.toUri().toString()
-      );
-    }
-
-    return finalSegment;
+    return new DataSegmentAndIndexZipFilePath(
+        finalSegment,
+        tmpPath.toUri().getPath(),
+        finalIndexZipFilePath.toUri().getPath()
+    );
   }
 
   public static void writeSegmentDescriptor(

Review comment:
       this method also deletes and creates a file, I think the descriptor creation should also be moved into the main task (it could be handled in `renameIndexFile` where you have access to a FileSystem)

##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/JobHelper.java
##########
@@ -418,8 +402,19 @@ public static boolean runJobs(List<Jobby> jobs, HadoopDruidIndexerConfig config)
       }
     }
 
+    return succeeded;
+  }
+
+  public static void maybeDeleteIntermediatePath(
+      boolean indexerGeneratorJobSucceeded,

Review comment:
       I think this should just be `jobSucceeded` since it's used by more than one job

##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/JobHelper.java
##########
@@ -418,8 +402,19 @@ public static boolean runJobs(List<Jobby> jobs, HadoopDruidIndexerConfig config)
       }
     }
 
+    return succeeded;
+  }
+
+  public static void maybeDeleteIntermediatePath(
+      boolean indexerGeneratorJobSucceeded,
+      HadoopIngestionSpec indexerSchema)
+  {
+    HadoopDruidIndexerConfig config = HadoopDruidIndexerConfig.fromSpec(indexerSchema);
+    final Configuration configuration = JobHelper.injectSystemProperties(new Configuration(), config);

Review comment:
       I think you could reuse this `Configuration` within the `try` block below




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei merged pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
jon-wei merged pull request #11075:
URL: https://github.com/apache/druid/pull/11075


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #11075:
URL: https://github.com/apache/druid/pull/11075#discussion_r616330048



##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/FileSystemHelper.java
##########
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexer;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+
+import java.io.IOException;
+import java.net.URI;
+
+public class FileSystemHelper
+{
+  public static FileSystem get(URI uri, Configuration conf) throws IOException

Review comment:
       Can you add a comment here about that?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] capistrant commented on pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
capistrant commented on pull request #11075:
URL: https://github.com/apache/druid/pull/11075#issuecomment-825128559


   > Hm, we don't have any existing unit tests for HadoopIndexTask, so I think it'd be fine to ignore the coverage failure from that.
   > 
   > Can you run ITHadoopIndexTest locally and check if that passes?
   
   I have a suspicion that this PR may have had an impact on jdk11 execution of Indexing Modules Test. @zachjsh were you able to pass tests locally with jdk11 by chance? If so, I could be wrong here


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11075:
URL: https://github.com/apache/druid/pull/11075#discussion_r615152813



##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/JobHelper.java
##########
@@ -386,29 +386,13 @@ public static void writeJobIdToFile(String hadoopJobIdFileName, String hadoopJob
     }
   }
 
-  public static boolean runSingleJob(Jobby job, HadoopDruidIndexerConfig config)
+  public static boolean runSingleJob(Jobby job)
   {
     boolean succeeded = job.run();
-
-    if (!config.getSchema().getTuningConfig().isLeaveIntermediate()) {

Review comment:
       Changed it so that the segment descriptor for a segment is not overwritten if it already exists, as we discussed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11075:
URL: https://github.com/apache/druid/pull/11075#discussion_r611831329



##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/JobHelper.java
##########
@@ -386,29 +386,13 @@ public static void writeJobIdToFile(String hadoopJobIdFileName, String hadoopJob
     }
   }
 
-  public static boolean runSingleJob(Jobby job, HadoopDruidIndexerConfig config)
+  public static boolean runSingleJob(Jobby job)
   {
     boolean succeeded = job.run();
-
-    if (!config.getSchema().getTuningConfig().isLeaveIntermediate()) {

Review comment:
       Good catch! Fixed. Also found another place where we were maybe not doing necessary cleanup.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
jon-wei commented on pull request #11075:
URL: https://github.com/apache/druid/pull/11075#issuecomment-822979610


   Hm, we don't have any existing unit tests for HadoopIndexTask, so I think it'd be fine to ignore the coverage failure from that.
   
   Can you run ITHadoopIndexTest locally and check if that passes?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11075:
URL: https://github.com/apache/druid/pull/11075#discussion_r615152908



##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/JobHelper.java
##########
@@ -482,20 +475,16 @@ public long push() throws IOException
         .withSize(size.get())
         .withBinaryVersion(SegmentUtils.getVersionFromDir(mergedBase));
 
-    if (!renameIndexFiles(outputFS, tmpPath, finalIndexZipFilePath)) {
-      throw new IOE(
-          "Unable to rename [%s] to [%s]",
-          tmpPath.toUri().toString(),
-          finalIndexZipFilePath.toUri().toString()
-      );
-    }
-
-    return finalSegment;
+    return new DataSegmentAndIndexZipFilePath(
+        finalSegment,
+        tmpPath.toUri().getPath(),
+        finalIndexZipFilePath.toUri().getPath()
+    );
   }
 
   public static void writeSegmentDescriptor(

Review comment:
       Changed it so that the segment descriptor for a segment is not deleted, and overwritten if it already exists, as we discussed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] zachjsh commented on a change in pull request #11075: Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions

Posted by GitBox <gi...@apache.org>.
zachjsh commented on a change in pull request #11075:
URL: https://github.com/apache/druid/pull/11075#discussion_r616974825



##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/JobHelper.java
##########
@@ -418,8 +402,19 @@ public static boolean runJobs(List<Jobby> jobs, HadoopDruidIndexerConfig config)
       }
     }
 
+    return succeeded;
+  }
+
+  public static void maybeDeleteIntermediatePath(
+      boolean indexerGeneratorJobSucceeded,

Review comment:
       fixed

##########
File path: indexing-hadoop/src/main/java/org/apache/druid/indexer/JobHelper.java
##########
@@ -418,8 +402,19 @@ public static boolean runJobs(List<Jobby> jobs, HadoopDruidIndexerConfig config)
       }
     }
 
+    return succeeded;
+  }
+
+  public static void maybeDeleteIntermediatePath(
+      boolean indexerGeneratorJobSucceeded,
+      HadoopIngestionSpec indexerSchema)
+  {
+    HadoopDruidIndexerConfig config = HadoopDruidIndexerConfig.fromSpec(indexerSchema);
+    final Configuration configuration = JobHelper.injectSystemProperties(new Configuration(), config);

Review comment:
       fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org