You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/04/22 17:00:21 UTC

[GitHub] [druid] AmatyaAvadhanula opened a new pull request, #12473: Enable usage of multiple base task directories

AmatyaAvadhanula opened a new pull request, #12473:
URL: https://github.com/apache/druid/pull/12473

   ### Description
   
   Enable usage of multiple base task directories for task data using List<StorageTaskLocation> instead of a single File.
   
   The current change uses uniformly random assignment based on taskId to one of the locations.
   
   TODO: Distribute tasks to Locations with count proportional to their maxSize.
   
   
   <br/>
   
   NOTE: Enforcement of task data to Location maxSize limits has not been implemented
   
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `TaskConfig`
    * `WorkerTaskManager`
   
   <hr>
   
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a diff in pull request #12473: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on code in PR #12473:
URL: https://github.com/apache/druid/pull/12473#discussion_r857073026


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/config/TaskConfig.java:
##########
@@ -184,14 +228,48 @@ public String getBaseDir()
   }
 
   @JsonProperty
-  public File getBaseTaskDir()
+  public List<StorageLocationConfig> getBaseTaskDirLocations()
+  {
+    return baseTaskDirLocations;
+  }
+
+  public List<StorageLocationConfig> getBaseTaskDirChildrenLocations(String child)
+  {
+    return getBaseTaskDirLocations().stream()
+                                    .map(location -> new StorageLocationConfig(
+                                        getChildFileFromLocation(location, child),
+                                        location.getMaxSize(),
+                                        location.getFreeSpacePercent()))
+                                    .collect(Collectors.toList());
+  }
+
+  private File getChildFileFromLocation(StorageLocationConfig location, String child)
+  {
+    return new File(location.getPath(), child);
+  }
+
+  private StorageLocationConfig getBaseTaskDirLocationForId(String taskId)
+  {
+    if (taskId == null) {
+      return null;
+    }
+    // TODO: consider the size of each of the base locations for distribution
+    // use uniformly RANDOM assignment
+    int index = taskId.hashCode() % baseTaskDirLocations.size();

Review Comment:
   ```suggestion
       int index = Math.abs(taskId.hashCode() % baseTaskDirLocations.size());
   ```
   
   I think this can be used to eliminate following if statement checking, and is more easy to understand.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] AmatyaAvadhanula commented on a diff in pull request #12473: WIP: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
AmatyaAvadhanula commented on code in PR #12473:
URL: https://github.com/apache/druid/pull/12473#discussion_r857074563


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/config/TaskConfig.java:
##########
@@ -131,11 +167,11 @@ public TaskConfig(
       @JsonProperty("ignoreTimestampSpecForDruidInputSource") boolean ignoreTimestampSpecForDruidInputSource,
       @JsonProperty("batchMemoryMappedIndex") boolean batchMemoryMappedIndex, // deprecated, only set to true to fall back to older behavior
       @JsonProperty("batchProcessingMode") String batchProcessingMode,
-      @JsonProperty("storeEmptyColumns") @Nullable Boolean storeEmptyColumns
+      @JsonProperty("storeEmptyColumns") @Nullable Boolean storeEmptyColumns,
+      @JsonProperty("baseTaskDirLocations") @Nullable List<StorageLocationConfig> baseTaskDirLocations

Review Comment:
   Yes. I'll do that as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a diff in pull request #12473: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on code in PR #12473:
URL: https://github.com/apache/druid/pull/12473#discussion_r857072751


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/config/TaskConfig.java:
##########
@@ -184,14 +228,48 @@ public String getBaseDir()
   }
 
   @JsonProperty
-  public File getBaseTaskDir()
+  public List<StorageLocationConfig> getBaseTaskDirLocations()
+  {
+    return baseTaskDirLocations;
+  }
+
+  public List<StorageLocationConfig> getBaseTaskDirChildrenLocations(String child)
+  {
+    return getBaseTaskDirLocations().stream()
+                                    .map(location -> new StorageLocationConfig(
+                                        getChildFileFromLocation(location, child),
+                                        location.getMaxSize(),
+                                        location.getFreeSpacePercent()))
+                                    .collect(Collectors.toList());
+  }
+
+  private File getChildFileFromLocation(StorageLocationConfig location, String child)
+  {
+    return new File(location.getPath(), child);
+  }
+
+  private StorageLocationConfig getBaseTaskDirLocationForId(String taskId)
+  {
+    if (taskId == null) {
+      return null;
+    }
+    // TODO: consider the size of each of the base locations for distribution
+    // use uniformly RANDOM assignment
+    int index = taskId.hashCode() % baseTaskDirLocations.size();
+    if (index < 0) {
+      index = -(index + 1);
+    }
+    return baseTaskDirLocations.get(index);
+  }
+
+  public File getTaskBaseDir(String taskId)
   {
-    return baseTaskDir;
+    return getBaseTaskDirLocationForId(taskId).getPath();
   }
 
   public File getTaskDir(String taskId)
   {
-    return new File(baseTaskDir, taskId);
+    return new File(getTaskBaseDir(taskId), taskId);

Review Comment:
   From the code above, the `getTaskBaseDir` can return null. This would cause the ctor of `File` throw NPE.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #12473: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on PR #12473:
URL: https://github.com/apache/druid/pull/12473#issuecomment-1107463030

   This pull request **introduces 2 alerts** and **fixes 3** when merging daf53af124f502370b44009a99ca98108b773e57 into 4781af9921a5ed1ffe88aa9083cb93265e0ec8ba - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-b279187099027f04f61125eb154ee6099ed21cc5)
   
   **new alerts:**
   
   * 2 for Uncontrolled data used in path expression
   
   **fixed alerts:**
   
   * 3 for Uncontrolled data used in path expression


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a diff in pull request #12473: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on code in PR #12473:
URL: https://github.com/apache/druid/pull/12473#discussion_r857073920


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/config/TaskConfig.java:
##########
@@ -131,11 +167,11 @@ public TaskConfig(
       @JsonProperty("ignoreTimestampSpecForDruidInputSource") boolean ignoreTimestampSpecForDruidInputSource,
       @JsonProperty("batchMemoryMappedIndex") boolean batchMemoryMappedIndex, // deprecated, only set to true to fall back to older behavior
       @JsonProperty("batchProcessingMode") String batchProcessingMode,
-      @JsonProperty("storeEmptyColumns") @Nullable Boolean storeEmptyColumns
+      @JsonProperty("storeEmptyColumns") @Nullable Boolean storeEmptyColumns,
+      @JsonProperty("baseTaskDirLocations") @Nullable List<StorageLocationConfig> baseTaskDirLocations

Review Comment:
   Do we need to add this new parameter to the configuration document?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a diff in pull request #12473: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on code in PR #12473:
URL: https://github.com/apache/druid/pull/12473#discussion_r857073326


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/config/TaskConfig.java:
##########
@@ -175,6 +211,14 @@ public TaskConfig(
     }
     log.debug("Batch processing mode:[%s]", this.batchProcessingMode);
     this.storeEmptyColumns = storeEmptyColumns == null ? DEFAULT_STORE_EMPTY_COLUMNS : storeEmptyColumns;
+
+    if (baseTaskDirLocations == null || baseTaskDirLocations.size() == 0) {

Review Comment:
   Can use `org.apache.druid.utils.CollectionUtils.isEmpty` to replace these two conditions



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a diff in pull request #12473: WIP: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on code in PR #12473:
URL: https://github.com/apache/druid/pull/12473#discussion_r857652180


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/config/TaskConfig.java:
##########
@@ -184,14 +228,48 @@ public String getBaseDir()
   }
 
   @JsonProperty
-  public File getBaseTaskDir()
+  public List<StorageLocationConfig> getBaseTaskDirLocations()
+  {
+    return baseTaskDirLocations;
+  }
+
+  public List<StorageLocationConfig> getBaseTaskDirChildrenLocations(String child)
+  {
+    return getBaseTaskDirLocations().stream()
+                                    .map(location -> new StorageLocationConfig(
+                                        getChildFileFromLocation(location, child),
+                                        location.getMaxSize(),
+                                        location.getFreeSpacePercent()))
+                                    .collect(Collectors.toList());
+  }
+
+  private File getChildFileFromLocation(StorageLocationConfig location, String child)
+  {
+    return new File(location.getPath(), child);
+  }
+
+  private StorageLocationConfig getBaseTaskDirLocationForId(String taskId)
+  {
+    if (taskId == null) {
+      return null;
+    }
+    // TODO: consider the size of each of the base locations for distribution
+    // use uniformly RANDOM assignment
+    int index = taskId.hashCode() % baseTaskDirLocations.size();

Review Comment:
   > I agree, however the bug analyzer complains that Math.abs(Integer.MIN_VALUE) is invalid
   
   Which bug analyzer? spotbugs?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] AmatyaAvadhanula commented on a diff in pull request #12473: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
AmatyaAvadhanula commented on code in PR #12473:
URL: https://github.com/apache/druid/pull/12473#discussion_r857073366


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/config/TaskConfig.java:
##########
@@ -184,14 +228,48 @@ public String getBaseDir()
   }
 
   @JsonProperty
-  public File getBaseTaskDir()
+  public List<StorageLocationConfig> getBaseTaskDirLocations()
+  {
+    return baseTaskDirLocations;
+  }
+
+  public List<StorageLocationConfig> getBaseTaskDirChildrenLocations(String child)
+  {
+    return getBaseTaskDirLocations().stream()
+                                    .map(location -> new StorageLocationConfig(
+                                        getChildFileFromLocation(location, child),
+                                        location.getMaxSize(),
+                                        location.getFreeSpacePercent()))
+                                    .collect(Collectors.toList());
+  }
+
+  private File getChildFileFromLocation(StorageLocationConfig location, String child)
+  {
+    return new File(location.getPath(), child);
+  }
+
+  private StorageLocationConfig getBaseTaskDirLocationForId(String taskId)
+  {
+    if (taskId == null) {
+      return null;
+    }
+    // TODO: consider the size of each of the base locations for distribution
+    // use uniformly RANDOM assignment
+    int index = taskId.hashCode() % baseTaskDirLocations.size();

Review Comment:
   I agree, however the bug analyzer complains that Math.abs(Integer.MIN_VALUE) is invalid



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] AmatyaAvadhanula closed pull request #12473: WIP: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
AmatyaAvadhanula closed pull request #12473: WIP: Enable usage of multiple base task directories
URL: https://github.com/apache/druid/pull/12473


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] AmatyaAvadhanula commented on a diff in pull request #12473: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
AmatyaAvadhanula commented on code in PR #12473:
URL: https://github.com/apache/druid/pull/12473#discussion_r857073759


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/config/TaskConfig.java:
##########
@@ -184,14 +228,48 @@ public String getBaseDir()
   }
 
   @JsonProperty
-  public File getBaseTaskDir()
+  public List<StorageLocationConfig> getBaseTaskDirLocations()
+  {
+    return baseTaskDirLocations;
+  }
+
+  public List<StorageLocationConfig> getBaseTaskDirChildrenLocations(String child)
+  {
+    return getBaseTaskDirLocations().stream()
+                                    .map(location -> new StorageLocationConfig(
+                                        getChildFileFromLocation(location, child),
+                                        location.getMaxSize(),
+                                        location.getFreeSpacePercent()))
+                                    .collect(Collectors.toList());
+  }
+
+  private File getChildFileFromLocation(StorageLocationConfig location, String child)
+  {
+    return new File(location.getPath(), child);
+  }
+
+  private StorageLocationConfig getBaseTaskDirLocationForId(String taskId)
+  {
+    if (taskId == null) {
+      return null;
+    }
+    // TODO: consider the size of each of the base locations for distribution
+    // use uniformly RANDOM assignment
+    int index = taskId.hashCode() % baseTaskDirLocations.size();
+    if (index < 0) {
+      index = -(index + 1);
+    }
+    return baseTaskDirLocations.get(index);
+  }
+
+  public File getTaskBaseDir(String taskId)
   {
-    return baseTaskDir;
+    return getBaseTaskDirLocationForId(taskId).getPath();
   }
 
   public File getTaskDir(String taskId)
   {
-    return new File(baseTaskDir, taskId);
+    return new File(getTaskBaseDir(taskId), taskId);

Review Comment:
   Sorry, could you please explain how?
   The list will have at least one value, and I'll be returning a File corresponding to one of those for a given task



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a diff in pull request #12473: WIP: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on code in PR #12473:
URL: https://github.com/apache/druid/pull/12473#discussion_r857650933


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/config/TaskConfig.java:
##########
@@ -184,14 +228,48 @@ public String getBaseDir()
   }
 
   @JsonProperty
-  public File getBaseTaskDir()
+  public List<StorageLocationConfig> getBaseTaskDirLocations()
+  {
+    return baseTaskDirLocations;
+  }
+
+  public List<StorageLocationConfig> getBaseTaskDirChildrenLocations(String child)
+  {
+    return getBaseTaskDirLocations().stream()
+                                    .map(location -> new StorageLocationConfig(
+                                        getChildFileFromLocation(location, child),
+                                        location.getMaxSize(),
+                                        location.getFreeSpacePercent()))
+                                    .collect(Collectors.toList());
+  }
+
+  private File getChildFileFromLocation(StorageLocationConfig location, String child)
+  {
+    return new File(location.getPath(), child);
+  }
+
+  private StorageLocationConfig getBaseTaskDirLocationForId(String taskId)
+  {
+    if (taskId == null) {
+      return null;
+    }
+    // TODO: consider the size of each of the base locations for distribution
+    // use uniformly RANDOM assignment
+    int index = taskId.hashCode() % baseTaskDirLocations.size();
+    if (index < 0) {
+      index = -(index + 1);
+    }
+    return baseTaskDirLocations.get(index);
+  }
+
+  public File getTaskBaseDir(String taskId)
   {
-    return baseTaskDir;
+    return getBaseTaskDirLocationForId(taskId).getPath();
   }
 
   public File getTaskDir(String taskId)
   {
-    return new File(baseTaskDir, taskId);
+    return new File(getTaskBaseDir(taskId), taskId);

Review Comment:
   `getTaskBaseDir` checks if the `taskId` is null. And if it's null, a null is returned and will be passed to the ctor of `File` which then throws an NPE.
   
   Usually, we check the arguments at the public methods level. So, if the `taskId` could be null, it's better to check it in this function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] AmatyaAvadhanula commented on a diff in pull request #12473: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
AmatyaAvadhanula commented on code in PR #12473:
URL: https://github.com/apache/druid/pull/12473#discussion_r857073506


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/config/TaskConfig.java:
##########
@@ -184,14 +228,48 @@ public String getBaseDir()
   }
 
   @JsonProperty
-  public File getBaseTaskDir()
+  public List<StorageLocationConfig> getBaseTaskDirLocations()
+  {
+    return baseTaskDirLocations;
+  }
+
+  public List<StorageLocationConfig> getBaseTaskDirChildrenLocations(String child)
+  {
+    return getBaseTaskDirLocations().stream()
+                                    .map(location -> new StorageLocationConfig(
+                                        getChildFileFromLocation(location, child),
+                                        location.getMaxSize(),
+                                        location.getFreeSpacePercent()))
+                                    .collect(Collectors.toList());
+  }
+
+  private File getChildFileFromLocation(StorageLocationConfig location, String child)
+  {
+    return new File(location.getPath(), child);
+  }
+
+  private StorageLocationConfig getBaseTaskDirLocationForId(String taskId)
+  {
+    if (taskId == null) {
+      return null;
+    }
+    // TODO: consider the size of each of the base locations for distribution
+    // use uniformly RANDOM assignment
+    int index = taskId.hashCode() % baseTaskDirLocations.size();

Review Comment:
   The argument will never be Integer.MIN_VALUE logically but I had to do it in a roundabout manner to get past it.
   Is there a way to suppress it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a diff in pull request #12473: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on code in PR #12473:
URL: https://github.com/apache/druid/pull/12473#discussion_r857072565


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/config/TaskConfig.java:
##########
@@ -175,6 +211,14 @@ public TaskConfig(
     }
     log.debug("Batch processing mode:[%s]", this.batchProcessingMode);
     this.storeEmptyColumns = storeEmptyColumns == null ? DEFAULT_STORE_EMPTY_COLUMNS : storeEmptyColumns;
+
+    if (baseTaskDirLocations == null || baseTaskDirLocations.size() == 0) {
+      File baseTaskFile = new File(defaultDir(baseTaskDir, "persistent/task"));

Review Comment:
   the `baseTaskDir` configuration now is meaningful when the new `baseTaskDirLocations` is not set. For the long term evolution, should the `baseTaskDir` be marked as deprecated?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] AmatyaAvadhanula commented on a diff in pull request #12473: WIP: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
AmatyaAvadhanula commented on code in PR #12473:
URL: https://github.com/apache/druid/pull/12473#discussion_r857654957


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/config/TaskConfig.java:
##########
@@ -184,14 +228,48 @@ public String getBaseDir()
   }
 
   @JsonProperty
-  public File getBaseTaskDir()
+  public List<StorageLocationConfig> getBaseTaskDirLocations()
+  {
+    return baseTaskDirLocations;
+  }
+
+  public List<StorageLocationConfig> getBaseTaskDirChildrenLocations(String child)
+  {
+    return getBaseTaskDirLocations().stream()
+                                    .map(location -> new StorageLocationConfig(
+                                        getChildFileFromLocation(location, child),
+                                        location.getMaxSize(),
+                                        location.getFreeSpacePercent()))
+                                    .collect(Collectors.toList());
+  }
+
+  private File getChildFileFromLocation(StorageLocationConfig location, String child)
+  {
+    return new File(location.getPath(), child);
+  }
+
+  private StorageLocationConfig getBaseTaskDirLocationForId(String taskId)
+  {
+    if (taskId == null) {
+      return null;
+    }
+    // TODO: consider the size of each of the base locations for distribution
+    // use uniformly RANDOM assignment
+    int index = taskId.hashCode() % baseTaskDirLocations.size();
+    if (index < 0) {
+      index = -(index + 1);
+    }
+    return baseTaskDirLocations.get(index);
+  }
+
+  public File getTaskBaseDir(String taskId)
   {
-    return baseTaskDir;
+    return getBaseTaskDirLocationForId(taskId).getPath();
   }
 
   public File getTaskDir(String taskId)
   {
-    return new File(baseTaskDir, taskId);
+    return new File(getTaskBaseDir(taskId), taskId);

Review Comment:
   Ah yes, sorry. I'll add the check



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] AmatyaAvadhanula commented on a diff in pull request #12473: WIP: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
AmatyaAvadhanula commented on code in PR #12473:
URL: https://github.com/apache/druid/pull/12473#discussion_r857652822


##########
indexing-service/src/main/java/org/apache/druid/indexing/common/config/TaskConfig.java:
##########
@@ -184,14 +228,48 @@ public String getBaseDir()
   }
 
   @JsonProperty
-  public File getBaseTaskDir()
+  public List<StorageLocationConfig> getBaseTaskDirLocations()
+  {
+    return baseTaskDirLocations;
+  }
+
+  public List<StorageLocationConfig> getBaseTaskDirChildrenLocations(String child)
+  {
+    return getBaseTaskDirLocations().stream()
+                                    .map(location -> new StorageLocationConfig(
+                                        getChildFileFromLocation(location, child),
+                                        location.getMaxSize(),
+                                        location.getFreeSpacePercent()))
+                                    .collect(Collectors.toList());
+  }
+
+  private File getChildFileFromLocation(StorageLocationConfig location, String child)
+  {
+    return new File(location.getPath(), child);
+  }
+
+  private StorageLocationConfig getBaseTaskDirLocationForId(String taskId)
+  {
+    if (taskId == null) {
+      return null;
+    }
+    // TODO: consider the size of each of the base locations for distribution
+    // use uniformly RANDOM assignment
+    int index = taskId.hashCode() % baseTaskDirLocations.size();

Review Comment:
   Yes



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] AmatyaAvadhanula commented on pull request #12473: WIP: Enable usage of multiple base task directories

Posted by GitBox <gi...@apache.org>.
AmatyaAvadhanula commented on PR #12473:
URL: https://github.com/apache/druid/pull/12473#issuecomment-1333678273

   Closing this PR. Raised https://github.com/apache/druid/pull/13476 instead


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org