You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/08/05 09:27:43 UTC

[GitHub] [druid] maytasm opened a new pull request #11553: Fix ingestion task failure when no input split to process

maytasm opened a new pull request #11553:
URL: https://github.com/apache/druid/pull/11553


   Fix ingestion task failure when no input split to process
   
   ### Description
   This PR fix a regression introduced by https://github.com/apache/druid/pull/11189
   The regression causes ingestion task in parallel mode to fail when there is no input split to process (previously the task would succeed). 
   This is because of a null pointer exception occurred in getReports() at https://github.com/apache/druid/blob/2df42143aec6c50e9ac31d89cd75749d10d37a3d/indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexPhaseRunner.java#L296. This is because this method accesses taskMonitor without checking if it is null or not. This is in contrast to all other code in the class, which check if taskMonitor is null or not. Also see in the run() method of the class https://github.com/apache/druid/blob/2df42143aec6c50e9ac31d89cd75749d10d37a3d/indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexPhaseRunner.java#L111, 
   that taskMonitor is initialized in line 121, but that does not always happen. If the method returned in line 116, which is when there is no input source to split, then taskMonitor stays null. 
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
maytasm commented on pull request #11553:
URL: https://github.com/apache/druid/pull/11553#issuecomment-895351902


   Merging this in to fix regression. Improvement for logging and UI can be done separately 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
maytasm commented on pull request #11553:
URL: https://github.com/apache/druid/pull/11553#issuecomment-894619136


   > This sounds like an interesting edge case. ~Hope can there be an interesting y'all with no input split?~ How can there be an ingestion spec with no input split
   > 
   > I think we should add an integration test for this.
   > 
   > EDIT: Wow auto-correct on my phone - fixed my comment
   
   To name a few cases:
   - Reindex with datasource name that doesn’t exist
   - Reindex with interval that does not contain any data
   - Ingestion with interval in granularitySpec mismatching the inputSpec


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #11553:
URL: https://github.com/apache/druid/pull/11553#discussion_r684598493



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexPhaseRunner.java
##########
@@ -289,13 +289,15 @@ private void stopInternal()
   @Override
   public void collectReport(SubTaskReportType report)
   {
-    taskMonitor.collectReport(report);
+    if (taskMonitor != null) {
+      taskMonitor.collectReport(report);
+    }

Review comment:
       Done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm edited a comment on pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
maytasm edited a comment on pull request #11553:
URL: https://github.com/apache/druid/pull/11553#issuecomment-895290411


   > > To name a few cases:
   > > 
   > > * Reindex with datasource name that doesn’t exist
   > > * Reindex with interval that does not contain any data
   > > * Ingestion with interval in granularitySpec mismatching the inputSpec
   > 
   > The first 2 cases sound like a user doing something they shouldn't be doing - and the re-index is a no-op. Maybe we should surface this as a typo to the end user?
   > 
   > Can you provide an example of the 3rd case?
   > 
   > I'm wondering if all these cases are situations where the end user should be warned of something "unexpected" happening.
   
   For the 3rd case, imagine you have a hash partitioning with the following spec:
   ```
      ...
       "ioConfig": {
         "type": "index_parallel",
         "inputSource": {
           "type": "druid",
           "dataSource": "mydatasource",
           "interval": "2020-07-31T00:00:00.000Z/2020-08-01T00:00:00.000Z",
           "filter": null,
           "dimensions": null,
           "metrics": [
             ...
           ]
         },
         "inputFormat": null,
         "appendToExisting": false
       },
       ....
       "granularitySpec": {
           "type": "uniform",
           "segmentGranularity": "DAY",
           "queryGranularity": "HOUR",
           "rollup": true,
           "intervals": [
             "2020-08-01T00:00:00.000Z/2020-08-02T00:00:00.000Z"
           ]
         },
       ....
       "partitionsSpec": {
           "type": "hashed",
           "numShards": null,
           "partitionDimensions": [],
           "partitionFunction": "murmur3_32_abs",
           "maxRowsPerSegment": 5000000
         },
       ....
       ```
   
   The partial_segment_merge phase tasks will have no input split to process as the interval in the ioConfig does not match with intervals in granularitySpec


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #11553:
URL: https://github.com/apache/druid/pull/11553#issuecomment-895277721


   > To name a few cases:
   > 
   > * Reindex with datasource name that doesn’t exist
   > * Reindex with interval that does not contain any data
   > * Ingestion with interval in granularitySpec mismatching the inputSpec
   
   The first 2 cases sound like a user doing something they shouldn't be doing - and the re-index is a no-op. Maybe we should surface this as a typo to the end user?
   
   Can you provide an example of the 3rd case?
   
   I'm wondering if all these cases are situations where the end user should be warned of something "unexpected" happening.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #11553:
URL: https://github.com/apache/druid/pull/11553#discussion_r683602732



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexPhaseRunner.java
##########
@@ -289,13 +289,15 @@ private void stopInternal()
   @Override
   public void collectReport(SubTaskReportType report)
   {
-    taskMonitor.collectReport(report);
+    if (taskMonitor != null) {
+      taskMonitor.collectReport(report);
+    }

Review comment:
       `collectReport` should be called only when there is a subtask sending its report. Since `TaskMonitor` is responsible for spawning subtasks, it seems quite strange if `taskMonitor` is null when this method is called. How about adding a precondition that explodes when it's null?

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexPhaseRunner.java
##########
@@ -289,13 +289,15 @@ private void stopInternal()
   @Override
   public void collectReport(SubTaskReportType report)
   {
-    taskMonitor.collectReport(report);
+    if (taskMonitor != null) {
+      taskMonitor.collectReport(report);
+    }
   }
 
   @Override
   public Map<String, SubTaskReportType> getReports()
   {
-    return taskMonitor.getReports();
+    return taskMonitor == null ? Collections.emptyMap() : taskMonitor.getReports();

Review comment:
       :+1: 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #11553:
URL: https://github.com/apache/druid/pull/11553#discussion_r683963400



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexPhaseRunner.java
##########
@@ -289,13 +289,15 @@ private void stopInternal()
   @Override
   public void collectReport(SubTaskReportType report)
   {
-    taskMonitor.collectReport(report);
+    if (taskMonitor != null) {
+      taskMonitor.collectReport(report);
+    }

Review comment:
       You can set the severity of the "Nullability problems" intellij inspection to error instead of warning. I think we haven't done it because there could be lots of places where the inspection will complain about nulls. It could be worth doing it at some point though.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #11553:
URL: https://github.com/apache/druid/pull/11553#discussion_r683606663



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexPhaseRunner.java
##########
@@ -289,13 +289,15 @@ private void stopInternal()
   @Override
   public void collectReport(SubTaskReportType report)
   {
-    taskMonitor.collectReport(report);
+    if (taskMonitor != null) {
+      taskMonitor.collectReport(report);
+    }

Review comment:
       I just saw that the `taskMonitor` was accessed without a null check so I added the null check just in case. The bug mentioned in this ticket can be fix with just the change in `getReports()`. Given the context from your comment, maybe it is better to leave it as is and adds a comment mentioning something like... collectReport is only called when there is a subtask sending its report. Since TaskMonitor is responsible for spawning subtasks, the taskMonitor cannot be null`

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexPhaseRunner.java
##########
@@ -289,13 +289,15 @@ private void stopInternal()
   @Override
   public void collectReport(SubTaskReportType report)
   {
-    taskMonitor.collectReport(report);
+    if (taskMonitor != null) {
+      taskMonitor.collectReport(report);
+    }

Review comment:
       I just saw that the `taskMonitor` was accessed without a null check so I added the null check just in case. The bug mentioned in this ticket can be fix with just the change in `getReports()`. Given the context from your comment, maybe it is better to leave it as is and adds a comment mentioning something like... collectReport is only called when there is a subtask sending its report. Since TaskMonitor is responsible for spawning subtasks, the taskMonitor cannot be null




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #11553:
URL: https://github.com/apache/druid/pull/11553#discussion_r683613297



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexPhaseRunner.java
##########
@@ -289,13 +289,15 @@ private void stopInternal()
   @Override
   public void collectReport(SubTaskReportType report)
   {
-    taskMonitor.collectReport(report);
+    if (taskMonitor != null) {
+      taskMonitor.collectReport(report);
+    }

Review comment:
       The comment sounds good. I think having a null check is also a good convention because someone like me could break the contract around `taskMonitor` unintentionally in the future. Perhaps we can add `assert taskMonitor != null;`. This will be ran only in tests, but not in production.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
maytasm commented on pull request #11553:
URL: https://github.com/apache/druid/pull/11553#issuecomment-894620573


   > This sounds like an interesting edge case. ~Hope can there be an interesting y'all with no input split?~ How can there be an ingestion spec with no input split
   > 
   > I think we should add an integration test for this.
   > 
   > EDIT: Wow auto-correct on my phone - fixed my comment
   
   Added integration test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #11553:
URL: https://github.com/apache/druid/pull/11553#discussion_r683849572



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexPhaseRunner.java
##########
@@ -289,13 +289,15 @@ private void stopInternal()
   @Override
   public void collectReport(SubTaskReportType report)
   {
-    taskMonitor.collectReport(report);
+    if (taskMonitor != null) {
+      taskMonitor.collectReport(report);
+    }

Review comment:
       Is there a way to have static analysis flag all these uses? I've tried using combinations of `@Nullable`(doesn't realize that once it's set, it's good for the rest of the method) and `@MonotonicNonNull` but haven't been able to get intelliJ to flag these issues for me locally yet. I figured I would just leave a comment here and maybe you will have better luck




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm merged pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
maytasm merged pull request #11553:
URL: https://github.com/apache/druid/pull/11553


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
maytasm commented on pull request #11553:
URL: https://github.com/apache/druid/pull/11553#issuecomment-895290411


   > > To name a few cases:
   > > 
   > > * Reindex with datasource name that doesn’t exist
   > > * Reindex with interval that does not contain any data
   > > * Ingestion with interval in granularitySpec mismatching the inputSpec
   > 
   > The first 2 cases sound like a user doing something they shouldn't be doing - and the re-index is a no-op. Maybe we should surface this as a typo to the end user?
   > 
   > Can you provide an example of the 3rd case?
   > 
   > I'm wondering if all these cases are situations where the end user should be warned of something "unexpected" happening.
   
   For the 3rd case, imagine you have a hash partitioning with the following spec:
   `    ...
       "ioConfig": {
         "type": "index_parallel",
         "inputSource": {
           "type": "druid",
           "dataSource": "mydatasource",
           "interval": "2020-07-31T00:00:00.000Z/2020-08-01T00:00:00.000Z",
           "filter": null,
           "dimensions": null,
           "metrics": [
             ...
           ]
         },
         "inputFormat": null,
         "appendToExisting": false
       },
       ....
       "granularitySpec": {
           "type": "uniform",
           "segmentGranularity": "DAY",
           "queryGranularity": "HOUR",
           "rollup": true,
           "intervals": [
             "2020-08-01T00:00:00.000Z/2020-08-02T00:00:00.000Z"
           ]
         },
       ....
       "partitionsSpec": {
           "type": "hashed",
           "numShards": null,
           "partitionDimensions": [],
           "partitionFunction": "murmur3_32_abs",
           "maxRowsPerSegment": 5000000
         },
       ....`
   
   The partial_segment_merge phase tasks will have no input split to process as the interval in the ioConfig does not match with intervals in granularitySpec


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm edited a comment on pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
maytasm edited a comment on pull request #11553:
URL: https://github.com/apache/druid/pull/11553#issuecomment-895290411


   > > To name a few cases:
   > > 
   > > * Reindex with datasource name that doesn’t exist
   > > * Reindex with interval that does not contain any data
   > > * Ingestion with interval in granularitySpec mismatching the inputSpec
   > 
   > The first 2 cases sound like a user doing something they shouldn't be doing - and the re-index is a no-op. Maybe we should surface this as a typo to the end user?
   > 
   > Can you provide an example of the 3rd case?
   > 
   > I'm wondering if all these cases are situations where the end user should be warned of something "unexpected" happening.
   
   For the 3rd case, imagine you have a hash partitioning with the following spec:
   ```
      ...
       "ioConfig": {
         "type": "index_parallel",
         "inputSource": {
           "type": "druid",
           "dataSource": "mydatasource",
           "interval": "2020-07-31T00:00:00.000Z/2020-08-01T00:00:00.000Z",
           "filter": null,
           "dimensions": null,
           "metrics": [
             ...
           ]
         },
         "inputFormat": null,
         "appendToExisting": false
       },
       ....
       "granularitySpec": {
           "type": "uniform",
           "segmentGranularity": "DAY",
           "queryGranularity": "HOUR",
           "rollup": true,
           "intervals": [
             "2020-08-01T00:00:00.000Z/2020-08-02T00:00:00.000Z"
           ]
         },
       ....
       "partitionsSpec": {
           "type": "hashed",
           "numShards": null,
           "partitionDimensions": [],
           "partitionFunction": "murmur3_32_abs",
           "maxRowsPerSegment": 5000000
         },
       ....
       
   ```
   
   The partial_segment_merge phase tasks will have no input split to process as the interval in the ioConfig does not match with intervals in granularitySpec


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s edited a comment on pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
suneet-s edited a comment on pull request #11553:
URL: https://github.com/apache/druid/pull/11553#issuecomment-893469600


   This sounds like an interesting edge case. ~~Hope can there be an interesting y'all with no input split?~~ How can there be an ingestion spec with no input split
   
   I think we should add an integration test for this.
   
   EDIT: Wow auto-correct on my phone - fixed my comment


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s edited a comment on pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
suneet-s edited a comment on pull request #11553:
URL: https://github.com/apache/druid/pull/11553#issuecomment-893469600


   This sounds like an interesting edge case. --Hope can there be an interesting y'all with no input split?-- How can there be an ingestion spec with no input split
   
   I think we should add an integration test for this.
   
   EDIT: Wow auto-correct on my phone - fixed my comment


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #11553:
URL: https://github.com/apache/druid/pull/11553#issuecomment-893469600


   This sounds like an interesting edge case. Hope can there be an interesting y'all with no input split?
   
   I think we should add an integration test for this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on pull request #11553: Fix ingestion task failure when no input split to process

Posted by GitBox <gi...@apache.org>.
maytasm commented on pull request #11553:
URL: https://github.com/apache/druid/pull/11553#issuecomment-895292005


   > > > To name a few cases:
   > > > 
   > > > * Reindex with datasource name that doesn’t exist
   > > > * Reindex with interval that does not contain any data
   > > > * Ingestion with interval in granularitySpec mismatching the inputSpec
   > > 
   > > 
   > > The first 2 cases sound like a user doing something they shouldn't be doing - and the re-index is a no-op. Maybe we should surface this as a typo to the end user?
   > > Can you provide an example of the 3rd case?
   > > I'm wondering if all these cases are situations where the end user should be warned of something "unexpected" happening.
   > 
   > For the 3rd case, imagine you have a hash partitioning with the following spec:
   > 
   > ```
   >    ...
   >     "ioConfig": {
   >       "type": "index_parallel",
   >       "inputSource": {
   >         "type": "druid",
   >         "dataSource": "mydatasource",
   >         "interval": "2020-07-31T00:00:00.000Z/2020-08-01T00:00:00.000Z",
   >         "filter": null,
   >         "dimensions": null,
   >         "metrics": [
   >           ...
   >         ]
   >       },
   >       "inputFormat": null,
   >       "appendToExisting": false
   >     },
   >     ....
   >     "granularitySpec": {
   >         "type": "uniform",
   >         "segmentGranularity": "DAY",
   >         "queryGranularity": "HOUR",
   >         "rollup": true,
   >         "intervals": [
   >           "2020-08-01T00:00:00.000Z/2020-08-02T00:00:00.000Z"
   >         ]
   >       },
   >     ....
   >     "partitionsSpec": {
   >         "type": "hashed",
   >         "numShards": null,
   >         "partitionDimensions": [],
   >         "partitionFunction": "murmur3_32_abs",
   >         "maxRowsPerSegment": 5000000
   >       },
   >     ....
   >     
   > ```
   > 
   > The partial_segment_merge phase tasks will have no input split to process as the interval in the ioConfig does not match with intervals in granularitySpec
   
   The cases I thought of are due to user error in their ingestionSpec. I am not sure if there are valid cases where we can run into no input split to process situation though. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org