Posted to reviews@helix.apache.org by GitBox <gi...@apache.org> on 2020/11/23 06:26:39 UTC

[GitHub] [helix] alirezazamani opened a new pull request #1548: Fix targeted job quota calculation for given up tasks

alirezazamani opened a new pull request #1548:
URL: https://github.com/apache/helix/pull/1548


   ### Issues
   
   - [X] My PR addresses the following Helix issues and references them in the PR description:
   Fixes #1547 
   
   ### Description
   
   - [x] Here are some details about my PR, including screenshots of any UI changes:
   With this PR, tasks that should not be retried again no longer occupy any quota. This prevents jobs
   from being blocked by the quota usage of given-up tasks.
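
The idea in the description can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the Helix implementation: `QuotaSketch`, `Status`, `Task`, `isGivenUp`, `occupiesQuota`, and `usedQuota` are hypothetical names, and the rule that a failed but still retriable task keeps its quota slot is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class QuotaSketch {
  // Hypothetical, simplified statuses; not Helix's TaskPartitionState.
  enum Status { RUNNING, FAILED, COMPLETED }

  static final class Task {
    final Status status;
    final int attempts;     // attempts made so far
    final int maxAttempts;  // attempts allowed in total

    Task(Status status, int attempts, int maxAttempts) {
      this.status = status;
      this.attempts = attempts;
      this.maxAttempts = maxAttempts;
    }

    // "Given up": failed and out of attempts, so it will not be retried.
    boolean isGivenUp() {
      return status == Status.FAILED && attempts >= maxAttempts;
    }

    // Assumption of this sketch: running tasks and failed-but-retriable
    // tasks hold a quota slot; completed and given-up tasks do not.
    boolean occupiesQuota() {
      return status == Status.RUNNING || (status == Status.FAILED && !isGivenUp());
    }
  }

  // Counts how many tasks currently hold a quota slot.
  static int usedQuota(List<Task> tasks) {
    int used = 0;
    for (Task t : tasks) {
      if (t.occupiesQuota()) {
        used++;
      }
    }
    return used;
  }

  public static void main(String[] args) {
    List<Task> tasks = Arrays.asList(
        new Task(Status.RUNNING, 1, 3),
        new Task(Status.FAILED, 1, 3),    // still retriable
        new Task(Status.FAILED, 3, 3),    // given up
        new Task(Status.COMPLETED, 1, 3));
    System.out.println(usedQuota(tasks)); // prints 2
  }
}
```

In this model, only the given-up task changes behavior: if it were counted, it would hold a slot forever and could block later jobs once the quota was exhausted.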
   
   ### Tests
   
   - [x] The following tests are written for this issue:
   TestTaskErrorMaxRetriesQuotaRelease
   
   - [x] The following is the result of the "mvn test" command on the appropriate module:
   Helix-core:
   ```
   [ERROR] org.apache.helix.tools.TestClusterStateVerifier.beforeMethod(org.apache.helix.tools.TestClusterStateVerifier)
   [ERROR]   Run 1: TestClusterStateVerifier.beforeMethod:57 » Helix cluster null is not setup yet
   [INFO]   Run 2: PASS
   [INFO]
   [INFO]
   [ERROR] Tests run: 1253, Failures: 1, Errors: 0, Skipped: 3
   [INFO]
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD FAILURE
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  01:29 h
   [INFO] Finished at: 2020-11-22T19:02:28-08:00
   [INFO] ------------------------------------------------------------------------
   ```
   The failed test is unrelated to this change and is failing even without this PR.
   
   Helix-rest:
   ```
   [INFO] Tests run: 171, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 246.03 s - in TestSuite
   [INFO]
   [INFO] Results:
   [INFO]
   [INFO] Tests run: 171, Failures: 0, Errors: 0, Skipped: 0
   [INFO]
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  04:10 min
   [INFO] Finished at: 2020-11-22T22:21:02-08:00
   [INFO] ------------------------------------------------------------------------
   ```
   
   ### Commits
   
   - My commits all reference appropriate Apache Helix GitHub issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Code Quality
   
   - My diff has been formatted using helix-style.xml 
   (helix-style-intellij.xml if IntelliJ IDE is used)
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@helix.apache.org
For additional commands, e-mail: reviews-help@helix.apache.org


[GitHub] [helix] alirezazamani commented on a change in pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on a change in pull request #1548:
URL: https://github.com/apache/helix/pull/1548#discussion_r529023074



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskErrorMaxRetriesQuotaRelease.java
##########
@@ -0,0 +1,63 @@
+package org.apache.helix.integration.task;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.TestHelper;
+import org.apache.helix.task.JobConfig;
+import org.apache.helix.task.JobQueue;
+import org.apache.helix.task.TaskState;
+import org.apache.helix.task.TaskUtil;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import com.google.common.collect.ImmutableMap;
+
+public class TestTaskErrorMaxRetriesQuotaRelease extends TaskTestBase {

Review comment:
       Basically, this is the approach I follow in my tests: write a test that fails without this PR, then apply the PR and make sure the test passes. The setup needed to produce a failing test is not available in other tests, which is why I need a new setup (scenario) for this PR.






[GitHub] [helix] alirezazamani commented on a change in pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on a change in pull request #1548:
URL: https://github.com/apache/helix/pull/1548#discussion_r529010453



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -838,7 +839,7 @@ private static void addCompletedTasks(Set<Integer> set, JobContext ctx, Iterable
    */
   private boolean isTaskNotInTerminalState(TaskPartitionState state) {

Review comment:
       Note that this is not a design question. This is a bug that was introduced earlier and is affecting production results. This PR fixes that bug. I tried different approaches, but I believe this is the best and most understandable one. I added more comments for clarification.






[GitHub] [helix] alirezazamani commented on a change in pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on a change in pull request #1548:
URL: https://github.com/apache/helix/pull/1548#discussion_r529013432



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -838,7 +839,7 @@ private static void addCompletedTasks(Set<Integer> set, JobContext ctx, Iterable
    */
   private boolean isTaskNotInTerminalState(TaskPartitionState state) {

Review comment:
       Also, here we are trying to minimize the usage of TaskPartitionState.... in this function and move the logic to other functions. Otherwise, the code becomes ugly and hard to follow. By using only "given up", we mean that this task cannot be retried by the controller.






[GitHub] [helix] alirezazamani commented on a change in pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on a change in pull request #1548:
URL: https://github.com/apache/helix/pull/1548#discussion_r529010924



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskErrorMaxRetriesQuotaRelease.java
##########
@@ -0,0 +1,63 @@
+package org.apache.helix.integration.task;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.TestHelper;
+import org.apache.helix.task.JobConfig;
+import org.apache.helix.task.JobQueue;
+import org.apache.helix.task.TaskState;
+import org.apache.helix.task.TaskUtil;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import com.google.common.collect.ImmutableMap;
+
+public class TestTaskErrorMaxRetriesQuotaRelease extends TaskTestBase {

Review comment:
       I don't think that is possible. The test specifically needs a new setup: one instance with a lot of partitions. Adding it to an existing test class would force the logic of the other tests to change.








[GitHub] [helix] alirezazamani commented on a change in pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on a change in pull request #1548:
URL: https://github.com/apache/helix/pull/1548#discussion_r529046942



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskErrorMaxRetriesQuotaRelease.java
##########
@@ -0,0 +1,63 @@
+package org.apache.helix.integration.task;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.TestHelper;
+import org.apache.helix.task.JobConfig;
+import org.apache.helix.task.JobQueue;
+import org.apache.helix.task.TaskState;
+import org.apache.helix.task.TaskUtil;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import com.google.common.collect.ImmutableMap;
+
+public class TestTaskErrorMaxRetriesQuotaRelease extends TaskTestBase {

Review comment:
       Moved to another existing class.






[GitHub] [helix] alirezazamani commented on a change in pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on a change in pull request #1548:
URL: https://github.com/apache/helix/pull/1548#discussion_r529018503



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -838,7 +839,7 @@ private static void addCompletedTasks(Set<Integer> set, JobContext ctx, Iterable
    */
   private boolean isTaskNotInTerminalState(TaskPartitionState state) {

Review comment:
       One more thing to add. I think the designer of this method (I was not involved in designing it, BTW) used the variable `partitionsToRetryOnLiveInstanceChangeForTargetedJob` to give the reader a hint about what these tasks are and why we are adding them back. Hence, I think keeping it as it is remains the best solution.






[GitHub] [helix] alirezazamani commented on a change in pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on a change in pull request #1548:
URL: https://github.com/apache/helix/pull/1548#discussion_r529011059



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskErrorMaxRetriesQuotaRelease.java
##########
@@ -0,0 +1,63 @@
+package org.apache.helix.integration.task;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.TestHelper;
+import org.apache.helix.task.JobConfig;
+import org.apache.helix.task.JobQueue;
+import org.apache.helix.task.TaskState;
+import org.apache.helix.task.TaskUtil;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import com.google.common.collect.ImmutableMap;
+
+public class TestTaskErrorMaxRetriesQuotaRelease extends TaskTestBase {
+
+  @BeforeClass
+  public void beforeClass() throws Exception {
+    _numNodes = 1;
+    _numPartitions = 100;
+    super.beforeClass();
+  }
+
+  @AfterClass
+  public void afterClass() throws Exception {
+    super.afterClass();

Review comment:
       Ok, I will remove it.






[GitHub] [helix] alirezazamani commented on pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on pull request #1548:
URL: https://github.com/apache/helix/pull/1548#issuecomment-733950379


   This PR is ready to be merged. 
   
   Final commit message:
   Fix targeted job quota calculation for given up tasks
   
   With this commit, tasks that should not be retried again no longer occupy
   any quota. This prevents jobs from being blocked by the quota
   usage of given-up tasks.
   








[GitHub] [helix] jiajunwang commented on a change in pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
jiajunwang commented on a change in pull request #1548:
URL: https://github.com/apache/helix/pull/1548#discussion_r528987288



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskErrorMaxRetriesQuotaRelease.java
##########
@@ -0,0 +1,63 @@
+package org.apache.helix.integration.task;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.TestHelper;
+import org.apache.helix.task.JobConfig;
+import org.apache.helix.task.JobQueue;
+import org.apache.helix.task.TaskState;
+import org.apache.helix.task.TaskUtil;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import com.google.common.collect.ImmutableMap;
+
+public class TestTaskErrorMaxRetriesQuotaRelease extends TaskTestBase {
+
+  @BeforeClass
+  public void beforeClass() throws Exception {
+    _numNodes = 1;
+    _numPartitions = 100;
+    super.beforeClass();
+  }
+
+  @AfterClass
+  public void afterClass() throws Exception {
+    super.afterClass();

Review comment:
       This is not necessary. If you don't have any additional logic, you can remove the method from the child class, and the parent class's afterClass() will still be triggered.
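
The language-level part of this point can be shown in plain Java. This is only a sketch: `InheritedTeardownSketch`, `BaseTest`, and `ChildTest` are hypothetical stand-ins, no TestNG is involved, and the claim that the framework still triggers the inherited afterClass() is taken from the comment above rather than demonstrated here.

```java
public class InheritedTeardownSketch {
  // Hypothetical stand-in for a base test class with teardown logic.
  static class BaseTest {
    int teardownCalls = 0;

    public void afterClass() {
      teardownCalls++;
    }
  }

  // No afterClass() override: the inherited method is used as-is, so an
  // override that only calls super.afterClass() would add nothing.
  static class ChildTest extends BaseTest {
  }

  public static void main(String[] args) {
    ChildTest test = new ChildTest();
    test.afterClass();                      // resolves to BaseTest.afterClass()
    System.out.println(test.teardownCalls); // prints 1
  }
}
```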

##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskErrorMaxRetriesQuotaRelease.java
##########
@@ -0,0 +1,63 @@
+package org.apache.helix.integration.task;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.TestHelper;
+import org.apache.helix.task.JobConfig;
+import org.apache.helix.task.JobQueue;
+import org.apache.helix.task.TaskState;
+import org.apache.helix.task.TaskUtil;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import com.google.common.collect.ImmutableMap;
+
+public class TestTaskErrorMaxRetriesQuotaRelease extends TaskTestBase {

Review comment:
       Is it possible to add the method to an existing test class such as "TestQuotaBasedScheduling"? Note that every new test class requires a new cluster to be created and removed. In most cases, that is not necessary.

##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -838,7 +839,7 @@ private static void addCompletedTasks(Set<Integer> set, JobContext ctx, Iterable
    */
   private boolean isTaskNotInTerminalState(TaskPartitionState state) {

Review comment:
       This change also impacts filterTasks(). The logic looks to be valid even after this change.
   The concerning part is that there are some duplicate checks here and there. For example,
   
         // Allow tasks eligible for scheduling
         if (state == null || state == TaskPartitionState.STOPPED
             || state == TaskPartitionState.TIMED_OUT || state == TaskPartitionState.TASK_ERROR
             || state == TaskPartitionState.DROPPED) {
           filteredTasks.add(partitionNumber);
         }
         // Allow tasks whose assigned instances are no longer live for rescheduling
         if (isTaskNotInTerminalState(state)) {
           ...
         }
   
   This section checks different state combinations and then reacts differently to each. First of all, it is hard for the reviewers to follow the logic. Secondly, it is very easy for the coder to introduce bugs, since the combinations are handled separately in different private methods.
   Is it possible to consolidate all the state checks into 3 or 4 methods? Otherwise, we need to clean up the states. The current design is not reviewable.
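
One way to picture the consolidation asked for here is to keep each state combination in a single named set and expose it through one predicate. This is a sketch under stated assumptions, not a proposed patch: `StateChecksSketch`, the local `State` enum, `SCHEDULABLE`, and `isEligibleForScheduling` are hypothetical, and the enum is a stand-in rather than org.apache.helix.task.TaskPartitionState. Only the four states named in the quoted condition are treated as schedulable.

```java
import java.util.EnumSet;
import java.util.Set;

public class StateChecksSketch {
  // Local stand-in enum for illustration only.
  enum State { RUNNING, COMPLETED, STOPPED, TIMED_OUT, TASK_ERROR, DROPPED }

  // The states from the quoted "eligible for scheduling" condition,
  // defined once so every caller shares the same combination.
  static final Set<State> SCHEDULABLE =
      EnumSet.of(State.STOPPED, State.TIMED_OUT, State.TASK_ERROR, State.DROPPED);

  // A null state is treated as schedulable, as in the quoted condition.
  static boolean isEligibleForScheduling(State state) {
    return state == null || SCHEDULABLE.contains(state);
  }

  public static void main(String[] args) {
    System.out.println(isEligibleForScheduling(null));             // true
    System.out.println(isEligibleForScheduling(State.TASK_ERROR)); // true
    System.out.println(isEligibleForScheduling(State.RUNNING));    // false
  }
}
```

With this shape, a reviewer can check each combination in one place instead of comparing conditions spread across several private methods.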






[GitHub] [helix] alirezazamani merged pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
alirezazamani merged pull request #1548:
URL: https://github.com/apache/helix/pull/1548


   




[GitHub] [helix] alirezazamani commented on a change in pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on a change in pull request #1548:
URL: https://github.com/apache/helix/pull/1548#discussion_r529018503



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -838,7 +839,7 @@ private static void addCompletedTasks(Set<Integer> set, JobContext ctx, Iterable
    */
   private boolean isTaskNotInTerminalState(TaskPartitionState state) {

Review comment:
       One more thing to add. I think the designer of this method (I was not involved in designing it, BTW) used the variable `partitionsToRetryOnLiveInstanceChangeForTargetedJob` to hint to the reader what these tasks are and why we are adding them back. Hence, I think keeping it as it is remains the best solution.






[GitHub] [helix] alirezazamani commented on a change in pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on a change in pull request #1548:
URL: https://github.com/apache/helix/pull/1548#discussion_r529010453



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -838,7 +839,7 @@ private static void addCompletedTasks(Set<Integer> set, JobContext ctx, Iterable
    */
   private boolean isTaskNotInTerminalState(TaskPartitionState state) {

Review comment:
       Note that this is not about design. This is a bug that was introduced earlier and is affecting production results. I tried other approaches, but I believe this is the best way.






[GitHub] [helix] alirezazamani commented on a change in pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on a change in pull request #1548:
URL: https://github.com/apache/helix/pull/1548#discussion_r529046942



##########
File path: helix-core/src/test/java/org/apache/helix/integration/task/TestTaskErrorMaxRetriesQuotaRelease.java
##########
@@ -0,0 +1,63 @@
+package org.apache.helix.integration.task;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+import org.apache.helix.TestHelper;
+import org.apache.helix.task.JobConfig;
+import org.apache.helix.task.JobQueue;
+import org.apache.helix.task.TaskState;
+import org.apache.helix.task.TaskUtil;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import com.google.common.collect.ImmutableMap;
+
+public class TestTaskErrorMaxRetriesQuotaRelease extends TaskTestBase {

Review comment:
       Move to another existing class.






[GitHub] [helix] alirezazamani commented on a change in pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on a change in pull request #1548:
URL: https://github.com/apache/helix/pull/1548#discussion_r529010453



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -838,7 +839,7 @@ private static void addCompletedTasks(Set<Integer> set, JobContext ctx, Iterable
    */
   private boolean isTaskNotInTerminalState(TaskPartitionState state) {

Review comment:
       Note that this is not about design. This is a bug that was introduced earlier and is affecting production results. This PR fixes that bug. I tried other approaches, but I believe this is the best way.






[GitHub] [helix] alirezazamani commented on a change in pull request #1548: Fix targeted job quota calculation for given up tasks

Posted by GitBox <gi...@apache.org>.
alirezazamani commented on a change in pull request #1548:
URL: https://github.com/apache/helix/pull/1548#discussion_r529010453



##########
File path: helix-core/src/main/java/org/apache/helix/task/AbstractTaskDispatcher.java
##########
@@ -838,7 +839,7 @@ private static void addCompletedTasks(Set<Integer> set, JobContext ctx, Iterable
    */
   private boolean isTaskNotInTerminalState(TaskPartitionState state) {

Review comment:
       Note that this is not about design. This is a bug that was introduced earlier and is affecting production results. This PR fixes that bug. I tried other approaches, but I believe this is the best way. Please also note that filterTasks() is used for all jobs, so it is generic. This change only affects targeted jobs.
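
       To illustrate the intent of the fix (tasks that will not be retried again should not occupy quota), here is a rough sketch. `QuotaSketch`, `AttemptState`, and all method names are hypothetical and are not the code in AbstractTaskDispatcher. It also assumes that a failed task still waiting for a retry keeps its quota, which this thread does not confirm.

```java
// Hypothetical sketch: decides whether a task should still count against quota.
final class QuotaSketch {
  private QuotaSketch() {}

  // A task counts as failed if it errored or timed out.
  static boolean isFailed(AttemptState state) {
    return state == AttemptState.TASK_ERROR || state == AttemptState.TIMED_OUT;
  }

  // A failed task is "given up" once it has used all of its allowed attempts.
  static boolean isGivenUp(AttemptState state, int attempts, int maxAttempts) {
    return isFailed(state) && attempts >= maxAttempts;
  }

  // Running tasks hold quota; failed tasks hold it only while a retry is
  // still possible (assumption). Given-up tasks release it.
  static boolean holdsQuota(AttemptState state, int attempts, int maxAttempts) {
    if (state == AttemptState.RUNNING) {
      return true;
    }
    return isFailed(state) && !isGivenUp(state, attempts, maxAttempts);
  }

  public static void main(String[] args) {
    int maxAttempts = 2; // assumed retry limit for the demo
    System.out.println(holdsQuota(AttemptState.TASK_ERROR, 1, maxAttempts));
    System.out.println(holdsQuota(AttemptState.TASK_ERROR, 2, maxAttempts));
  }
}

// Stand-in for the task states relevant here (assumption: simplified).
enum AttemptState { RUNNING, COMPLETED, TASK_ERROR, TIMED_OUT }
```

       The point of the sketch is only that the retry limit feeds into the quota decision, so a job is not blocked by tasks that will never run again.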



