Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/02/20 12:57:32 UTC

[GitHub] [incubator-doris] worker24h opened a new pull request #2958: Auto Resume RoutineLoadJob

worker24h opened a new pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958
 
 
   When all backends restart, the routine load job cannot get back to RUNNING.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman merged pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman merged pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958
 
 
   



[GitHub] [incubator-doris] morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382584927
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/common/Config.java
 ##########
 @@ -999,5 +999,16 @@
      */
     @ConfField
     public static boolean check_java_version = true;
+
+    
+    @ConfField(mutable = true)
 
 Review comment:
   ```suggestion
       @ConfField(mutable = true, masterOnly = true)
   ```



[GitHub] [incubator-doris] morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382587269
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/load/routineload/ScheduleRule.java
 ##########
 @@ -0,0 +1,81 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+package org.apache.doris.load.routineload;
+
+import org.apache.doris.catalog.Catalog;
+import org.apache.doris.common.Config;
+import org.apache.doris.common.InternalErrorCode;
+import org.apache.doris.system.SystemInfoService;
+
+public class ScheduleRule {
 
 Review comment:
   Add some comments for this class



[GitHub] [incubator-doris] worker24h commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
worker24h commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382403361
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/common/InternalErrorCode.java
 ##########
 @@ -0,0 +1,72 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.common;
+
+public class InternalErrorCode {
+    private long code;
+    private String msg;
+    public static long NORMAL = 0;
 
 Review comment:
   ok



[GitHub] [incubator-doris] morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382067831
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/common/InternalErrorCode.java
 ##########
 @@ -0,0 +1,72 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.common;
+
+public class InternalErrorCode {
 
 Review comment:
   It is strange to define an "ErrorCode" class with a `msg` in it.
   How about keeping only the "code" in `InternalErrorCode`, without the `msg`?
   
   And you can put the `InternalErrorCode` into an `Exception` class.
   
   And in routine load, you can also define a new class named `ErrorReason` with an `InternalErrorCode` and a `msg` in it.
   
   If so, it would be more flexible to use the code everywhere we like.
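A minimal sketch of the split suggested above, assuming an enum-style code type. The constant values come from the `InternalErrorCode` diff in this thread; `LoadException` (a stand-in for an exception such as `UserException`) and the method names on `ErrorReason` are hypothetical, not the merged code.

```java
// Sketch only: the code type carries no message.
enum InternalErrorCode {
    NORMAL(0),
    REPLICA_FEW_ERR(3),
    MANUAL_PAUSE_ERR(100),
    TOO_MANY_FAILURE_ROWS_ERR(102);

    private final long code;

    InternalErrorCode(long code) {
        this.code = code;
    }

    long getCode() {
        return code;
    }
}

// Hypothetical exception that carries an InternalErrorCode alongside its message.
class LoadException extends Exception {
    private final InternalErrorCode errorCode;

    LoadException(InternalErrorCode errorCode, String msg) {
        super(msg);
        this.errorCode = errorCode;
    }

    InternalErrorCode getErrorCode() {
        return errorCode;
    }
}

// Routine-load specific pairing of a code with a human-readable message.
class ErrorReason {
    private final InternalErrorCode code;
    private final String msg;

    ErrorReason(InternalErrorCode code, String msg) {
        this.code = code;
        this.msg = msg;
    }

    // Build a pause reason directly from a failed operation.
    static ErrorReason fromException(LoadException e) {
        return new ErrorReason(e.getErrorCode(), e.getMessage());
    }

    InternalErrorCode getCode() {
        return code;
    }

    String getMsg() {
        return msg;
    }

    @Override
    public String toString() {
        return "ErrorReason{code=" + code.getCode() + ", msg=" + msg + "}";
    }
}
```

With this shape the same code can travel inside exceptions anywhere in FE, while only routine load needs the code/message pair.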
   



[GitHub] [incubator-doris] morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382052529
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/common/InternalErrorCode.java
 ##########
 @@ -0,0 +1,72 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.common;
+
+public class InternalErrorCode {
+    private long code;
+    private String msg;
+    public static long NORMAL = 0;
+
+    // for common error
+    public static long IMPOSSIBLE_ERROR_ERR = 1; // an error occurred in a place where no error should be possible
+    public static long INTERNAL_ERR = 2;
+    public static long REPLICA_FEW_ERR = 3;
+    public static long PARTITIONS_ERR = 4;
+    public static long DB_ERR = 5;
+    public static long TABLE_ERR = 6;
+    public static long META_NOT_FOUND_ERR = 7;
+
+    // for load job error
+    public static long MANUAL_PAUSE_ERR = 100;
+    public static long MANUAL_STOP_ERR = 101;
+    public static long TOO_MANY_FAILURE_ROWS_ERR = 102;
+    public static long CREATE_TASKS_ERR = 103;
+    public static long TASKS_ABORT_ERR = 104;
+
+    public InternalErrorCode()
+    {
 
 Review comment:
   Code style: `{` should be at the end of the previous line.



[GitHub] [incubator-doris] morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382045606
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/load/routineload/ScheduleRule.java
 ##########
 @@ -0,0 +1,78 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+package org.apache.doris.load.routineload;
+
+import org.apache.doris.catalog.Catalog;
+import org.apache.doris.common.InternalErrorCode;
+import org.apache.doris.system.SystemInfoService;
+
+public class ScheduleRule {
+
+    private static int aliveBeCount(String clusterName) {
+        SystemInfoService systemInfoService = Catalog.getCurrentSystemInfo();
+        return systemInfoService.getClusterBackendIds(clusterName, true).size();
+    }
+    /**
+     * Whether to trigger automatic scheduling (auto resume) for the job.
+     * Only jobs in PAUSED state are checked; all other states return false.
 
 Review comment:
   Better in English



[GitHub] [incubator-doris] morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382585729
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/common/InternalErrorCode.java
 ##########
 @@ -0,0 +1,72 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.common;
+
+public class InternalErrorCode {
 
 Review comment:
   Why not just use an enum?
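A minimal sketch of the enum alternative, assuming the numeric values still need to round-trip (for example when a pause reason is persisted). The constants and values are copied from the `InternalErrorCode` diff in this thread; `fromCode` is a hypothetical helper, not an existing Doris method.

```java
import java.util.HashMap;
import java.util.Map;

// Each constant keeps the numeric value it had as a static long field.
enum InternalErrorCode {
    NORMAL(0),
    IMPOSSIBLE_ERROR_ERR(1),
    INTERNAL_ERR(2),
    REPLICA_FEW_ERR(3),
    PARTITIONS_ERR(4),
    DB_ERR(5),
    TABLE_ERR(6),
    META_NOT_FOUND_ERR(7),
    MANUAL_PAUSE_ERR(100),
    MANUAL_STOP_ERR(101),
    TOO_MANY_FAILURE_ROWS_ERR(102),
    CREATE_TASKS_ERR(103),
    TASKS_ABORT_ERR(104);

    // Reverse index, filled once after all constants are constructed.
    private static final Map<Long, InternalErrorCode> BY_CODE = new HashMap<>();

    static {
        for (InternalErrorCode c : values()) {
            BY_CODE.put(c.code, c);
        }
    }

    private final long code;

    InternalErrorCode(long code) {
        this.code = code;
    }

    long getCode() {
        return code;
    }

    // Hypothetical helper: map a raw numeric code back to its constant.
    static InternalErrorCode fromCode(long code) {
        InternalErrorCode c = BY_CODE.get(code);
        if (c == null) {
            throw new IllegalArgumentException("unknown error code: " + code);
        }
        return c;
    }
}
```

Compared with `public static long` fields, the enum gives type safety in signatures and `switch` statements while keeping the numeric value available.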



[GitHub] [incubator-doris] morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382059620
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/load/routineload/ScheduleRule.java
 ##########
 @@ -0,0 +1,78 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+package org.apache.doris.load.routineload;
+
+import org.apache.doris.catalog.Catalog;
+import org.apache.doris.common.InternalErrorCode;
+import org.apache.doris.system.SystemInfoService;
+
+public class ScheduleRule {
+
+    private static int aliveBeCount(String clusterName) {
+        SystemInfoService systemInfoService = Catalog.getCurrentSystemInfo();
+        return systemInfoService.getClusterBackendIds(clusterName, true).size();
+    }
+    /**
+     * Whether to trigger automatic scheduling (auto resume) for the job.
+     * Only jobs in PAUSED state are checked; all other states return false.
+     * @param jobRoutine
+     * @return
+     */
+    public static boolean isNeedAutoSchedule(RoutineLoadJob jobRoutine) {
+        if (jobRoutine.state != RoutineLoadJob.JobState.PAUSED) {
+            return false;
+        }
+        if (jobRoutine.autoResumeLock) {// can only be resumed manually
+            return false;
+        }
+        /**
+         * This error only occurs when BEs go down.
+         * The counter and the resume lock are mainly there to prevent unlimited auto resumes.
+         */
+        if (jobRoutine.pauseReason.getCode() == InternalErrorCode.REPLICA_FEW_ERR) {
+            int alive =  aliveBeCount(jobRoutine.clusterName);
+            if (alive < jobRoutine.replicationNum) {// number of alive BEs is less than the minimum replica number
 
 Review comment:
   Having only `replicationNum` BEs alive is not enough. Suppose we have 100 BEs in a cluster: only 3 alive is not enough.
   Also, `replicationNum` will become 0 once the FE restarts or the Master changes, because you don't persist it.
   
   I haven't worked out how many alive BEs would be suitable. Suppose we have 100 BEs in a cluster and 97 are alive: the job may still fail, because all replicas of a table may be on the 3 dead BEs.
   
   Maybe we can add a config like `min_alive_be_num_for_auto_resume`?
   Its default would be 0, and here we would check the alive BE count against:
   `max(quorum of BE, min_alive_be_num_for_auto_resume)` ?
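One possible reading of the proposed check, as a sketch: "quorum of BE" is assumed to mean a majority of all BEs in the cluster (`total / 2 + 1`), and `minAliveBeNumForAutoResume` models the proposed (not yet existing) `min_alive_be_num_for_auto_resume` config. As noted above, even this threshold cannot guarantee that every table still has a live replica.

```java
// Sketch of the alive-BE threshold proposed in the review.
final class AutoResumeThreshold {
    private AutoResumeThreshold() {
    }

    // Assumed definition of "quorum of BE": a strict majority of all BEs.
    static int quorum(int totalBeNum) {
        return totalBeNum / 2 + 1;
    }

    // max(quorum of BE, min_alive_be_num_for_auto_resume)
    static int requiredAliveBeNum(int totalBeNum, int minAliveBeNumForAutoResume) {
        return Math.max(quorum(totalBeNum), minAliveBeNumForAutoResume);
    }

    static boolean enoughBeAlive(int aliveBeNum, int totalBeNum, int minAliveBeNumForAutoResume) {
        return aliveBeNum >= requiredAliveBeNum(totalBeNum, minAliveBeNumForAutoResume);
    }
}
```

With 100 BEs and the default of 0, at least 51 BEs would have to be alive; setting the config to 97 raises the bar to 97 for clusters whose tables have few replicas.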



[GitHub] [incubator-doris] morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382060571
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/load/routineload/ScheduleRule.java
 ##########
 @@ -0,0 +1,78 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+package org.apache.doris.load.routineload;
+
+import org.apache.doris.catalog.Catalog;
+import org.apache.doris.common.InternalErrorCode;
+import org.apache.doris.system.SystemInfoService;
+
+public class ScheduleRule {
+
+    private static int aliveBeCount(String clusterName) {
+        SystemInfoService systemInfoService = Catalog.getCurrentSystemInfo();
+        return systemInfoService.getClusterBackendIds(clusterName, true).size();
+    }
+    /**
+     * Whether to trigger automatic scheduling (auto resume) for the job.
+     * Only jobs in PAUSED state are checked; all other states return false.
+     * @param jobRoutine
+     * @return
+     */
+    public static boolean isNeedAutoSchedule(RoutineLoadJob jobRoutine) {
+        if (jobRoutine.state != RoutineLoadJob.JobState.PAUSED) {
+            return false;
+        }
+        if (jobRoutine.autoResumeLock) {// can only be resumed manually
+            return false;
+        }
+        /**
+         * This error only occurs when BEs go down.
+         * The counter and the resume lock are mainly there to prevent unlimited auto resumes.
+         */
+        if (jobRoutine.pauseReason.getCode() == InternalErrorCode.REPLICA_FEW_ERR) {
+            int alive =  aliveBeCount(jobRoutine.clusterName);
+            if (alive < jobRoutine.replicationNum) {// number of alive BEs is less than the minimum replica number
+                return false;
+            }
+            if (jobRoutine.firstResumeTimestamp == 0) {// this is the first auto resume
+                jobRoutine.firstResumeTimestamp = System.currentTimeMillis();
+                jobRoutine.autoResumeCount = 1;
+                return true;
+            } else {
+                long current = System.currentTimeMillis();
+                if (current - jobRoutine.firstResumeTimestamp < 5*60*1000) {// within 5 minutes
 
 Review comment:
   Make the `5min` window configurable by adding it to `Config.java`.
   And I think we should add another config to disable this `auto resume` feature, in case we run into bad cases.
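A sketch of the resume throttling with the window and an on/off switch pulled out as parameters, as requested above. The per-job fields mirror the diff (`firstResumeTimestamp`, `autoResumeCount`, `autoResumeLock`); the constructor parameters stand in for hypothetical `Config.java` entries, and both the maximum number of resumes per window and the reset after the window expires are assumptions, since the quoted diff is cut off at that point. The clock is passed in so the logic is testable.

```java
// Sketch: decides whether a PAUSED job may be auto-resumed.
// autoResumeEnabled, windowMs and maxResumesInWindow model hypothetical configs.
final class AutoResumePolicy {
    private final boolean autoResumeEnabled;
    private final long windowMs;
    private final int maxResumesInWindow;

    // Per-job state, mirroring the fields in the diff.
    private long firstResumeTimestamp = 0;
    private int autoResumeCount = 0;
    private boolean autoResumeLock = false;

    AutoResumePolicy(boolean autoResumeEnabled, long windowMs, int maxResumesInWindow) {
        this.autoResumeEnabled = autoResumeEnabled;
        this.windowMs = windowMs;
        this.maxResumesInWindow = maxResumesInWindow;
    }

    boolean shouldAutoResume(long nowMs) {
        if (!autoResumeEnabled || autoResumeLock) {
            return false; // feature disabled, or the job may only be resumed manually
        }
        if (firstResumeTimestamp == 0 || nowMs - firstResumeTimestamp >= windowMs) {
            // First auto resume, or the previous window has expired: start a new window.
            firstResumeTimestamp = nowMs;
            autoResumeCount = 1;
            return true;
        }
        if (autoResumeCount < maxResumesInWindow) {
            autoResumeCount++;
            return true;
        }
        // Too many resumes inside one window: require a manual resume from now on.
        autoResumeLock = true;
        return false;
    }

    boolean isLocked() {
        return autoResumeLock;
    }
}
```

Keeping the decision in one object with injected limits makes it straightforward to wire each limit to a mutable config entry later.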



[GitHub] [incubator-doris] morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382587032
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/load/routineload/RoutineLoadTaskScheduler.java
 ##########
 @@ -130,10 +131,15 @@ private void scheduleOneTask(RoutineLoadTaskInfo routineLoadTaskInfo) throws Exc
                 // allocate failed, push it back to the queue to wait next scheduling
                 needScheduleTasksQueue.put(routineLoadTaskInfo);
             }
-        } catch (Exception e) {
+        } catch (UserException e) {
+            routineLoadManager.getJob(routineLoadTaskInfo.getJobId()).
+                    updateState(JobState.PAUSED,
+                    new ErrorReason(e.getErrorCode().getCode(), e.getMessage()), false);
+            throw e;
+        }catch (Exception e) {
 
 Review comment:
   ```suggestion
           } catch (Exception e) {
   ```
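The change under review catches the more specific `UserException` first, pauses the job with an `ErrorReason`, and rethrows; anything else falls through to the generic `catch (Exception e)`. The sketch below reproduces that ordering with hypothetical stand-ins (`CodedException`, `Job`, `Step`, `TaskScheduler`) rather than the real Doris classes.

```java
// Hypothetical stand-in for UserException: carries a numeric error code.
class CodedException extends Exception {
    private final long code;

    CodedException(long code, String msg) {
        super(msg);
        this.code = code;
    }

    long getCode() {
        return code;
    }
}

// Hypothetical stand-in for a routine load job's state.
class Job {
    String state = "RUNNING";
    long pauseCode = 0;
    String pauseMsg = "";

    void pause(long code, String msg) {
        state = "PAUSED";
        pauseCode = code;
        pauseMsg = msg;
    }
}

// One scheduling step that may fail.
interface Step {
    void run() throws Exception;
}

final class TaskScheduler {
    private TaskScheduler() {
    }

    // The specific exception must be caught before the generic one;
    // the reverse order does not compile.
    static void schedule(Job job, Step step) throws Exception {
        try {
            step.run();
        } catch (CodedException e) {
            // Known failure: record why the job paused, then propagate.
            job.pause(e.getCode(), e.getMessage());
            throw e;
        } catch (Exception e) {
            // Unknown failure: propagate without touching job state in this sketch.
            throw e;
        }
    }
}
```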



[GitHub] [incubator-doris] worker24h commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
worker24h commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382403102
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/load/routineload/ScheduleRule.java
 ##########
 @@ -0,0 +1,78 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+package org.apache.doris.load.routineload;
+
+import org.apache.doris.catalog.Catalog;
+import org.apache.doris.common.InternalErrorCode;
+import org.apache.doris.system.SystemInfoService;
+
+public class ScheduleRule {
+
+    private static int aliveBeCount(String clusterName) {
+        SystemInfoService systemInfoService = Catalog.getCurrentSystemInfo();
+        return systemInfoService.getClusterBackendIds(clusterName, true).size();
+    }
+    /**
+     * Whether to trigger automatic scheduling (auto resume) for the job.
+     * Only jobs in PAUSED state are checked; all other states return false.
 
 Review comment:
   ok



[GitHub] [incubator-doris] morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382585022
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/common/Config.java
 ##########
 @@ -999,5 +999,16 @@
      */
     @ConfField
     public static boolean check_java_version = true;
+
+    
+    @ConfField(mutable = true)
 
 Review comment:
   Also add a comment for this config. It would be better to tell users the best practice for setting its value.
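A sketch of what a documented config entry could look like, combining the `masterOnly = true` suggestion with the requested comment. `ConfField` is redeclared here as a stand-in so the example compiles on its own (the real annotation lives in Doris); the field name is the `min_alive_be_num_for_auto_resume` config proposed in this review, and the guidance text is illustrative, not the merged wording.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stand-in for Doris's ConfField annotation, declared only so this sketch compiles.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ConfField {
    boolean mutable() default false;

    boolean masterOnly() default false;
}

class Config {
    /**
     * Minimum number of alive BEs required before a PAUSED routine load job
     * is resumed automatically (hypothetical config proposed in review).
     * The effective threshold is max(quorum of BEs, this value).
     * Example guidance: keep 0 on small clusters; on large clusters whose
     * tables have few replicas, raise it toward the total BE count, since a
     * job can still fail while most BEs are alive.
     */
    @ConfField(mutable = true, masterOnly = true)
    public static int min_alive_be_num_for_auto_resume = 0;
}
```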



[GitHub] [incubator-doris] morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382586161
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/load/routineload/RoutineLoadJob.java
 ##########
 @@ -173,8 +175,12 @@ public boolean isFinalState() {
     protected int currentTaskConcurrentNum;
     protected RoutineLoadProgress progress;
 
-    protected String pauseReason = "";
-    protected String cancelReason = "";
+    protected long firstResumeTimestamp; // timestamp of the first resume
 
 Review comment:
   English



[GitHub] [incubator-doris] worker24h commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
worker24h commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382403312
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/load/routineload/KafkaRoutineLoadJob.java
 ##########
 @@ -299,7 +306,11 @@ protected boolean unprotectNeedReschedule() throws UserException {
                     return true;
                 }
             }
-        } else {
+        }
 
 Review comment:
   ok



[GitHub] [incubator-doris] worker24h commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
worker24h commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382403211
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/load/routineload/ScheduleRule.java
 ##########
 @@ -0,0 +1,78 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+package org.apache.doris.load.routineload;
+
+import org.apache.doris.catalog.Catalog;
+import org.apache.doris.common.InternalErrorCode;
+import org.apache.doris.system.SystemInfoService;
+
+public class ScheduleRule {
+
+    private static int aliveBeCount(String clusterName) {
+        SystemInfoService systemInfoService = Catalog.getCurrentSystemInfo();
+        return systemInfoService.getClusterBackendIds(clusterName, true).size();
+    }
+    /**
+     * Whether the job should be automatically scheduled
+     * Only jobs in the PAUSED state need to be checked; all other states return false
+     * @param jobRoutine
+     * @return
+     */
+    public static boolean isNeedAutoSchedule(RoutineLoadJob jobRoutine) {
+        if (jobRoutine.state != RoutineLoadJob.JobState.PAUSED) {
+            return false;
+        }
+        if (jobRoutine.autoResumeLock) {// can only be resumed manually
+            return false;
+        }
+        /**
+         * This error only occurs when a BE goes down
+         * The counter and the resume lock exist mainly to prevent unlimited auto resumes
+         */
+        if (jobRoutine.pauseReason.getCode() == InternalErrorCode.REPLICA_FEW_ERR) {
+            int alive =  aliveBeCount(jobRoutine.clusterName);
+            if (alive < jobRoutine.replicationNum) {// fewer alive BEs than the minimum replica count
 
 Review comment:
   Good idea
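
   The gating conditions in `isNeedAutoSchedule` can be pulled out into a pure function, which makes them easy to unit test. This is a minimal sketch under stated assumptions, not the PR's code: `SketchJobState` and `AutoResumeCheck` are hypothetical stand-ins, the alive-backend count is passed in as an `IntSupplier` instead of calling `SystemInfoService`, and jobs paused for any reason other than `REPLICA_FEW_ERR` are assumed to be left for a manual resume.

```java
import java.util.function.IntSupplier;

// Hypothetical, simplified stand-in for RoutineLoadJob.JobState.
enum SketchJobState { RUNNING, PAUSED, STOPPED }

final class AutoResumeCheck {
    // Same numeric value as REPLICA_FEW_ERR in the InternalErrorCode diff.
    static final long REPLICA_FEW_ERR = 3;

    private AutoResumeCheck() {}

    /**
     * Returns true only for a PAUSED, unlocked job that was paused because there were
     * too few replicas, and only once at least replicationNum backends are alive again.
     */
    static boolean shouldAutoResume(SketchJobState state, boolean autoResumeLock,
                                    long pauseReasonCode, int replicationNum,
                                    IntSupplier aliveBackendCount) {
        if (state != SketchJobState.PAUSED || autoResumeLock) {
            return false; // not paused, or locked: only a manual resume is allowed
        }
        if (pauseReasonCode != REPLICA_FEW_ERR) {
            return false; // assumption: other pause reasons are not auto resumed
        }
        // Query the alive-backend count only after the cheap field checks pass.
        return aliveBackendCount.getAsInt() >= replicationNum;
    }
}
```

   Injecting the backend count keeps the rule independent of the catalog, so each branch can be tested without a running cluster.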


[GitHub] [incubator-doris] morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382587347
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/load/routineload/ScheduleRule.java
 ##########
 @@ -0,0 +1,81 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+package org.apache.doris.load.routineload;
+
+import org.apache.doris.catalog.Catalog;
+import org.apache.doris.common.Config;
+import org.apache.doris.common.InternalErrorCode;
+import org.apache.doris.system.SystemInfoService;
+
+public class ScheduleRule {
+
+    private static int aliveBeCount(String clusterName) {
+        SystemInfoService systemInfoService = Catalog.getCurrentSystemInfo();
+        return systemInfoService.getClusterBackendIds(clusterName, true).size();
+    }
+
+    /**
+     * check whether the RoutineLoadJob needs to be scheduled automatically
+     * @param jobRoutine
+     * @return
+     */
+    public static boolean isNeedAutoSchedule(RoutineLoadJob jobRoutine) {
+        if (jobRoutine.state != RoutineLoadJob.JobState.PAUSED) {
+            return false;
+        }
+        if (jobRoutine.autoResumeLock) {// only a manual resume can clear the lock
+            return false;
+        }
+        /**
+         * This error only occurs when a BE goes down
 
 Review comment:
   English


[GitHub] [incubator-doris] worker24h commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
worker24h commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382403269
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/load/routineload/ScheduleRule.java
 ##########
 @@ -0,0 +1,78 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+package org.apache.doris.load.routineload;
+
+import org.apache.doris.catalog.Catalog;
+import org.apache.doris.common.InternalErrorCode;
+import org.apache.doris.system.SystemInfoService;
+
+public class ScheduleRule {
+
+    private static int aliveBeCount(String clusterName) {
+        SystemInfoService systemInfoService = Catalog.getCurrentSystemInfo();
+        return systemInfoService.getClusterBackendIds(clusterName, true).size();
+    }
+    /**
+     * Whether the job should be automatically scheduled
+     * Only jobs in the PAUSED state need to be checked; all other states return false
+     * @param jobRoutine
+     * @return
+     */
+    public static boolean isNeedAutoSchedule(RoutineLoadJob jobRoutine) {
+        if (jobRoutine.state != RoutineLoadJob.JobState.PAUSED) {
+            return false;
+        }
+        if (jobRoutine.autoResumeLock) {// can only be resumed manually
+            return false;
+        }
+        /**
+         * This error only occurs when a BE goes down
+         * The counter and the resume lock exist mainly to prevent unlimited auto resumes
+         */
+        if (jobRoutine.pauseReason.getCode() == InternalErrorCode.REPLICA_FEW_ERR) {
+            int alive =  aliveBeCount(jobRoutine.clusterName);
+            if (alive < jobRoutine.replicationNum) {// fewer alive BEs than the minimum replica count
+                return false;
+            }
+            if (jobRoutine.firstResumeTimestamp == 0) {// this is the first auto resume
+                jobRoutine.firstResumeTimestamp = System.currentTimeMillis();
+                jobRoutine.autoResumeCount = 1;
+                return true;
+            } else {
+                long current = System.currentTimeMillis();
+                if (current - jobRoutine.firstResumeTimestamp < 5*60*1000) {// within 5 minutes
 
 Review comment:
   I added a configuration item for the time window.
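
   Making the window configurable could look roughly like the following. It is a hypothetical standalone class, not the merged code: `AutoResumeLimiter`, `windowMs`, `maxResumes` and the injectable clock are assumptions, and `reset()` stands in for a manual RESUME clearing `autoResumeLock`. The idea mirrors the diff: allow a bounded number of automatic resumes inside a window that starts at the first resume, then lock the job so that only a manual resume re-enables it.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: at most maxResumes automatic resumes per time window. */
final class AutoResumeLimiter {
    private final long windowMs;      // e.g. read from a config item instead of 5*60*1000
    private final int maxResumes;
    private final LongSupplier clock; // injectable so tests can control time

    private long firstResumeTimestamp = -1; // -1 means no window has started yet
    private int autoResumeCount = 0;
    private boolean autoResumeLock = false;

    AutoResumeLimiter(long windowMs, int maxResumes, LongSupplier clock) {
        if (windowMs <= 0 || maxResumes <= 0) {
            throw new IllegalArgumentException("windowMs and maxResumes must be positive");
        }
        this.windowMs = windowMs;
        this.maxResumes = maxResumes;
        this.clock = clock;
    }

    /** Returns true if one more automatic resume is allowed now. */
    synchronized boolean tryAutoResume() {
        if (autoResumeLock) {
            return false;
        }
        long now = clock.getAsLong();
        if (firstResumeTimestamp < 0 || now - firstResumeTimestamp >= windowMs) {
            // First resume, or the previous window has expired: start a new window.
            firstResumeTimestamp = now;
            autoResumeCount = 1;
            return true;
        }
        if (autoResumeCount < maxResumes) {
            autoResumeCount++;
            return true;
        }
        // Too many resumes inside one window: stop retrying automatically.
        autoResumeLock = true;
        return false;
    }

    /** Models a manual RESUME, which clears the lock and the counters. */
    synchronized void reset() {
        autoResumeLock = false;
        firstResumeTimestamp = -1;
        autoResumeCount = 0;
    }

    synchronized boolean isLocked() {
        return autoResumeLock;
    }
}
```

   Using a sentinel of -1 rather than 0 avoids confusing "no window yet" with a clock reading of 0 when a fake clock is injected.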


[GitHub] [incubator-doris] worker24h commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
worker24h commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382403138
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/common/InternalErrorCode.java
 ##########
 @@ -0,0 +1,72 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.common;
+
+public class InternalErrorCode {
+    private long code;
+    private String msg;
+    public static long NORMAL = 0;
+
+    // for common error
+    public static long IMPOSSIBLE_ERROR_ERR = 1; // used where an error that should be impossible has occurred
+    public static long INTERNAL_ERR = 2;
+    public static long REPLICA_FEW_ERR = 3;
+    public static long PARTITIONS_ERR = 4;
+    public static long DB_ERR = 5;
+    public static long TABLE_ERR = 6;
+    public static long META_NOT_FOUND_ERR = 7;
+
+    // for load job error
+    public static long MANUAL_PAUSE_ERR = 100;
+    public static long MANUAL_STOP_ERR = 101;
+    public static long TOO_MANY_FAILURE_ROWS_ERR = 102;
+    public static long CREATE_TASKS_ERR = 103;
+    public static long TASKS_ABORT_ERR = 104;
+
+    public InternalErrorCode()
+    {
 
 Review comment:
   I changed it.


[GitHub] [incubator-doris] morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382063037
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/load/routineload/KafkaRoutineLoadJob.java
 ##########
 @@ -299,7 +306,11 @@ protected boolean unprotectNeedReschedule() throws UserException {
                     return true;
                 }
             }
-        } else {
+        }
 
 Review comment:
   Code style


[GitHub] [incubator-doris] morningman commented on issue #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on issue #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#issuecomment-593225378
 
 
   When all backends restart, the routine load job can be resumed.


[GitHub] [incubator-doris] morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob

Posted by GitBox <gi...@apache.org>.
morningman commented on a change in pull request #2958: Auto Resume RoutineLoadJob
URL: https://github.com/apache/incubator-doris/pull/2958#discussion_r382068262
 
 

 ##########
 File path: fe/src/main/java/org/apache/doris/common/InternalErrorCode.java
 ##########
 @@ -0,0 +1,72 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.common;
+
+public class InternalErrorCode {
+    private long code;
+    private String msg;
+    public static long NORMAL = 0;
 
 Review comment:
   ```suggestion
       public static long OK = 0;
   ```
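
   Beyond the rename, the constants in the diff are declared `public static long` without `final`, so any code could reassign them at runtime. A minimal sketch, assuming only a subset of the codes and a hypothetical class name `ErrorCodeSketch` (not the merged `InternalErrorCode`), of declaring them as `static final` alongside an immutable `code`/`msg` pair:

```java
/** Hypothetical sketch: an error code paired with a message, plus named constants. */
final class ErrorCodeSketch {
    // final: callers can read these values but cannot reassign them.
    static final long OK = 0;
    static final long IMPOSSIBLE_ERROR_ERR = 1;
    static final long INTERNAL_ERR = 2;
    static final long REPLICA_FEW_ERR = 3;
    static final long MANUAL_PAUSE_ERR = 100;

    private final long code;
    private final String msg;

    ErrorCodeSketch(long code, String msg) {
        this.code = code;
        this.msg = msg;
    }

    long getCode() {
        return code;
    }

    String getMsg() {
        return msg;
    }

    /** True when the code is OK, i.e. no error. */
    boolean isOk() {
        return code == OK;
    }
}
```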
