You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2022/04/27 12:32:17 UTC

[GitHub] [incubator-doris] morningman opened a new pull request, #9266: [fix](catalog) fix bug that replica missing version cause query -214 error

morningman opened a new pull request, #9266:
URL: https://github.com/apache/incubator-doris/pull/9266

   # Proposed changes
   
   Issue Number: close #xxx
   
   ## Problem Summary:
   
   Describe the overview of changes.
   
   ## Checklist(Required)
   
   1. Does it affect the original behavior: (Yes/No/I Don't know)
   2. Has unit tests been added: (Yes/No/No Need)
   3. Has document been added or modified: (Yes/No/No Need)
   4. Does it need to update dependencies: (Yes/No)
   5. Are there any changes that cannot be rolled back: (Yes/No)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9266: [fix](catalog) fix bug that replica missing version cause query -214 error

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9266:
URL: https://github.com/apache/incubator-doris/pull/9266#issuecomment-1114845562

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a diff in pull request #9266: [fix](catalog) fix bug that replica missing version cause query -214 error

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #9266:
URL: https://github.com/apache/incubator-doris/pull/9266#discussion_r859919542


##########
fe/fe-core/src/main/java/org/apache/doris/transaction/DatabaseTransactionMgr.java:
##########
@@ -857,49 +857,19 @@ public void finishTransaction(long transactionId, Set<Long> errorReplicaIds) thr
                         }
                     }
 
+                    // check success replica number for each tablet.
+                    // a success replica means:
+                    //  1. Not in errorReplicaIds: succeed in both commit and publish phase
+                    //  2. last failed version < 0: is a health replica before
+                    //  3. version catch up: not with a stale version
+                    // Here we only check number, the replica version will be updated in updateCatalogAfterVisible()
                     for (MaterializedIndex index : allIndices) {
                         for (Tablet tablet : index.getTablets()) {
                             int healthReplicaNum = 0;
                             for (Replica replica : tablet.getReplicas()) {
                                 if (!errorReplicaIds.contains(replica.getId())
-                                        && replica.getLastFailedVersion() < 0) {
-                                    // this means the replica is a healthy replica,
-                                    // it is healthy in the past and does not have error in current load
-                                    if (replica.checkVersionCatchUp(partition.getVisibleVersion(), true)) {
-                                        // during rollup, the rollup replica's last failed version < 0,
-                                        // it may be treated as a normal replica.
-                                        // the replica is not failed during commit or publish
-                                        // during upgrade, one replica's last version maybe invalid,
-                                        // has to compare version hash.
-
-                                        // Here we still update the replica's info even if we failed to publish
-                                        // this txn, for the following case:
-                                        // replica A,B,C is successfully committed, but only A is successfully
-                                        // published,
-                                        // B and C is crashed, now we need a Clone task to repair this tablet.
-                                        // So, here we update A's version info, so that clone task will clone
-                                        // the latest version of data.
-
-                                        replica.updateVersionInfo(partitionCommitInfo.getVersion(),
-                                                replica.getDataSize(), replica.getRowCount());
-                                        ++healthReplicaNum;
-                                    } else {
-                                        // this means the replica has error in the past, but we did not observe it
-                                        // during upgrade, one job maybe in quorum finished state, for example, A,B,C 3 replica
-                                        // A,B 's version is 10, C's version is 10 but C' 10 is abnormal should be rollback
-                                        // then we will detect this and set C's last failed version to 10 and last success version to 11
-                                        // this logic has to be replayed in checkpoint thread
-                                        replica.updateVersionInfo(replica.getVersion(),
-                                                partition.getVisibleVersion(), 
-                                                partitionCommitInfo.getVersion());
-                                        LOG.warn("transaction state {} has error, the replica [{}] not appeared in error replica list "
-                                                + " and its version not equal to partition commit version or commit version - 1"
-                                                + " if its not a upgrade stage, its a fatal error. ", transactionState, replica);
-                                    }
-                                } else if (replica.getVersion() >= partitionCommitInfo.getVersion()) {
-                                    // the replica's version is larger than or equal to current transaction partition's version
-                                    // the replica is normal, then remove it from error replica ids
-                                    errorReplicaIds.remove(replica.getId());
+                                        && replica.getLastFailedVersion() < 0
+                                        && replica.checkVersionCatchUp(partition.getVisibleVersion(), true)) {

Review Comment:
   For reviewer: Remove this part because we will update the replica version in `updateCatalogAfterVisible()`, so no need to update here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman merged pull request #9266: [fix](catalog) fix bug that replica missing version cause query -214 error

Posted by GitBox <gi...@apache.org>.
morningman merged PR #9266:
URL: https://github.com/apache/incubator-doris/pull/9266


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a diff in pull request #9266: [fix](catalog) fix bug that replica missing version cause query -214 error

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #9266:
URL: https://github.com/apache/incubator-doris/pull/9266#discussion_r862310804


##########
fe/fe-core/src/main/java/org/apache/doris/transaction/DatabaseTransactionMgr.java:
##########
@@ -857,49 +857,19 @@ public void finishTransaction(long transactionId, Set<Long> errorReplicaIds) thr
                         }
                     }
 
+                    // check success replica number for each tablet.
+                    // a success replica means:
+                    //  1. Not in errorReplicaIds: succeed in both commit and publish phase
+                    //  2. last failed version < 0: is a health replica before
+                    //  3. version catch up: not with a stale version
+                    // Here we only check number, the replica version will be updated in updateCatalogAfterVisible()
                     for (MaterializedIndex index : allIndices) {
                         for (Tablet tablet : index.getTablets()) {
                             int healthReplicaNum = 0;
                             for (Replica replica : tablet.getReplicas()) {
                                 if (!errorReplicaIds.contains(replica.getId())
-                                        && replica.getLastFailedVersion() < 0) {
-                                    // this means the replica is a healthy replica,
-                                    // it is healthy in the past and does not have error in current load
-                                    if (replica.checkVersionCatchUp(partition.getVisibleVersion(), true)) {
-                                        // during rollup, the rollup replica's last failed version < 0,
-                                        // it may be treated as a normal replica.
-                                        // the replica is not failed during commit or publish
-                                        // during upgrade, one replica's last version maybe invalid,
-                                        // has to compare version hash.
-
-                                        // Here we still update the replica's info even if we failed to publish
-                                        // this txn, for the following case:
-                                        // replica A,B,C is successfully committed, but only A is successfully
-                                        // published,
-                                        // B and C is crashed, now we need a Clone task to repair this tablet.
-                                        // So, here we update A's version info, so that clone task will clone
-                                        // the latest version of data.
-
-                                        replica.updateVersionInfo(partitionCommitInfo.getVersion(),
-                                                replica.getDataSize(), replica.getRowCount());
-                                        ++healthReplicaNum;
-                                    } else {
-                                        // this means the replica has error in the past, but we did not observe it
-                                        // during upgrade, one job maybe in quorum finished state, for example, A,B,C 3 replica
-                                        // A,B 's version is 10, C's version is 10 but C' 10 is abnormal should be rollback
-                                        // then we will detect this and set C's last failed version to 10 and last success version to 11
-                                        // this logic has to be replayed in checkpoint thread
-                                        replica.updateVersionInfo(replica.getVersion(),
-                                                partition.getVisibleVersion(), 
-                                                partitionCommitInfo.getVersion());
-                                        LOG.warn("transaction state {} has error, the replica [{}] not appeared in error replica list "
-                                                + " and its version not equal to partition commit version or commit version - 1"
-                                                + " if its not a upgrade stage, its a fatal error. ", transactionState, replica);
-                                    }
-                                } else if (replica.getVersion() >= partitionCommitInfo.getVersion()) {
-                                    // the replica's version is larger than or equal to current transaction partition's version
-                                    // the replica is normal, then remove it from error replica ids

Review Comment:
   I added it back



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9266: [fix](catalog) fix bug that replica missing version cause query -214 error

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9266:
URL: https://github.com/apache/incubator-doris/pull/9266#discussion_r860547279


##########
fe/fe-core/src/main/java/org/apache/doris/master/ReportHandler.java:
##########
@@ -831,8 +831,12 @@ private static void handleRecoverTablet(ListMultimap<Long, Long> tabletRecoveryM
                                 // The absolute value is meaningless, as long as it is greater than 0.

Review Comment:
   delete or update the comment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yiguolei commented on a diff in pull request #9266: [fix](catalog) fix bug that replica missing version cause query -214 error

Posted by GitBox <gi...@apache.org>.
yiguolei commented on code in PR #9266:
URL: https://github.com/apache/incubator-doris/pull/9266#discussion_r860759307


##########
fe/fe-core/src/main/java/org/apache/doris/transaction/DatabaseTransactionMgr.java:
##########
@@ -857,49 +857,19 @@ public void finishTransaction(long transactionId, Set<Long> errorReplicaIds) thr
                         }
                     }
 
+                    // check success replica number for each tablet.
+                    // a success replica means:
+                    //  1. Not in errorReplicaIds: succeed in both commit and publish phase
+                    //  2. last failed version < 0: is a health replica before
+                    //  3. version catch up: not with a stale version
+                    // Here we only check number, the replica version will be updated in updateCatalogAfterVisible()
                     for (MaterializedIndex index : allIndices) {
                         for (Tablet tablet : index.getTablets()) {
                             int healthReplicaNum = 0;
                             for (Replica replica : tablet.getReplicas()) {
                                 if (!errorReplicaIds.contains(replica.getId())
-                                        && replica.getLastFailedVersion() < 0) {
-                                    // this means the replica is a healthy replica,
-                                    // it is healthy in the past and does not have error in current load
-                                    if (replica.checkVersionCatchUp(partition.getVisibleVersion(), true)) {
-                                        // during rollup, the rollup replica's last failed version < 0,
-                                        // it may be treated as a normal replica.
-                                        // the replica is not failed during commit or publish
-                                        // during upgrade, one replica's last version maybe invalid,
-                                        // has to compare version hash.
-
-                                        // Here we still update the replica's info even if we failed to publish
-                                        // this txn, for the following case:
-                                        // replica A,B,C is successfully committed, but only A is successfully
-                                        // published,
-                                        // B and C is crashed, now we need a Clone task to repair this tablet.
-                                        // So, here we update A's version info, so that clone task will clone
-                                        // the latest version of data.
-
-                                        replica.updateVersionInfo(partitionCommitInfo.getVersion(),
-                                                replica.getDataSize(), replica.getRowCount());
-                                        ++healthReplicaNum;
-                                    } else {
-                                        // this means the replica has error in the past, but we did not observe it
-                                        // during upgrade, one job maybe in quorum finished state, for example, A,B,C 3 replica
-                                        // A,B 's version is 10, C's version is 10 but C' 10 is abnormal should be rollback
-                                        // then we will detect this and set C's last failed version to 10 and last success version to 11
-                                        // this logic has to be replayed in checkpoint thread
-                                        replica.updateVersionInfo(replica.getVersion(),
-                                                partition.getVisibleVersion(), 
-                                                partitionCommitInfo.getVersion());
-                                        LOG.warn("transaction state {} has error, the replica [{}] not appeared in error replica list "
-                                                + " and its version not equal to partition commit version or commit version - 1"
-                                                + " if its not a upgrade stage, its a fatal error. ", transactionState, replica);
-                                    }
-                                } else if (replica.getVersion() >= partitionCommitInfo.getVersion()) {
-                                    // the replica's version is larger than or equal to current transaction partition's version
-                                    // the replica is normal, then remove it from error replica ids

Review Comment:
   This logic is not covered by updateCatalogAfterVisible.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman commented on a diff in pull request #9266: [fix](catalog) fix bug that replica missing version cause query -214 error

Posted by GitBox <gi...@apache.org>.
morningman commented on code in PR #9266:
URL: https://github.com/apache/incubator-doris/pull/9266#discussion_r861540716


##########
fe/fe-core/src/main/java/org/apache/doris/master/ReportHandler.java:
##########
@@ -831,8 +831,12 @@ private static void handleRecoverTablet(ListMultimap<Long, Long> tabletRecoveryM
                                 // The absolute value is meaningless, as long as it is greater than 0.

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] github-actions[bot] commented on pull request #9266: [fix](catalog) fix bug that replica missing version cause query -214 error

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #9266:
URL: https://github.com/apache/incubator-doris/pull/9266#issuecomment-1114845662

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org