Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/06/17 04:29:32 UTC

[GitHub] [incubator-doris] gaodayue opened a new issue #3893: [Proposal] remove strict report version check of TabletReport

gaodayue opened a new issue #3893:
URL: https://github.com/apache/incubator-doris/issues/3893


   In a cluster with frequent load activity, FE ignores most tablet reports from BE, because it currently handles only reports whose version is >= the BE's latest report version (which is increased each time a transaction is published). This can be observed in FE's log, which contains many
    lines like `out of date report version 15919277405765 from backend[177969252]. current report version[15919277405766]`.
   
   However, many system functions rely on TabletReport processing to work properly. For example:
   1. Bad or version-missing replicas are detected and repaired during TabletReport.
   2. Storage medium migration decisions and actions are made based on TabletReport.
   3. BE's old transactions are cleared or republished during TabletReport.
   
   After reading `ReportHandler.tabletReport`, I think the strict report version check is not required. In fact, FE can never make decisions based on the latest state of BE, because BE's state and report version can still change while FE is processing the tabletReport. **In practice, we removed the version check on many of our clusters more than a month ago, and nothing bad has happened.** However, we do record the version of each BE's last report and make sure that only a report with a bigger version is handled.
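
   A minimal sketch of the difference between the two checks, written in Python for brevity. The class `ReportFilter` and its method names are hypothetical and do not come from the Doris code base; the sketch only assumes that FE tracks, per backend, both the latest known report version and the version of the last report it handled.

```python
from dataclasses import dataclass, field


@dataclass
class ReportFilter:
    """Hypothetical FE-side filter; not the actual ReportHandler code."""

    # Latest report version FE knows for each BE (bumped on publish, etc.).
    latest_version: dict = field(default_factory=dict)
    # Version of the last tablet report FE actually handled for each BE.
    last_handled: dict = field(default_factory=dict)

    def strict_accept(self, backend_id: int, report_version: int) -> bool:
        # Behaviour described above: handle only reports that are at
        # least as new as the BE's latest known report version.
        return report_version >= self.latest_version.get(backend_id, 0)

    def relaxed_accept(self, backend_id: int, report_version: int) -> bool:
        # Proposed behaviour: handle any report that is newer than the
        # last report handled for this BE, and remember its version.
        if report_version <= self.last_handled.get(backend_id, -1):
            return False
        self.last_handled[backend_id] = report_version
        return True


# Versions taken from the log line quoted above.
f = ReportFilter(latest_version={177969252: 15919277405766})
strict_result = f.strict_accept(177969252, 15919277405765)
relaxed_first = f.relaxed_accept(177969252, 15919277405765)
relaxed_repeat = f.relaxed_accept(177969252, 15919277405765)
```

   With the versions from the log line, the strict check rejects the report, while the relaxed check handles it once and rejects a second report carrying the same version.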


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] morningman closed issue #3893: [Proposal] remove strict report version check of TabletReport

Posted by GitBox <gi...@apache.org>.
morningman closed issue #3893:
URL: https://github.com/apache/incubator-doris/issues/3893


   




[GitHub] [incubator-doris] morningman commented on issue #3893: [Proposal] remove strict report version check of TabletReport

Posted by GitBox <gi...@apache.org>.
morningman commented on issue #3893:
URL: https://github.com/apache/incubator-doris/issues/3893#issuecomment-646394143


   So I suggest just removing line 730 of the code below; everything should then work well.
   
   https://github.com/apache/incubator-doris/blob/1d9fa5071d1bd80582d4148c13e6e8d2d985c9e0/be/src/agent/task_worker_pool.cpp#L720-L732




[GitHub] [incubator-doris] morningman commented on issue #3893: [Proposal] remove strict report version check of TabletReport

Posted by GitBox <gi...@apache.org>.
morningman commented on issue #3893:
URL: https://github.com/apache/incubator-doris/issues/3893#issuecomment-646393889


   The `report version` is mainly used to avoid errors in the following scenarios:
   
   ```
                                             Time Line
   
                                                 +
                   +-------------------------+   |   +--------------------------+
                   | FE: Create Table Thread |   |   | BE: Tablet Report Thread |
                   +-------------------------+   |   +--------------------------+
                                                 |
                                                 |
                                                 |   1. Fetch all tablets that currently exist
                                                 |      on BE (none at this point) and get
                                                 |      ready to report.
                                                 |
   2. FE sends Create Tablet Task to BE.         |
      BE receives the task and creates Tablet X. |
                                                 |
                                                 |
   3. BE finishes the Create Tablet Task         |
      and sends a "FinishTaskReport" to FE.      |
                                                 |
                                                 |
   4. FE receives the "FinishTaskReport", and    |
      finally, Tablet X takes effect on FE's     |
      meta.                                      |
                                                 |
                                                 |    5. The tablet report arrives, which
                                                 |       contains no tablets.
                                                 |       Therefore, the FE will think that
                                                 |       Tablet X does not exist on the BE,
                                                 |       and will delete the Tablet X information
                                                 |       from the metadata, thereby causing information loss.
                                                 v
   ```
   
   With the report version, this is avoided. In step 1, the report thread takes report version 0.
   In steps 3 and 4, the report version of the BE is updated to 1.
   So in step 5, FE finds that the report version is stale and ignores that tablet report.
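
   The timeline can be replayed with a toy model, sketched here in Python. The `Backend` and `Frontend` classes and their methods are hypothetical stand-ins, not Doris code, and the sketch assumes the finish-task report carries the BE's new report version to FE.

```python
class Backend:
    """Toy BE holding a tablet set and a report version (hypothetical)."""

    def __init__(self):
        self.tablets = set()
        self.report_version = 0

    def snapshot_report(self):
        # Step 1: capture the tablet list together with the current version.
        return {"version": self.report_version, "tablets": set(self.tablets)}

    def create_tablet(self, tablet_id):
        # Steps 2-3: create the tablet, then bump the report version.
        self.tablets.add(tablet_id)
        self.report_version += 1
        return self.report_version


class Frontend:
    """Toy FE holding tablet metadata for a single BE (hypothetical)."""

    def __init__(self, check_version):
        self.check_version = check_version
        self.meta = set()
        self.be_version = 0

    def on_finish_task(self, tablet_id, version):
        # Step 4: the tablet takes effect in FE's metadata.
        self.meta.add(tablet_id)
        self.be_version = version

    def on_tablet_report(self, report):
        # Step 5: optionally drop stale reports, otherwise delete
        # metadata for tablets that the report does not contain.
        if self.check_version and report["version"] < self.be_version:
            return False
        self.meta &= report["tablets"]
        return True


def run(check_version):
    be, fe = Backend(), Frontend(check_version)
    report = be.snapshot_report()     # step 1
    version = be.create_tablet("X")   # steps 2-3
    fe.on_finish_task("X", version)   # step 4
    handled = fe.on_tablet_report(report)  # step 5
    return handled, fe.meta
```

   Calling `run(False)` handles the stale report and leaves FE's metadata empty, which is the information loss from step 5. Calling `run(True)` ignores the report and Tablet X stays in the metadata.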
   
   
   The reason there are so many `out-of-date` reports in production environments is that we update the report version in some unnecessary places. For example, when the BE processes a publish task, we also increase the report version of the BE. If loads are very frequent, this results in a large number of `out-of-date` reports.
   
   It is not necessary to update the report version after a publish task; this is a historical leftover. In the current reporting logic, we no longer decrease the version of a replica in the FE metadata based on a report. So even if we receive a report carrying stale version information, it does not matter.
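
   As a rough illustration of why a stale replica version in a report is harmless under this rule, here is a Python sketch. The function `apply_reported_versions` and the dictionary layout are assumptions made for the example, not the actual FE logic; the only property taken from the text above is that a report never lowers a replica's version in the metadata.

```python
def apply_reported_versions(meta_versions, reported_versions):
    """Return updated replica versions; a report may raise a version
    but never lower it (hypothetical helper, not Doris code)."""
    updated = dict(meta_versions)
    for replica_id, reported in reported_versions.items():
        current = updated.get(replica_id)
        if current is None:
            # Replicas unknown to the metadata are out of scope for this sketch.
            continue
        updated[replica_id] = max(current, reported)
    return updated


# Replica 1 was published up to version 5 after the report was taken,
# so the report still says 4; replica 2 is reported ahead of the metadata.
meta = {1: 5, 2: 3}
stale_report = {1: 4, 2: 4}
result = apply_reported_versions(meta, stale_report)
```

   Here `result` keeps replica 1 at version 5 and advances replica 2 to version 4, so the stale entry for replica 1 has no effect.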
   
   In our test environment (8 BEs), `out-of-date` reports occurred 60-80 times per hour on average before the upgrade and dropped to 1-2 times per hour after it. At that rate they no longer affect the cluster.

