Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2020/08/14 08:36:27 UTC

[GitHub] [hbase] wchevreuil commented on pull request #2255: HBASE-24877 Add option to avoid aborting RS process upon uncaught exc…

wchevreuil commented on pull request #2255:
URL: https://github.com/apache/hbase/pull/2255#issuecomment-673960675


   > What's next if we ignore the exception? We will retry later? Or we will just go on without this replication source? 
   
   As you can see in `ReplicationSource.startup`, the source keeps looping until `initialize` succeeds without throwing any uncaught exceptions.
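
   To illustrate the retry behavior described above, here is a minimal, self-contained sketch. It is not the actual `ReplicationSource` code: the class `RetryingStartupSketch`, the `Initializer` interface, and the fixed sleep interval are all hypothetical names and choices made for this example only.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RetryingStartupSketch {

    /** Hypothetical stand-in for the initialization step; not the HBase API. */
    public interface Initializer {
        void initialize() throws Exception;
    }

    /**
     * Calls init.initialize() until it returns normally, sleeping between
     * attempts. Returns the number of attempts that were made.
     */
    public static int startup(Initializer init, long sleepMillis)
            throws InterruptedException {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                init.initialize();
                return attempts;
            } catch (Exception e) {
                // Log instead of aborting, so the failure stays visible to operators.
                System.err.println("initialize failed on attempt " + attempts + ": " + e);
                Thread.sleep(sleepMillis);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        // Simulated endpoint whose peer is unavailable for the first two attempts.
        int attempts = startup(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("peer unavailable");
            }
        }, 10L);
        System.out.println("initialized after " + attempts + " attempts");
    }
}
```

   Running `main` logs two failures to stderr and then prints `initialized after 3 attempts`; the process itself never goes down while the peer is unavailable.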
   
   > Users will then find out that the cluster is fine but data has not been replicated out?
   
   It's common practice to verify replication status after maintenance.
   
   >  I'm not sure if this is correct way, we fix an issue but introduce another hard to find issue?
   
   It does not fail silently: errors get logged, and operators get the chance to look into what is going wrong without a complete downtime of their source clusters.
   
   > Adding a flag can keep the old behavior but we give users an impression that the exception can be ignored? Still not sure if this is the correct way to fix this...
   > Mind explaining more on your real usage?
   
   We use some custom replication endpoints that, when some target peer hosts became unavailable, ended up throwing uncaught exceptions and aborting the source RSes. Sure, the custom code could be improved, and it was an internal infra issue, but with a flag like this we would not have had to face a period of outage at the source.
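
   The choice such a flag exposes can be sketched roughly as follows. This is only an illustration under assumptions: `AbortFlagSketch`, the `abortOnError` parameter, and the `Outcome` enum are hypothetical and do not reflect the actual property name or code in the pull request.

```java
public class AbortFlagSketch {

    /** Hypothetical outcomes for an uncaught exception in a replication source. */
    public enum Outcome { ABORT_PROCESS, LOG_AND_RETRY }

    /**
     * Decides what to do with an uncaught exception.
     * abortOnError == true models the old behavior (abort the RS process);
     * abortOnError == false models the opt-in behavior (log and keep retrying).
     */
    public static Outcome onUncaughtException(boolean abortOnError, Throwable t) {
        if (abortOnError) {
            return Outcome.ABORT_PROCESS;
        }
        // The error is still logged, so it does not fail silently.
        System.err.println("replication source error, will retry: " + t);
        return Outcome.LOG_AND_RETRY;
    }

    public static void main(String[] args) {
        Throwable failure = new IllegalStateException("peer unavailable");
        System.out.println(onUncaughtException(true, failure));
        System.out.println(onUncaughtException(false, failure));
    }
}
```

   With the flag left at the abort setting nothing changes for existing deployments; only operators who opt out of aborting get the log-and-retry path.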
    


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org