You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2021/01/15 10:24:01 UTC

[GitHub] [bookkeeper] eolivelli opened a new pull request #2538: Reduce noisy log during re-replication in case of down bookie

eolivelli opened a new pull request #2538:
URL: https://github.com/apache/bookkeeper/pull/2538


   ### Motivation
   
   When a bookie is down you can see many error both on command line tools and during replicator work.
   
   ### Changes
   
   Reduce the log level, and save printing useless stacktraces 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] fpj commented on a change in pull request #2538: Fix BookKeeper shell "list bookies" in case of downbookie and reduce noisy log during re-replication in case of down bookie

Posted by GitBox <gi...@apache.org>.
fpj commented on a change in pull request #2538:
URL: https://github.com/apache/bookkeeper/pull/2538#discussion_r558420072



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DefaultBookieAddressResolver.java
##########
@@ -59,13 +59,13 @@ public BookieSocketAddress resolve(BookieId bookieId) {
                 log.debug("Resolving dummy bookie Id {} using legacy bookie resolver", bookieId);
                 return BookieSocketAddress.resolveDummyBookieId(bookieId);
             }
-            log.info("Cannot resolve {}, bookie is unknown", bookieId, ex);
+            log.debug("Cannot resolve {}, bookie is unknown", bookieId, ex.toString());
             throw new BookieIdNotResolvedException(bookieId, ex);
         } catch (Exception ex) {
             if (ex instanceof InterruptedException) {
                 Thread.currentThread().interrupt();
             }
-            log.info("Cannot resolve {} ", bookieId, ex);
+            log.debug("Cannot resolve {} ", bookieId, ex);

Review comment:
       In this case here, we would be logging messages like the one of line 47 above at debug level, but at the same time we are rethrowing, so it is possible that this log message here is unnecessary altogether.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on pull request #2538: Fix BookKeeper shell "list bookies" in case of downbookie and reduce noisy log during re-replication in case of down bookie

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #2538:
URL: https://github.com/apache/bookkeeper/pull/2538#issuecomment-762658262


   @fpj  I have updated the patch as we said yesterday


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2538: Fix BookKeeper shell "list bookies" in case of downbookie and reduce noisy log during re-replication in case of down bookie

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2538:
URL: https://github.com/apache/bookkeeper/pull/2538#discussion_r559746916



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DefaultBookieAddressResolver.java
##########
@@ -59,13 +59,13 @@ public BookieSocketAddress resolve(BookieId bookieId) {
                 log.debug("Resolving dummy bookie Id {} using legacy bookie resolver", bookieId);
                 return BookieSocketAddress.resolveDummyBookieId(bookieId);
             }
-            log.info("Cannot resolve {}, bookie is unknown", bookieId, ex);
+            log.debug("Cannot resolve {}, bookie is unknown", bookieId, ex.toString());
             throw new BookieIdNotResolvedException(bookieId, ex);
         } catch (Exception ex) {
             if (ex instanceof InterruptedException) {
                 Thread.currentThread().interrupt();
             }
-            log.info("Cannot resolve {} ", bookieId, ex);
+            log.debug("Cannot resolve {} ", bookieId, ex);

Review comment:
       Right. I will remove this line at all.
   Less noise is better while debugging 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2538: Fix BookKeeper shell "list bookies" in case of downbookie and reduce noisy log during re-replication in case of down bookie

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2538:
URL: https://github.com/apache/bookkeeper/pull/2538#discussion_r559748526



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
##########
@@ -542,7 +542,8 @@ protected ChannelFuture connect() {
         try {
             addr = bookieAddressResolver.resolve(bookieId);
         } catch (BookieAddressResolver.BookieIdNotResolvedException err) {
-            LOG.error("Cannot connect to {} as endpopint resolution failed", bookieId, err);
+            LOG.error("Cannot connect to {} as endpoint resolution failed (probably bookie is down)",

Review comment:
       That exception means that we currently cannot resolve  the is.
   So I am keeping the message, according to your answer.
   Basically in this API we are throwing that Exception instead of returning null.
   
   Beside note: we are not performing blocking accesses to ZK for this operation, we are always reading from an in-memory cache




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli merged pull request #2538: Fix BookKeeper shell "list bookies" in case of downbookie and reduce noisy log during re-replication in case of down bookie

Posted by GitBox <gi...@apache.org>.
eolivelli merged pull request #2538:
URL: https://github.com/apache/bookkeeper/pull/2538


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] fpj commented on a change in pull request #2538: Fix BookKeeper shell "list bookies" in case of downbookie and reduce noisy log during re-replication in case of down bookie

Posted by GitBox <gi...@apache.org>.
fpj commented on a change in pull request #2538:
URL: https://github.com/apache/bookkeeper/pull/2538#discussion_r558421599



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
##########
@@ -542,7 +542,8 @@ protected ChannelFuture connect() {
         try {
             addr = bookieAddressResolver.resolve(bookieId);
         } catch (BookieAddressResolver.BookieIdNotResolvedException err) {
-            LOG.error("Cannot connect to {} as endpopint resolution failed", bookieId, err);
+            LOG.error("Cannot connect to {} as endpoint resolution failed (probably bookie is down)",

Review comment:
       I'm a bit confused about the `probably bookie is down` part. Why resolving the address requires the bookie to be up? I'm probably missing something basic.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2538: Fix BookKeeper shell "list bookies" in case of downbookie and reduce noisy log during re-replication in case of down bookie

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2538:
URL: https://github.com/apache/bookkeeper/pull/2538#discussion_r558426981



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
##########
@@ -542,7 +542,8 @@ protected ChannelFuture connect() {
         try {
             addr = bookieAddressResolver.resolve(bookieId);
         } catch (BookieAddressResolver.BookieIdNotResolvedException err) {
-            LOG.error("Cannot connect to {} as endpopint resolution failed", bookieId, err);
+            LOG.error("Cannot connect to {} as endpoint resolution failed (probably bookie is down)",

Review comment:
       this "resolution" is about looking up the "current" Bookie network address on ZooKeeper, it is not DNS resolution.
   
   When the bookie is up, it advertises all of its open endpoints (bookie-rpc,http....) on ZK, in the same znode that we are using to detect if the bookie is writable/readonly, it is an ephemeral znode.
   
   When you want to connect to a bookie, you start from a BookieID (an opaque string), you lookup the current network address on ZK (we have a cache of course) and then you are good to go.
   
   but if the Bookie is down the ephemeral znode does not exist and the client does not know the network address to connect to.
   
   Usually when you see this log line the problem is that you cannot connect to the bookie because you do not know the network address, because it is not advertising it, and this usually happens when the bookie is down. 
   
   for reference:
   http://bookkeeper.apache.org/bps/BP-41-bookieid/
   http://bookkeeper.apache.org/bps/BP-38-bookie-endpoint-discovery/
   
   
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2538: Fix BookKeeper shell "list bookies" in case of downbookie and reduce noisy log during re-replication in case of down bookie

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2538:
URL: https://github.com/apache/bookkeeper/pull/2538#discussion_r558427458



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
##########
@@ -542,7 +542,8 @@ protected ChannelFuture connect() {
         try {
             addr = bookieAddressResolver.resolve(bookieId);
         } catch (BookieAddressResolver.BookieIdNotResolvedException err) {
-            LOG.error("Cannot connect to {} as endpopint resolution failed", bookieId, err);
+            LOG.error("Cannot connect to {} as endpoint resolution failed (probably bookie is down)",

Review comment:
       I can remove it, but I find it quite useful in the day to day support to people looking at this kind of logs




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] fpj commented on a change in pull request #2538: Fix BookKeeper shell "list bookies" in case of downbookie and reduce noisy log during re-replication in case of down bookie

Posted by GitBox <gi...@apache.org>.
fpj commented on a change in pull request #2538:
URL: https://github.com/apache/bookkeeper/pull/2538#discussion_r559727882



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DefaultBookieAddressResolver.java
##########
@@ -59,13 +59,13 @@ public BookieSocketAddress resolve(BookieId bookieId) {
                 log.debug("Resolving dummy bookie Id {} using legacy bookie resolver", bookieId);
                 return BookieSocketAddress.resolveDummyBookieId(bookieId);
             }
-            log.info("Cannot resolve {}, bookie is unknown", bookieId, ex);
+            log.debug("Cannot resolve {}, bookie is unknown", bookieId, ex.toString());
             throw new BookieIdNotResolvedException(bookieId, ex);
         } catch (Exception ex) {
             if (ex instanceof InterruptedException) {
                 Thread.currentThread().interrupt();
             }
-            log.info("Cannot resolve {} ", bookieId, ex);
+            log.debug("Cannot resolve {} ", bookieId, ex);

Review comment:
       Quick question: since we are re-throwing, should we simply eliminate this log message altogether and let it be logged upstream? 

##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
##########
@@ -542,7 +542,8 @@ protected ChannelFuture connect() {
         try {
             addr = bookieAddressResolver.resolve(bookieId);
         } catch (BookieAddressResolver.BookieIdNotResolvedException err) {
-            LOG.error("Cannot connect to {} as endpopint resolution failed", bookieId, err);
+            LOG.error("Cannot connect to {} as endpoint resolution failed (probably bookie is down)",

Review comment:
       Could this be a ZK error too, like ZK is down and it can't talk to ZK? If so, then perhaps add it to the message. Otherwise, it is fine to leave it there.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] fpj commented on a change in pull request #2538: Fix BookKeeper shell "list bookies" in case of downbookie and reduce noisy log during re-replication in case of down bookie

Posted by GitBox <gi...@apache.org>.
fpj commented on a change in pull request #2538:
URL: https://github.com/apache/bookkeeper/pull/2538#discussion_r558251442



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DefaultBookieAddressResolver.java
##########
@@ -59,13 +59,13 @@ public BookieSocketAddress resolve(BookieId bookieId) {
                 log.debug("Resolving dummy bookie Id {} using legacy bookie resolver", bookieId);
                 return BookieSocketAddress.resolveDummyBookieId(bookieId);
             }
-            log.info("Cannot resolve {}, bookie is unknown", bookieId, ex);
+            log.debug("Cannot resolve {}, bookie is unknown", bookieId, ex.toString());

Review comment:
       It is important to know of an address resolution error, I'd not make this `debug`, I'd say it is at least `info` if not `warn`.
   
   Also, removing the stack traces certainly makes sense for the shell, but they might be useful for general debugging.

##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DefaultBookieAddressResolver.java
##########
@@ -59,13 +59,13 @@ public BookieSocketAddress resolve(BookieId bookieId) {
                 log.debug("Resolving dummy bookie Id {} using legacy bookie resolver", bookieId);
                 return BookieSocketAddress.resolveDummyBookieId(bookieId);
             }
-            log.info("Cannot resolve {}, bookie is unknown", bookieId, ex);
+            log.debug("Cannot resolve {}, bookie is unknown", bookieId, ex.toString());
             throw new BookieIdNotResolvedException(bookieId, ex);
         } catch (Exception ex) {
             if (ex instanceof InterruptedException) {
                 Thread.currentThread().interrupt();
             }
-            log.info("Cannot resolve {} ", bookieId, ex);
+            log.debug("Cannot resolve {} ", bookieId, ex);

Review comment:
       Same here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [bookkeeper] eolivelli commented on a change in pull request #2538: Fix BookKeeper shell "list bookies" in case of downbookie and reduce noisy log during re-replication in case of down bookie

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #2538:
URL: https://github.com/apache/bookkeeper/pull/2538#discussion_r558281325



##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DefaultBookieAddressResolver.java
##########
@@ -59,13 +59,13 @@ public BookieSocketAddress resolve(BookieId bookieId) {
                 log.debug("Resolving dummy bookie Id {} using legacy bookie resolver", bookieId);
                 return BookieSocketAddress.resolveDummyBookieId(bookieId);
             }
-            log.info("Cannot resolve {}, bookie is unknown", bookieId, ex);
+            log.debug("Cannot resolve {}, bookie is unknown", bookieId, ex.toString());

Review comment:
       that was my thought when I first wrote this code, but actually we are appending the exception to the chain of the thrown BookieIdNotResolvedException, so we are going to report this error twice (here and in the caller).
   I can leave it at "info" level, but the stacktrace is redundant this is case

##########
File path: bookkeeper-server/src/main/java/org/apache/bookkeeper/client/DefaultBookieAddressResolver.java
##########
@@ -59,13 +59,13 @@ public BookieSocketAddress resolve(BookieId bookieId) {
                 log.debug("Resolving dummy bookie Id {} using legacy bookie resolver", bookieId);
                 return BookieSocketAddress.resolveDummyBookieId(bookieId);
             }
-            log.info("Cannot resolve {}, bookie is unknown", bookieId, ex);
+            log.debug("Cannot resolve {}, bookie is unknown", bookieId, ex.toString());
             throw new BookieIdNotResolvedException(bookieId, ex);
         } catch (Exception ex) {
             if (ex instanceof InterruptedException) {
                 Thread.currentThread().interrupt();
             }
-            log.info("Cannot resolve {} ", bookieId, ex);
+            log.debug("Cannot resolve {} ", bookieId, ex);

Review comment:
        we are appending the exception to the chain of the thrown BookieIdNotResolvedException, so we are going to report this error twice (here and in the caller).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org