Posted to notifications@zookeeper.apache.org by GitBox <gi...@apache.org> on 2021/07/09 01:48:34 UTC

[GitHub] [zookeeper] qwedsazzcc commented on pull request #1680: ZOOKEEPER-4021: Solve Poll timeout failure caused by POLLNVAL

qwedsazzcc commented on pull request #1680:
URL: https://github.com/apache/zookeeper/pull/1680#issuecomment-876852544


   > Hi @qwedsazzcc,
   > 
   > Thank you for your investigation and contribution. I understand what you are reporting, and imagine that your patch effectively makes the 100% CPU usage situation disappear—but unless I am missing something, it does not address the root cause.
   > 
   > Unfortunately, I have been unable to reproduce the specific sequence you are observing.
   > 
   > According to POSIX, `POLLNVAL` means "file descriptor not open," and while I could imagine some code closing the FD which is being polled, it is difficult to imagine how it would happen more than once and/or lead to a busy loop.
   > 
   > In your experience, is the problem deterministic? Is it "easy" to reproduce? Also: would you have log entries or some other kind of trace which might help figure out the exact conditions?
   > 
   > As for the error condition(s), which should indeed be handled, how about the following:
   > 
   > ```diff
   > --- a/zookeeper-client/zookeeper-client-c/src/mt_adaptor.c
   > +++ b/zookeeper-client/zookeeper-client-c/src/mt_adaptor.c
   > @@ -388,7 +388,7 @@ void *do_io(void *v)
   >          timeout=tv.tv_sec * 1000 + (tv.tv_usec/1000);
   > 
   >          poll(fds,maxfd,timeout);
   > -        if (fd != -1) {
   > +        if (fd != -1 && (fds[1].revents&POLLNVAL) == 0) {
   >              interest=(fds[1].revents&POLLIN)?ZOOKEEPER_READ:0;
   >              interest|=((fds[1].revents&POLLOUT)||(fds[1].revents&POLLHUP))?ZOOKEEPER_WRITE:0;
   >          }
   > ```
   > 
   > Not resetting `interest` should lead to `check_events` accessing the socket and noticing the closed state. Or does such a patch still cause 100% CPU usage?
   > 
   > Cheers, -D
   
   The error log on the console is:
   
   ```
   2021-07-09 01:31:17,352:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,374:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,385:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,417:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,432:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,442:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,458:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,476:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,492:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,509:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,521:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,538:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,552:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,570:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,587:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,604:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,621:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,632:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,647:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,658:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,674:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,683:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,702:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,717:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,735:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,750:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,760:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,778:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,793:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,813:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,831:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,841:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,858:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,873:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,892:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,905:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,929:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   2021-07-09 01:31:17,938:5615(0x7fa0f9ffb700):ZOO_ERROR@resolve_hosts@947: getaddrinfo: Name or service not known
   
   ```
   
   It calls `getaddrinfo` too frequently, which causes the high CPU usage.
   
   I run a ZooKeeper server locally and add a DNS rule to the hosts file (`127.0.0.1  zookeeper`). My process connects to ZooKeeper at `zookeeper:2181`. I then remove the rule from the hosts file and immediately stop the ZooKeeper server. This may reproduce the problem.
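
   To show where the log message comes from, here is a minimal sketch of a single resolution attempt. It is not the client's `resolve_hosts` code; `try_resolve` is a hypothetical helper written only for this comment. With the hosts rule removed, I would expect the call to fail and `gai_strerror` to print the same text as in the log, but the exact result depends on the resolver configuration of the machine.

   ```c
   #define _POSIX_C_SOURCE 200112L
   #include <netdb.h>
   #include <stdio.h>
   #include <string.h>
   #include <sys/socket.h>
   #include <sys/types.h>

   /* Hypothetical helper (not part of the ZooKeeper client): resolve
    * host:port once and return the getaddrinfo() status code. */
   static int try_resolve(const char *host, const char *port, int flags)
   {
       struct addrinfo hints;
       struct addrinfo *res = NULL;
       int rc;

       memset(&hints, 0, sizeof(hints));
       hints.ai_family = AF_UNSPEC;
       hints.ai_socktype = SOCK_STREAM;
       hints.ai_flags = flags;

       rc = getaddrinfo(host, port, &hints, &res);
       if (rc == 0)
           freeaddrinfo(res);   /* only the status matters here */
       return rc;
   }

   int main(void)
   {
       /* Same host and port as in my reproduction steps. */
       int rc = try_resolve("zookeeper", "2181", 0);

       if (rc != 0)
           fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
       else
           printf("zookeeper resolved\n");
       return 0;
   }
   ```

   Passing `AI_NUMERICHOST` as `flags` makes the call fail for any non-numeric host name without a DNS lookup, which is a convenient way to exercise the error path on a machine where the name still resolves.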
   
   I tried this:
   
   ```
   if (fd != -1 && (fds[1].revents&POLLNVAL) == 0) {
       interest=(fds[1].revents&POLLIN)?ZOOKEEPER_READ:0;
       interest|=((fds[1].revents&POLLOUT)||(fds[1].revents&POLLHUP))?ZOOKEEPER_WRITE:0;
   }
   ```
   
   It didn't solve my problem.
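
   For reference, a minimal standalone sketch of the `poll` behaviour behind `POLLNVAL`. It is independent of `mt_adaptor.c`, and the helpers `poll_revents` and `make_closed_fd` are hypothetical names used only here. It shows that `poll` reports a closed descriptor right away instead of waiting for the timeout, while a descriptor of -1 is ignored. Whether `do_io` actually reaches this state in my case is an assumption.

   ```c
   #define _POSIX_C_SOURCE 200112L
   #include <poll.h>
   #include <stdio.h>
   #include <unistd.h>

   /* Hypothetical helper: poll one descriptor for input. Stores the
    * return value of poll() in *nready and returns the revents field. */
   static short poll_revents(int fd, int timeout_ms, int *nready)
   {
       struct pollfd p;

       p.fd = fd;
       p.events = POLLIN;
       p.revents = 0;
       *nready = poll(&p, 1, timeout_ms);
       return p.revents;
   }

   /* Hypothetical helper: return a descriptor number that was valid a
    * moment ago and is now closed, or -1 if pipe() fails. */
   static int make_closed_fd(void)
   {
       int pipefd[2];

       if (pipe(pipefd) != 0)
           return -1;
       close(pipefd[1]);
       close(pipefd[0]);
       return pipefd[0];
   }

   int main(void)
   {
       int n;
       short ev;
       int fd = make_closed_fd();

       if (fd < 0) {
           perror("pipe");
           return 1;
       }

       /* Closed descriptor: poll() returns 1 (an event, not a timeout)
        * and sets POLLNVAL, even though the timeout is one second. */
       ev = poll_revents(fd, 1000, &n);
       printf("closed fd: poll=%d POLLNVAL=%s\n", n,
              (ev & POLLNVAL) ? "yes" : "no");

       /* Negative descriptor: the entry is ignored, so poll() waits for
        * the timeout and returns 0 with revents cleared. */
       ev = poll_revents(-1, 10, &n);
       printf("fd -1:     poll=%d revents=%d\n", n, (int)ev);
       return 0;
   }
   ```

   If a loop keeps passing such a closed descriptor to `poll`, every iteration returns immediately, so the timeout no longer throttles the loop.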
   
   
   Thanks for your reply.
   
   

