You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@trafficserver.apache.org by GitBox <gi...@apache.org> on 2021/05/11 23:03:50 UTC

[GitHub] [trafficserver] shinrich opened a new pull request #7809: Save and propagate epoll network error

shinrich opened a new pull request #7809:
URL: https://github.com/apache/trafficserver/pull/7809


   Another approach to PR #7803.  In addition to storing that one of the EPOLLERR bits was set, this code grabs a copy of error sitting on the socket and sends a VC_EVENT_ERROR from the read or write method with the lerrno set appropriately.
   
   With this change, the case where there is no one listening on the origin port and a ICMP or reset is returned is correctly identified as a connection error instead of a failure after the TCP handshake succeeded.
   
   I have been running this code change this afternoon on a production box.
   
   Tracking down correctly handing origin connection errors and marking things down appropriately is discussed in issue #7290
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on a change in pull request #7809: Save and propagate epoll network error

Posted by GitBox <gi...@apache.org>.
shinrich commented on a change in pull request #7809:
URL: https://github.com/apache/trafficserver/pull/7809#discussion_r631494939



##########
File path: iocore/net/UnixNet.cc
##########
@@ -507,27 +507,24 @@ NetHandler::waitForActivity(ink_hrtime timeout)
       if (cop_list.in(ne)) {
         cop_list.remove(ne);
       }
-      if (get_ev_events(pd, x) & (EVENTIO_READ | EVENTIO_ERROR)) {
+      int flags = get_ev_events(pd, x);

Review comment:
       There is a ink_release_assert later in this function for that case.  I have yet to see it go off.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on a change in pull request #7809: Save and propagate epoll network error

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on a change in pull request #7809:
URL: https://github.com/apache/trafficserver/pull/7809#discussion_r631048475



##########
File path: iocore/net/NetEvent.h
##########
@@ -55,6 +55,9 @@ class NetEvent
   // Close when EventIO close;
   virtual int close() = 0;
 
+  bool has_error() const;
+  void set_error();

Review comment:
       Bikeshed - should be named `set_error_from_socket` - otherwise one wonders where the argument to set the error is.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ezelkow1 edited a comment on pull request #7809: Save and propagate epoll network error

Posted by GitBox <gi...@apache.org>.
ezelkow1 edited a comment on pull request #7809:
URL: https://github.com/apache/trafficserver/pull/7809#issuecomment-877547680


   @shinrich do you know when we may hit the ink_release_assert that was added here? We recently bumped up our 9.1.x build closer to the head and now we are seeing it hit fairly often on a test situation (just sending fake traffic over to a test box with it, not like an actual specific test run of something)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] SolidWallOfCode commented on a change in pull request #7809: Save and propagate epoll network error

Posted by GitBox <gi...@apache.org>.
SolidWallOfCode commented on a change in pull request #7809:
URL: https://github.com/apache/trafficserver/pull/7809#discussion_r631049858



##########
File path: iocore/net/UnixNet.cc
##########
@@ -507,27 +507,24 @@ NetHandler::waitForActivity(ink_hrtime timeout)
       if (cop_list.in(ne)) {
         cop_list.remove(ne);
       }
-      if (get_ev_events(pd, x) & (EVENTIO_READ | EVENTIO_ERROR)) {
+      int flags = get_ev_events(pd, x);

Review comment:
       Is it possible to get only error bits and not read or write ready bits? It's reasonable to presume there's no `epoll` result unless checking for one of those conditions, so it's possible the checked for condition is always present.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #7809: Save and propagate epoll network error

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #7809:
URL: https://github.com/apache/trafficserver/pull/7809#issuecomment-879210226


   Oh, nevermind, it looks like we did run into this assert, and we locally added the warning, etc.  Will promote that change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] ezelkow1 commented on pull request #7809: Save and propagate epoll network error

Posted by GitBox <gi...@apache.org>.
ezelkow1 commented on pull request #7809:
URL: https://github.com/apache/trafficserver/pull/7809#issuecomment-877547680


   @shinrich do you know when we may hit the ink_release_assert that was added here? We recently bumped up our 9.1.x build closer to the head and now we are seeing it hit fairly often on a test situation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #7809: Save and propagate epoll network error

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #7809:
URL: https://github.com/apache/trafficserver/pull/7809#issuecomment-879208319


   I put the assert in because I didn't think this would happen in reality.  It means the epoll reported an error but no read nor write event.  The event will only be reported up via outstanding read and write events.  So if we hit this assert, the error event was not going to be reported up.
   
   Not clear this is a bad thing.  I'll set up another PR which changes the ink_release_assert to a ink_assert (maybe) and add a Warning message with the epoll information so we can learn more about this case.
   
   We have not run into this assert in our environment.  I think we are running with very similar OS's, so perhaps it is an issue of the details of the communication pattern.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] zwoop commented on pull request #7809: Save and propagate epoll network error

Posted by GitBox <gi...@apache.org>.
zwoop commented on pull request #7809:
URL: https://github.com/apache/trafficserver/pull/7809#issuecomment-848872354


   Cherry-picked to v9.1.x branch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] jrushford commented on a change in pull request #7809: Save and propagate epoll network error

Posted by GitBox <gi...@apache.org>.
jrushford commented on a change in pull request #7809:
URL: https://github.com/apache/trafficserver/pull/7809#discussion_r668879187



##########
File path: iocore/net/UnixNet.cc
##########
@@ -507,27 +507,24 @@ NetHandler::waitForActivity(ink_hrtime timeout)
       if (cop_list.in(ne)) {
         cop_list.remove(ne);
       }
-      if (get_ev_events(pd, x) & (EVENTIO_READ | EVENTIO_ERROR)) {
+      int flags = get_ev_events(pd, x);
+      if (flags & (EVENTIO_ERROR)) {
+        ne->set_error_from_socket();
+      }
+      if (flags & (EVENTIO_READ)) {
         ne->read.triggered = 1;
         if (!read_ready_list.in(ne)) {
           read_ready_list.enqueue(ne);
-        } else if (get_ev_events(pd, x) & EVENTIO_ERROR) {
-          // check for unhandled epoll events that should be handled
-          Debug("iocore_net_main", "Unhandled epoll event on read: 0x%04x read.enabled=%d closed=%d read.netready_queue=%d",
-                get_ev_events(pd, x), ne->read.enabled, ne->closed, read_ready_list.in(ne));
         }
       }
-      if (get_ev_events(pd, x) & (EVENTIO_WRITE | EVENTIO_ERROR)) {
+      if (flags & (EVENTIO_WRITE)) {
         ne->write.triggered = 1;
         if (!write_ready_list.in(ne)) {
           write_ready_list.enqueue(ne);
-        } else if (get_ev_events(pd, x) & EVENTIO_ERROR) {
-          // check for unhandled epoll events that should be handled
-          Debug("iocore_net_main", "Unhandled epoll event on write: 0x%04x write.enabled=%d closed=%d write.netready_queue=%d",
-                get_ev_events(pd, x), ne->write.enabled, ne->closed, write_ready_list.in(ne));
         }
-      } else if (!(get_ev_events(pd, x) & EVENTIO_READ)) {
+      } else if (!(flags & (EVENTIO_READ))) {
         Debug("iocore_net_main", "Unhandled epoll event: 0x%04x", get_ev_events(pd, x));
+        ink_release_assert(false);

Review comment:
       @shinrich we're sending production traffic through 9.1.x and seeing it crash here with the assertion, line 527.  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich commented on pull request #7809: Save and propagate epoll network error

Posted by GitBox <gi...@apache.org>.
shinrich commented on pull request #7809:
URL: https://github.com/apache/trafficserver/pull/7809#issuecomment-879219866


   Added PR #8058 to address the crash.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@trafficserver.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [trafficserver] shinrich merged pull request #7809: Save and propagate epoll network error

Posted by GitBox <gi...@apache.org>.
shinrich merged pull request #7809:
URL: https://github.com/apache/trafficserver/pull/7809


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org