Posted to commits@pinot.apache.org by "gortiz (via GitHub)" <gi...@apache.org> on 2023/05/31 09:07:01 UTC

[GitHub] [pinot] gortiz opened a new issue, #10823: Flaky test in MultiNodesOfflineClusterIntegrationTest

gortiz opened a new issue, #10823:
URL: https://github.com/apache/pinot/issues/10823

   `MultiNodesOfflineClusterIntegrationTest.testServerHardFailure` shuts down one server and then executes a request, expecting it to fail with `Connection refused`. However, the shutdown call is not blocking: it does not wait until the server has stopped. As a result, there is a race condition between the shutdown call and the request.
   
   If the broker processes the request after the server has stopped, the request fails with error code `BROKER_REQUEST_SEND_ERROR_CODE` and the expected `Connection refused` message. But the request can also start executing before the broker knows that the server is down, in which case it can theoretically succeed or fail with other error messages.
   
   In my runs I wasn't able to make the request succeed, but from time to time it fails with the `BROKER_REQUEST_SEND_ERROR_CODE` error code (as expected) but with a `Connection reset` message. I think I have seen other messages as well, but I'm not 100% sure. That makes the test fail when, in my opinion, it shouldn't.
   
   I can see two solutions. The simple and empirically good-enough one is to expect the error code but not assert on the message. The other is to make the shutdown call blocking. That would make the test 100% repeatable, but it wouldn't cover all the cases (we would never check what happens when the server is killed in the middle of a call).
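
The first option could look roughly like the sketch below. It is only an illustration in plain Java, not the actual test code: `QueryError`, `HardFailureCheck` and `hasSendError` are hypothetical names, and the numeric value given to `BROKER_REQUEST_SEND_ERROR_CODE` is a placeholder (the real test would reference the constant defined in Pinot). The point is that the check matches on the error code only, so both `Connection refused` and `Connection reset` are accepted.

```java
import java.util.List;

// Hypothetical stand-in for one error entry returned by the broker.
final class QueryError {
    final int errorCode;
    final String message;

    QueryError(int errorCode, String message) {
        this.errorCode = errorCode;
        this.message = message;
    }
}

final class HardFailureCheck {
    // Placeholder value, assumed for illustration; use the real Pinot constant in the test.
    static final int BROKER_REQUEST_SEND_ERROR_CODE = 425;

    // True when at least one error carries the expected code, whatever its message says.
    static boolean hasSendError(List<QueryError> errors) {
        for (QueryError error : errors) {
            if (error.errorCode == BROKER_REQUEST_SEND_ERROR_CODE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<QueryError> refused =
            List.of(new QueryError(BROKER_REQUEST_SEND_ERROR_CODE, "Connection refused"));
        List<QueryError> reset =
            List.of(new QueryError(BROKER_REQUEST_SEND_ERROR_CODE, "Connection reset"));

        // Both variants pass because the message is not part of the check.
        System.out.println(hasSendError(refused));
        System.out.println(hasSendError(reset));
    }
}
```

A check like this still fails if the request unexpectedly succeeds or comes back with a different error code, so it only relaxes the part of the assertion that depends on timing.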
   
   This issue was discovered while working on #10528.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #10823: Flaky test in MultiNodesOfflineClusterIntegrationTest

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10823:
URL: https://github.com/apache/pinot/issues/10823#issuecomment-1577296391

   Trying to understand why the shutdown is not blocking. In `QueryServer`, when `shutDown()` is called, `_channel` is closed and `sync()` is called, which should block until `_channel` is closed. After `_channel` is closed, the broker shouldn't be able to talk to the server. Could this be a bug in `Netty` that leaves the channel open even after `sync()` is called?




[GitHub] [pinot] Jackie-Jiang commented on issue #10823: Flaky test in MultiNodesOfflineClusterIntegrationTest

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #10823:
URL: https://github.com/apache/pinot/issues/10823#issuecomment-1577287630

   Attached a failed run: https://github.com/apache/pinot/actions/runs/5126437043/jobs/9220922056

