You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pegasus.apache.org by "littlepangdi (via GitHub)" <gi...@apache.org> on 2023/04/17 05:09:52 UTC

[GitHub] [incubator-pegasus] littlepangdi opened a new pull request, #1444: fix loopForRequest not return and retry forever

littlepangdi opened a new pull request, #1444:
URL: https://github.com/apache/incubator-pegasus/pull/1444

   ### What problem does this PR solve? <!--add issue link with summary if exists-->
   fix #1385 
   
   ### What is changed and how does it work?
   expected behavior: 
   when replica server restart,The client should be able to reconnect after the restart of the node without considering the client configuration update (reselecting the primary replica).
   Actually, client configuration update is another issue we should talk about. I'd like to work on this on another pr.
   In this pr, we focus on why client cannot reconnect with restarted server.
   
   1. SDK use several loop to monitor rpc.conn, when server is closed, `loopForDialing` will continue to retry dialling until connected with server(when server is restart),then,  two new loop `loopForRequest` & `loopForResponse`  will be created to handle request.
   however,  `loopForRequest` will not return when correlative `loopForResponse`  returned because of IsNetworkClosed(EOF), since latter only return nil and will not shutdown tom.  thus, there will be more alive`loopForRequest` than `loopForResponse`  in this case.
   
   2. SDK retried not on timeout err, however, we wrapped timeout err incorrectly in here https://github.com/apache/incubator-pegasus/blob/d16e65cd88b4fb6113d5e6a99234b5be527f28da/go-client/session/session.go#L338 ,thus every request like `multiset` will continue to do retry and never recover.
   
   
   
   
   ### Checklist <!--REMOVE the items that are not applicable-->
   
   ##### Tests <!-- At least one of them must be included. -->
   
   - Unit test
   - Integration test
   - Manual test (add detailed scripts or steps below)
   - No code
   
   ##### Code changes
   
   - Has exported function/method change
   - Has exported variable/fields change
   - Has interface methods change
   - Has persistent data change
   
   ##### Side effects
   
   - Possible performance regression
   - Increased code complexity
   - Breaking backward compatibility
   
   ##### Related changes
   
   - Need to cherry-pick to the release branch
   - Need to update the documentation
   - Need to be included in the release note
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] codecov-commenter commented on pull request #1444: fix loopForRequest not return and retry forever

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #1444:
URL: https://github.com/apache/incubator-pegasus/pull/1444#issuecomment-1512350617

   ## [Codecov](https://codecov.io/gh/apache/incubator-pegasus/pull/1444?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > :exclamation: No coverage uploaded for pull request base (`master@d16e65c`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#section-missing-base-commit).
   > The diff coverage is `n/a`.
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #1444   +/-   ##
   =========================================
     Coverage          ?   53.58%           
   =========================================
     Files             ?       27           
     Lines             ?     2538           
     Branches          ?        0           
   =========================================
     Hits              ?     1360           
     Misses            ?     1132           
     Partials          ?       46           
   ```
   
   
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] littlepangdi commented on pull request #1444: fix loopForRequest not return and retry forever

Posted by "littlepangdi (via GitHub)" <gi...@apache.org>.
littlepangdi commented on PR #1444:
URL: https://github.com/apache/incubator-pegasus/pull/1444#issuecomment-1511152388

   Futhermore, if we remove user-side call of `TableConnector.Close()`, client is working perfectly even when server replica is stopped as if nothing happened. see for this issue #1446 
   ![image](https://user-images.githubusercontent.com/41042306/232468046-a38a2ae4-effe-4e8b-bf73-b0990449013a.png)
   
   log:
   triggers configuration update(ips are blurred):
   ```
   time="2023-04-17T18:47:06+08:00" level=info msg="trigger configuration update of table [CL74485_loki_staging2_324] due to RPC failure [session unresponsive for long] to [xx(replica)]" func="pegasus.(*pegasusTableConnector).tryConfUpdate" file="table_connector.go:753"
   time="2023-04-17T18:47:06+08:00" level=info msg="querying configuration of table(CL74485_loki_staging2_324) [metaList=[xxxx]]" func="session.(*MetaManager).QueryConfig" file="meta_session.go:110"
   time="2023-04-17T18:47:06+08:00" level=info msg="querying configuration of table(CL74485_loki_staging2_324) from [xxxx(meta)]" func="session.(*metaSession).queryConfig" file="meta_session.go:45"
   time="2023-04-17T18:47:06+08:00" level=info msg="create session with [xxxx(replica)]" func=session.newNodeSession file="session.go:141"
   time="2023-04-17T18:47:13+08:00" level=info msg="failed to read response from [xxxx(replica)]: read tcp xxxx:46252->xxxx:15801: read: connection reset by peer" func="session.(*nodeSession).loopForResponse" file="session.go:298"
   time="2023-04-17T18:47:14+08:00" level=info msg="dial to [xxxx(replica)]" func="session.(*nodeSession).dial" file="session.go:199"
   time="2023-04-17T18:47:14+08:00" level=info msg="stop dialing for [xxxx(replica)], connection state: ConnStateReady" func="session.(*nodeSession).dial" file="session.go:211"
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] hycdong merged pull request #1444: fix(go-client): loopForRequest not return and retry forever

Posted by "hycdong (via GitHub)" <gi...@apache.org>.
hycdong merged PR #1444:
URL: https://github.com/apache/incubator-pegasus/pull/1444


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] littlepangdi commented on pull request #1444: fix loopForRequest not return and retry forever

Posted by "littlepangdi (via GitHub)" <gi...@apache.org>.
littlepangdi commented on PR #1444:
URL: https://github.com/apache/incubator-pegasus/pull/1444#issuecomment-1510852759

   During test, we found that there is still timeout exists, thus locate another problem--sdk should not retry on `replica is not primary error` after reconnect with restarted server ip.
   In this case, configuration update should be processed and keep retrying means nothing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org