You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/02/07 05:37:56 UTC

[GitHub] [pulsar] flowchartsman opened a new pull request #9513: Cache producers with sync.Map

flowchartsman opened a new pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513


   This is an ideal use-case for sync.Map as producers (should) only ever
   be written once, but will be retrieved many times.
   
   fixes 9511
   
   <!--
   ### Contribution Checklist
     
     - Name the pull request in the form "[Issue XYZ][component] Title of the pull request", where *XYZ* should be replaced by the actual issue number.
       Skip *Issue XYZ* if there is no associated github issue for this pull request.
       Skip *component* if you are unsure about which is the best component. E.g. `[docs] Fix typo in produce method`.
   
     - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review.
     
     - Each pull request should address only one issue, not mix up code from multiple issues.
     
     - Each commit in the pull request has a meaningful commit message
   
     - Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below.
   
   **(The sections below can be removed for hotfixes of typos)**
   -->
   
   *(If this PR fixes a github issue, please add `Fixes #<xyz>`.)*
   
   Fixes #<xyz>
   
   *(or if this PR is one task of a github issue, please add `Master Issue: #<xyz>` to link to the master issue.)*
   
   Master Issue: #<xyz>
   
   ### Motivation
   
   
   *Explain here the context, and why you're making that change. What is the problem you're trying to solve.*
   
   ### Modifications
   
   *Describe the modifications you've done.*
   
   ### Verifying this change
   
   - [ ] Make sure that the change passes the CI checks.
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
     - *Added integration tests for end-to-end deployment with large payloads (10MB)*
     - *Extended integration test for recovery after broker failure*
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): (yes / no)
     - The public API: (yes / no)
     - The schema: (yes / no / don't know)
     - The default values of configurations: (yes / no)
     - The wire protocol: (yes / no)
     - The rest endpoints: (yes / no)
     - The admin cli options: (yes / no)
     - Anything that affects deployment: (yes / no / don't know)
   
   ### Documentation
   
     - Does this pull request introduce a new feature? (yes / no)
     - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
     - If a feature is not applicable for documentation, explain why?
     - If a feature is not documented yet in this PR, please create a followup issue for adding the documentation
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] flowchartsman commented on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
flowchartsman commented on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-774775330


   From looking at the code, it looks as if the intent of the SDK is to simply just error out if there is a producer error, whereupon the function will be restarted. Is there a concern with continuing to do it this way?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] flowchartsman commented on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
flowchartsman commented on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-774745421


   Hmm, what are you asking about? 
   
   - If you're asking about closing the producer when the pfunc exits, you're right; I need to add that.
   - If you're asking about a producer failing to send, I don't think the sdk handles that, all send failures are fatal, right?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-788964996


   @freeznet @wolfstudy Could you please help take a look at this PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] flowchartsman edited a comment on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
flowchartsman edited a comment on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-774745421


   Hmm, what are you asking about? 
   
   - If you're asking about closing the producer when the pfunc exits, you're right; I need to add that.
   - If you're asking about a producer failing to send and reconnecting, I don't think the sdk handles that, all send failures are fatal, right?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] flowchartsman edited a comment on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
flowchartsman edited a comment on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-782183735


   > What I am really concerned is, context.outputMessage() might return user a dead producer, and user cannot recreate it because it is cached.
   
   My proposal is to deprecate `context.OutputMesage()`, disallowing direct access to producers entirely. Alternatively, it could be replaced with `context.NewProducer()`, which does not use the cache at all, providing the best of both worlds.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] flowchartsman commented on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
flowchartsman commented on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-774750189


   I agree that overall the SDK needs more tests. As near as I can tell, it does not test producer creation or closing at all and there are no integration tests whatsoever.  Is it tested outside of the Go code? If so, I will need assistance. If not, adding tests to the SDK for pfunc lifecycle is kind of a big job.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] flowchartsman commented on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
flowchartsman commented on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-781516820


   > Just as you said, all send failures are fatal, but this is limited within `goInstance`’s `producer`. For user who get `producer` from `context.outputMessage()`, it might different.
   
   It depends on what the goal of the SDK is. If you look at the [python pfunc SDK example](https://pulsar.apache.org/docs/en/functions-overview/#content-based-routing-example), it is just `context.publish(topic, item)`  If that is the experience you are after, then, it seems best to keep it simple and consistent.  Given the way the Go pfunc SDK is structured, this seems like the option more in keeping with the intent.
   
   > A relatively simple but not very mature idea is: we have a timing task in the background to process these resources regularly, for example: 30 minutes
   
   Are there any objections to this idea? It makes sense to me.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] flowchartsman commented on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
flowchartsman commented on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-782289489


   So perhaps it might be best to close this issue and then integrate the caching/upkeep with #9515, which could also take care of creating `context.NewProducer()` (without cache) and then aliasing `context.NewOutputMessage()` to this, with a deprecation warning. Thoughts, @freeznet, @wolfstudy?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] freeznet commented on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
freeznet commented on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-775649448


   Just as you said, all send failures are fatal, but this is limited within `goInstance`’s `producer`. For user who get `producer` from `context.outputMessage()`, it might different. Before this pr, function user will always get a new and ready-to-use producer, because if we create producer with error, function crush (I also don’t think this is the correct way, but should been raised up as another issue separately). So user dont have to worry wether the producer is closed or non-existed. but in this pr, for some reason, i assume that user will get a closed producer from cache, and when user try to send message from a closed or non-existed producer, which might have some additional issue, like function been blocked with closed producer. Or user self cannot managed to get a new producer if the cached one is closed. I m not sure if I was over considered this issue, so @sijie @wolfstudy PTAL if you have time, thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] flowchartsman edited a comment on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
flowchartsman edited a comment on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-774775330


   @freeznet From looking at the code, it looks as if the intent of the SDK is to simply just error out if there is a producer error, whereupon the function will be restarted. Is there a concern with continuing to do it this way?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] flowchartsman commented on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
flowchartsman commented on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-782183735


   > What I am really concerned is, context.outputMessage() might return user a dead producer, and user cannot recreate it because it is cached.
   
   My proposal is to deprecate `context.OutputMesage()` entirely, disallowing direct access to producers entirely. Alternatively, it could be replaced with `context.NewProducer()`, which does not use the cache at all, providing the best of both worlds.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] freeznet commented on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
freeznet commented on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-774688710


   hi @flowchartsman , thanks for your work on this issue. the `sync.Map` is LGTM, but seems not handling if cached producer been closed, do you think a cleanup work should be added when producer is no longer available? also, we should add some tests to verify this assumption.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] flowchartsman edited a comment on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
flowchartsman edited a comment on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-782183735


   > What I am really concerned is, context.outputMessage() might return user a dead producer, and user cannot recreate it because it is cached.
   
   My proposal on #9515 is to deprecate `context.NewOutputMesage()`, disallowing direct access to producers entirely, but since these are separate issues, `context.NewOutputMessage()` could be replaced with `context.NewProducer()`, which does not use cache at all and then the caching issues could be addressed in #9515, thoughts?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] flowchartsman edited a comment on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
flowchartsman edited a comment on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-782183735


   > What I am really concerned is, context.outputMessage() might return user a dead producer, and user cannot recreate it because it is cached.
   
   _EDIT: got my issues confused on earlier response. Clarification:_
   
   My proposal on #9515 is to deprecate `context.NewOutputMesage()`, disallowing direct access to producers entirely, but since these are separate issues, `context.NewOutputMessage()` could be replaced with `context.NewProducer()`, which does not use cache at all and then the caching issues could be addressed in #9515, thoughts?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-1058893363


   The pr had no activity for 30 days, mark with Stale label.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] github-actions[bot] commented on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-1058894016


   @flowchartsman:Thanks for your contribution. For this PR, do we need to update docs?
   (The [PR template contains info about doc](https://github.com/apache/pulsar/blob/master/.github/PULL_REQUEST_TEMPLATE.md#documentation), which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] freeznet commented on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
freeznet commented on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-782099580


   
   > It depends on what the goal of the SDK is. If you look at the [python pfunc SDK example](https://pulsar.apache.org/docs/en/functions-overview/#content-based-routing-example), it is just `context.publish(topic, item)` If that is the experience you are after, then, it seems best to keep it simple and consistent. Given the way the Go pfunc SDK is structured, this seems like the option more in keeping with the intent.
   
   What I am really concerned is, `context.outputMessage()` might return user a dead producer, and user cannot recreate it because it is cached. 
   
   such as 
   
   ```
   producer := context.outputMessage("topic")
   producer.Close() // <- user close the producer
   ```
   
   later when user try to get a producer for `topic`, a closed producer will return since it is cached. 
   
   > > A relatively simple but not very mature idea is: we have a timing task in the background to process these resources regularly, for example: 30 minutes
   > Are there any objections to this idea? It makes sense to me.
   
   I agree with this, in addition, this setting should be configureable.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] wolfstudy commented on pull request #9513: [Go Functions] Cache producers with sync.Map

Posted by GitBox <gi...@apache.org>.
wolfstudy commented on pull request #9513:
URL: https://github.com/apache/pulsar/pull/9513#issuecomment-775787911


   But here, a more difficult problem is that you don't know when the producer will be closed. This is determined by the user. The idea of cache producer is OK, but we need to figure out how to deal with these created producers at the same time. If they are not used for a long time, what should we do at this time?
   
   A relatively simple but not very mature idea is: we have a timing task in the background to process these resources regularly, for example: 30 minutes, if the user does not use them, we will turn off these resources


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org