Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2019/04/10 18:40:32 UTC

[GitHub] [pulsar] jerrypeng edited a comment on issue #4012: Adding upsert functionality

jerrypeng edited a comment on issue #4012: Adding upsert functionality
URL: https://github.com/apache/pulsar/pull/4012#issuecomment-481809137
 
 
   @devinbost thanks for sharing your use case and the hurdles you are trying to overcome!
   
   In regards to:
   
   > Because our pulsar-admin commands are dockerized to allow them to operate at scale, there is performance overhead with every pulsar-admin command that must be executed.
   
   Is there a reason why you can't just submit/update functions via the REST endpoints instead of using the pulsar-admin CLI from Docker containers?  Submitting/updating functions by making an HTTP REST call will be a lot faster than starting up a Docker container every time to execute commands via the command line.
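   
   As a rough illustration of going through REST rather than the CLI, the sketch below only builds (does not send) the HTTP request for creating or updating a function. `FunctionRestSketch` and its method names are hypothetical, and the `/admin/v3/functions/{tenant}/{namespace}/{functionName}` path with POST for create and PUT for update is an assumption that should be checked against the admin REST reference for your Pulsar version. It uses `java.net.http`, so it needs Java 11 or later.
   
   ```java
   import java.net.URI;
   import java.net.http.HttpRequest;
   
   // Hypothetical helper, not part of Pulsar.
   class FunctionRestSketch {
       // Assumed path layout; verify it against the REST docs for your version.
       static URI functionUri(String adminBaseUrl, String tenant, String namespace, String functionName) {
           return URI.create(adminBaseUrl + "/admin/v3/functions/"
                   + tenant + "/" + namespace + "/" + functionName);
       }
   
       // Builds the request only: PUT when the function already exists, POST otherwise.
       // The caller supplies the body (assumed to be a multipart form) and its content type.
       static HttpRequest buildRequest(boolean exists, URI uri,
                                       HttpRequest.BodyPublisher body, String contentType) {
           return HttpRequest.newBuilder(uri)
                   .header("Content-Type", contentType)
                   .method(exists ? "PUT" : "POST", body)
                   .build();
       }
   }
   ```
   
   The request could then be sent from one long-lived `java.net.http.HttpClient`, which avoids paying container start-up cost for every function.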
   
   > In a deployment with 300 Pulsar functions, if each pulsar-admin command must be executed in series (rather than in parallel), executing 300 pulsar-admin commands to update these objects takes 15-25 minutes.
   
   Do you have 300 individual functions, or is there one function with 300 instances, or a group of functions that total 300 instances?  The submission time will differ hugely depending on the scenario.  Submitting one function with 300 instances will take much less time than submitting 300 functions with one instance each.
   
   > Because Pulsar is in a broken state while these commands are being executed
   
   What do you mean by this?  The cluster will be running as it should when submitting functions.
   
   > this deployment approach could result in a production Pulsar environment being down for 15-25 minutes, far beyond our SLA of 300 milliseconds of downtime.
   
   If your whole Pulsar cluster is somehow down and all your functions have disappeared, it is unrealistic to expect the downtime to be less than 300 milliseconds.  As you probably already know, starting up a Pulsar cluster, regardless of functions, will take longer than that.  If you are just talking about resubmitting 300 functions, I am not sure it's realistic to expect that all the JARs/packages for 300 functions can be uploaded in 300 milliseconds.  If you are trying to avoid downtime caused by a catastrophic event in your cluster, I would recommend having redundancy: have geo-replicated clusters across multiple regions, so you can seamlessly cut traffic over from the downed cluster to another cluster.
   
   If you have 300 functions, I don't think it's going to be the norm for you to need to update all 300 of them.  It's more likely that you will be updating a subset.
   
   I think the functionality you are looking for is bulk create, update, or upsert.  You want to bring a cluster from a potentially unknown state into a known, consistent state with regard to functions.  Am I understanding you correctly?
   
   While we can add upserts and even bulk upserts, I would suggest that you first try creating/updating functions directly through the REST endpoint to see if that is good enough.
   
   I would still very much like to see features like bulk create/update/upsert in Pulsar Functions.  I do believe we can accomplish them by just adding to or modifying the "front end" code, i.e. the REST endpoints and ComponentImpl.java, to implement the bulk actions.  Please reference the code in registerFunction and updateFunction; we can probably just run that in a loop for bulk actions.
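   
   The loop could look roughly like the sketch below. This is not Pulsar code: `BulkUpsertSketch`, `FunctionSpec`, and the `upsertOne` callback are hypothetical names standing in for the per-function logic in registerFunction/updateFunction. Collecting failures per function instead of aborting the batch on the first error is one possible design choice, not something the existing code prescribes.
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.function.Consumer;
   
   // Hypothetical sketch of a bulk action built on a per-function operation.
   class BulkUpsertSketch {
       static final class FunctionSpec {
           final String tenant;
           final String namespace;
           final String name;
   
           FunctionSpec(String tenant, String namespace, String name) {
               this.tenant = tenant;
               this.namespace = namespace;
               this.name = name;
           }
   
           String fullyQualifiedName() {
               return tenant + "/" + namespace + "/" + name;
           }
       }
   
       // Applies upsertOne to every spec in order. Returns a map from fully
       // qualified function name to error message for the specs that failed;
       // the map is empty when every call succeeded.
       static Map<String, String> bulkUpsert(List<FunctionSpec> specs, Consumer<FunctionSpec> upsertOne) {
           Map<String, String> failures = new LinkedHashMap<>();
           for (FunctionSpec spec : specs) {
               try {
                   upsertOne.accept(spec);
               } catch (RuntimeException e) {
                   failures.put(spec.fullyQualifiedName(), e.getMessage());
               }
           }
           return failures;
       }
   }
   ```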
   
   With regard to this PR and implementing upserts, I think you can just do something like the following in ComponentImpl.java:
   
   ```
   if(functionMetaDataManager.containsFunction(tenant, namespace, functionName)) {
      updateFunction(...)
   } else {
      registerFunction(...)
   }
   ```
   
   The caveat in the above logic is that a function can be deleted after "containsFunction" is called.  To handle that scenario, I would suggest looking at how the updateFunction code works, copying it, and modifying the copy so that functions that don't exist are also allowed to proceed through the logic.
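   
   To make the race concrete, here is a minimal sketch of an upsert in which the existence check and the write happen as one atomic step, so a concurrent delete cannot slip in between them. It is not Pulsar code: `FunctionRegistrySketch`, its in-memory map, and the `Outcome` enum are hypothetical stand-ins for the function metadata store, and `ConcurrentHashMap.compute` is used only to illustrate the idea.
   
   ```java
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.ConcurrentMap;
   
   // Hypothetical in-memory stand-in for function metadata storage.
   class FunctionRegistrySketch {
       enum Outcome { CREATED, UPDATED }
   
       // Key is "tenant/namespace/functionName"; value is an opaque config string.
       private final ConcurrentMap<String, String> functions = new ConcurrentHashMap<>();
   
       private static String key(String tenant, String namespace, String functionName) {
           return tenant + "/" + namespace + "/" + functionName;
       }
   
       // compute() performs the lookup and the write atomically for this key,
       // so there is no window between "does it exist?" and "write it".
       Outcome upsert(String tenant, String namespace, String functionName, String config) {
           Outcome[] outcome = new Outcome[1];
           functions.compute(key(tenant, namespace, functionName), (k, existing) -> {
               outcome[0] = (existing == null) ? Outcome.CREATED : Outcome.UPDATED;
               return config;
           });
           return outcome[0];
       }
   
       // Returns true if the function existed and was removed.
       boolean delete(String tenant, String namespace, String functionName) {
           return functions.remove(key(tenant, namespace, functionName)) != null;
       }
   
       // Returns the stored config, or null if the function is not registered.
       String get(String tenant, String namespace, String functionName) {
           return functions.get(key(tenant, namespace, functionName));
       }
   }
   ```
   
   With this shape, an upsert that follows a delete simply recreates the function instead of failing the way an update-only path would.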
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services