You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/09/21 13:18:58 UTC

[GitHub] [incubator-pinot] adriancole opened a new pull request #6039: WIP: ServiceManager ADD_TABLE role

adriancole opened a new pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039


   This is a work in progress towards #5977
   
   Besides overall review of this sketch, this needs.
   
   [ ] how do we get a reference to the _helixResourceManager?
   [ ] do we need to block until Pinot Broker registers with the controller before installing the tables?
   [ ] can we can stop the controller from listening on a port until these tables are added?
   [ ] add test relative file paths can work
   [ ] add test multiple schemas install in parallel
   [ ] add test health check fails if a schema fails due to missing files, or inability to run addSchema or addTable.
   [ ] figure out where to document this
   [ ] determine if quickStart could or should use this
   
   ## Description
   
   This would allow one or more bootstrap config files that include one or more tables to add.
   
   ex given a file `etc/pinot-backendEntityView.conf`
   ```
   pinot.service.role=ADD_TABLE
   pinot.addTable.schemaFile=./schemas/backendEntityView-schemaFile.json
   pinot.addTable.tableConfigFile=./schemas/backendEntityView-tableConfigFile.json
   ```
   and a file `etc/pinot-rawServiceView.conf`
   ```
   pinot.service.role=ADD_TABLE
   pinot.addTable.schemaFile=./schemas/rawServiceView-schemaFile.json
   pinot.addTable.tableConfigFile=./schemas/rawServiceView-tableConfigFile.json
   ```
   
   you could pass the following to `StartServiceManager`:  `-bootstrapConfigPaths  etc/pinot-controller.conf etc/pinot-broker.conf etc/pinot-server.conf etc/pinot-backendEntityView.conf etc/pinot-backendEntityView.conf`
   
   What will happen in ideal case is the controller starts, the tables install, then the controller is listening. If we can't block the controller from listening until this is done, we can at least make sure health check is not ok.
   
   ## Upgrade Notes
   Does this PR prevent a zero down-time upgrade? **No**
   Does this PR fix a zero-downtime upgrade introduced earlier? **No**
   Does this PR otherwise need attention when creating release notes?
   * [x] Yes (Please label this PR as **<code>release-notes</code>** and complete the section on Release Notes)
   
   ## Release Notes
   TODO
   
   ## Documentation
   TODO
   
   If you have introduced a new feature or configuration, please add it to the documentation as well.
   See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] adriancole commented on pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
adriancole commented on pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#issuecomment-701167620


   had a chat with @Mayank mainly to introduce the use case and things of priority here. for the curious this is for a  "quick start" of hypertrace, which ideally can result in ports listening IFF the tables were added. This could help reduce some blocking loops done externally, and ideally also mitigate some delays using external scripts to orchestrate this.
   
   NEXT ACTION: await more feedback. I'll pickup on monday


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv edited a comment on pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
mayankshriv edited a comment on pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#issuecomment-700137273


   If I get the context correctly, it seems we are trying to mix two things together?
   - Status of service being up and running
   - Status of a table being ready to be queriable
   
   (Note, tables can be added at a later point in time, on a previously running cluster).
   
   Is that a fair conclusion? If so, I think we should treat them separately. Perhaps, we should provide an endpoint like `tableReady` that clients can use before starting to make queries.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#issuecomment-703762755


   I put some more thought on this, and still unable to convince myself about ServiceManager controlling when the service should announce health status (especially given that the use case is for dev/test env). I feel a cleaner way is for client side to check for something like /tableReady or /tableHealthy (that could be added).
   
   Perhaps this is because I still see ServiceManager as a utility to setup the cluster (and not an integral part of the service itself). We can solicit inputs from other folks as well: @mcvsubbu @siddharthteotia @kishoreg 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#issuecomment-700412009


   > @mayankshriv in the case of ServiceManager, if I were to add tables as "bootstrap" I would expect those installed before things are healthy, intuitively. This is without regards to later table adds (which would not be bootstrap). Make sense?
   
   Yes, I agree with that expectation. Perhaps I don't have a clear picture of the overall solution (since the PR says WIP). So I have a few questions:
   - I take it that the usage model is only for dev/test environment, and not really being proposed for production. Is this accurate?
   
   - Does the proposal involve making changes to the service (to not return healthy status, or something similar), until `ServiceManager` has successfully bootstrapped the tables? If not, great. But if so, this is the part I am having trouble with, as I do not think it is right for a dev/test utility to govern what defines `healthy` status of the service. Also, it opens up questions like, is the service healthy once table is created, or data is uploaded, or data is actually ONLINE in external view to be served (and if so, what happens if one of the segments fails to load).
   
   We have similar issue in our integration test, where we have to wait until the table is bootstrapped, before running test queries (eg `OfflineClusterIntegrationTest`). There, the hack is to wait until `count(*)` returns expected count (not necessarily recommending this, just pointing out).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#issuecomment-700137273


   If I get the context correctly, it seems we are trying to mix two things together?
   - Status of service being up and running
   - Status of a table being ready to be queriable
   
   Is that a fair conclusion? If so, I think we should treat them separately. Perhaps, we should provide an endpoint like `tableReady` that clients can use before starting to make queries.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] adriancole commented on pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
adriancole commented on pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#issuecomment-706963614


   nagging https://github.com/apache/incubator-pinot/issues/5975 as ran into it again.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] adriancole commented on a change in pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
adriancole commented on a change in pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#discussion_r499929548



##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/services/ServiceRole.java
##########
@@ -22,5 +22,5 @@
  * ServiceRole defines a role that Pinot Service could start/stop.
  */
 public enum ServiceRole {
-  CONTROLLER, BROKER, SERVER, MINION,
+  CONTROLLER, BROKER, SERVER, MINION, ADD_TABLE

Review comment:
       ok




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] adriancole commented on pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
adriancole commented on pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#issuecomment-700327386


   @mayankshriv in the case of ServiceManager, if I were to add tables as "bootstrap" I would expect those installed before things are healthy, intuitively. This is without regards to later table adds (which would not be bootstrap). Make sense?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#discussion_r499763424



##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/ControllerStarter.java
##########
@@ -350,6 +350,8 @@ private void setUpPinotController() {
     LOGGER.info("Starting Pinot Helix resource manager and connecting to Zookeeper");
     _helixResourceManager.start(_helixParticipantManager);
 
+    // TODO: ideally we can block to do schema/table install here, as it is before the thing is listening..

Review comment:
       Why would you want to block before schema/table install? A controller can start up without any tables just fine

##########
File path: pinot-tools/src/main/java/org/apache/pinot/tools/service/PinotServiceManager.java
##########
@@ -92,6 +92,8 @@ public String startRole(ServiceRole role, Map<String, Object> properties)
         return startBroker(new PinotConfiguration(properties));
       case SERVER:
         return startServer(new PinotConfiguration(properties));
+      case ADD_TABLE:

Review comment:
       This is not a pinot service. Perhaps you should pull out a base class, and add a `PinotSetupServiceManager` (I am sure there is a better name) in which you can add tables, ingest data, check for data being online, etc.? 

##########
File path: pinot-spi/src/main/java/org/apache/pinot/spi/services/ServiceRole.java
##########
@@ -22,5 +22,5 @@
  * ServiceRole defines a role that Pinot Service could start/stop.
  */
 public enum ServiceRole {
-  CONTROLLER, BROKER, SERVER, MINION,
+  CONTROLLER, BROKER, SERVER, MINION, ADD_TABLE

Review comment:
       `ADD_TABLE` is not a service in pinot, it should be an external service.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
kishoreg commented on pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#issuecomment-706821231


   High-level comment, AddTable is a very specific role and seems to solve one specific problem. Its probably better to either enhance the minion role to perform a task or add a new role (pinot-admin) and add table can be passed as a task. This will allow us to add more setup tasks without having to add a new role for each task. 
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] adriancole commented on pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
adriancole commented on pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#issuecomment-706827806


   added notes about implications of "add table" and the current design of ServiceManager. Notable, ServiceManager can currently only start things of ServiceRole.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] adriancole commented on pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
adriancole commented on pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#issuecomment-707063779


   functionally, this works if the "add table" commands are done last (because effectively, you end up needing to wait for everything anyway). Verified here https://github.com/hypertrace/pinot/pull/48
   
   It may seem not that much better if done in java vs shell scripts, especially as everything has to be ready first. However, java is easier to test, there's less network problems, and a chance to be considered a bootstrap operation (crash the process if it fails, participate in overall health)
   
   I understand that there's some debate about where to stuff the add table command. I also understand there is some angst about this temporarily using the "service" role. This was to perform the least change as literally ServiceManager is designed to work on services. I don't really care if this is called a service or not. However, there's a lack of an alternative also.
   
   This brings to the next action. I'm hoping someone maybe @kishoreg can find an agreeable place to house this logic. Once those debates are over, I can take a look at backfilling tests etc in that context. Of course, if someone wants this code to do on own, they can do that also.
   
   NEXT STEP: knowing a role ADD_TABLE won't fly, await an alternative.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] adriancole commented on a change in pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
adriancole commented on a change in pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#discussion_r499930870



##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/ControllerStarter.java
##########
@@ -350,6 +350,8 @@ private void setUpPinotController() {
     LOGGER.info("Starting Pinot Helix resource manager and connecting to Zookeeper");
     _helixResourceManager.start(_helixParticipantManager);
 
+    // TODO: ideally we can block to do schema/table install here, as it is before the thing is listening..

Review comment:
       when a table is in the bootstrap roles for servicemanager, it can make sense that this is a part of bootstrap. Not just controller, but nothing ideally should listen until bootstrap is complete. ServiceManager is effectively a composite service.
   
   In talking also at length with @mayankshriv  and with your comment also, I can see this point is not understandable outside my head. The impact is more external orchestration, which imho is a bad experience. We are allowing a composite service, but deciding against taking advantage of it.
   
   Another point lost despite various attempts to convey it is that I don't mean to literally block here. I mean to add a callback iff servicemanager is in use implements it. otherwise a noop.
   
   If this above comment changes things, lemme know. Otherwise, I'm not really sure this PR is worth proceeding with as if we aren't taking these concerns internally, the external approach (shell scripts etc) seems more coherent..




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] adriancole commented on pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
adriancole commented on pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#issuecomment-700421545


   WRT the proposal here, firstly yes this can be considered a non-production env.
   
   As the scope doesn't include data loading, we wouldn't have a condition to consider wrt awaiting a certain count.
   
   I was thinking to block on a state condition that applies when the table/schema config are, then release a latch that could possibly be here https://github.com/apache/incubator-pinot/pull/6039/files#diff-76cdfdb12a9e3f09a73e92f36705b710R353
   
   Such a latch could be a callback that isn't in use when bootstrap tables are not used.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] adriancole commented on a change in pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
adriancole commented on a change in pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#discussion_r499929479



##########
File path: pinot-tools/src/main/java/org/apache/pinot/tools/service/PinotServiceManager.java
##########
@@ -92,6 +92,8 @@ public String startRole(ServiceRole role, Map<String, Object> properties)
         return startBroker(new PinotConfiguration(properties));
       case SERVER:
         return startServer(new PinotConfiguration(properties));
+      case ADD_TABLE:

Review comment:
       ServiceManagerRole perhaps? I currently don't have any other use case except setting up a table and prefer to not be too speculative.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] adriancole commented on pull request #6039: WIP: ServiceManager ADD_TABLE role

Posted by GitBox <gi...@apache.org>.
adriancole commented on pull request #6039:
URL: https://github.com/apache/incubator-pinot/pull/6039#issuecomment-706823255


   > High-level comment, AddTable is a very specific role and seems to solve one specific problem. Its probably better to either enhance the minion role to perform a task or add a new role (pinot-admin) and add table can be passed as a task. This will allow us to add more setup tasks without having to add a new role for each task.
   
   ok just also consider the configuration aspect of this. an addTable role looks like serializing inputs to the existing CLI commands or the REST api. it is also a fairly cardinal thing in a database to add a table.
   
   Regardless, this is more a task than a service. I don't know if minions do tasks, but something to consider.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org