Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2021/05/04 15:30:35 UTC

[GitHub] [drill] luocooong opened a new pull request #2215: DRILL-7916: Support new plugin installation on the running system

luocooong opened a new pull request #2215:
URL: https://github.com/apache/drill/pull/2215


   # [DRILL-7916](https://issues.apache.org/jira/browse/DRILL-7916): Support new plugin installation on the running system
   
   ## Description
   
     Drill does not support installing new plugins on a running system:
   
   1. Boot Drill.
   2. Load the plugins into the persistent storage (`pluginStore`):
      - [x] Upgrade the plugins if the override file (`storage-plugins-override.conf`) exists. (existing)
      - [ ] Check for and add any new plugins shipped with the new release. (new)
      - [x] If neither of the above applies, initialize all the plugins by loading the bootstrap configuration. (existing)
   3. Finish the boot.
   
     As described above, before this change we had to export all the storage configurations (and import them again after installing the new release) so that new plugins could run on the new release (the key point is that the `pluginStore` must be reset). With this change, everything becomes much simpler.
   
   ## Documentation
   1. Add a start-up option, `drill.exec.storage.append.storage`. Set it to `true` (once only) if you want to pick up the new plugins in a new release.
   2. See the [Start-Up Options](https://drill.apache.org/docs/start-up-options/) documentation: `./drillbit.sh start -Dname=value`
   3. If the option is not set, the startup process is unchanged from today (on-demand use).
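
   For illustration, the option could also be set in the standard Drill override file rather than on the command line (a hypothetical snippet; verify the exact file and syntax for your deployment):

```hocon
# Hypothetical drill-override.conf fragment: enable the one-time
# append of new bootstrap plugins on the next restart.
drill.exec.storage.append.storage: true
```

   Equivalently, on the command line: `./drillbit.sh start -Ddrill.exec.storage.append.storage=true`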
   
   ## Testing
     To-do
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on pull request #2215: DRILL-7916: Support new plugin installation on the running system

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on pull request #2215:
URL: https://github.com/apache/drill/pull/2215#issuecomment-833088946


   Hi @luocooong,
   
   You've tackled a difficult issue! Before I comment on the code itself, I'll share some background info. As you'll see, this issue has a long history, and is more complex than it may first appear.
   
   The term "plugin" in Drill is very ambiguous so it helps to define what we mean. The terms I tend to use are:
   
   * Plugin module: The jar file which implements a "plugin". This is the Java code shared by all "instances."
   * Plugin config: The JSON bits stored in ZK that configure a plugin. Any one plugin module can have zero, one, or many plugin configs. The configs have names such as "cp" or "dfs".
   * Plugin instance: A Java object, instantiated from code in the plugin module, configured by one plugin config. This is what actually does work.
   
   It appears that this PR wants to handle the case where a new version of Drill adds new plugin modules, and we want to add new plugin configs to match.
   
   As it turns out, there is supposed to be code to handle this case, added a couple of years back. There is some kind of upgrade file that each plugin can provide; the contents of which are copied into ZK (persistent store) on the first restart after the upgrade. I say "supposed to" because it was not clear to me, when I reworked the plugin registry, that the code actually worked; it seemed to have some bugs. See below.
   
   One thing to remember, however, is that users can add their own plugins. If I release the "WhizBang" plugin module via my own project, you may want to add it to your existing Drill installation. We need a way that Drill notices the new plugin module, and adds the corresponding plugin configs.
   
   We must also remember the ways that people run Drill:
   
   * Single node "cluster" on a laptop, say. This is the "learning/testing" mode. Plugin configs reside in a ZK on that same machine.
   * Embedded Drill. Plugin configs are either deleted after each run, or reside in the file system. If run under Docker, plugin state is lost on each restart (unless one does some fancy file system mappings.)
   * Distributed Drill. Multiple Drill servers, each with their own copy of the software, run on separate servers, but share the same plugin configs in ZK.
   
   We'd want any plugin upgrade process to work in all three, but certainly in the two "cluster" cases.
   
   ### Existing Mechanism
   
   Let's discuss the existing mechanism. You found the function in the code `upgradeStore()`. As I recall, it uses a very weak method to detect the need to upgrade. If a release adds a new plugin, or changes an existing one, the release should include a special file: `ExecConstants.UPGRADE_STORAGE_PLUGINS_FILE` or `drill.exec.storage.upgrade.storage`. If that file exists, plugins are upgraded. If not, no action is taken. Once the upgrade is done, the code deletes the file.
   
   The upgrade itself consists of copying a special file from the class path into ZK. (Checking the code, I think it is the file mentioned above: if it exists, we copy it to ZK and delete it.)
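
   Reduced to pseudocode, the existing mechanism amounts to something like the following (a simplified Java sketch, not the actual Drill implementation; the file handling and store are stand-ins for illustration):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified sketch of the existing one-shot upgrade mechanism:
 *  if a marker file ships with the release, copy its plugin configs
 *  into the persistent store, then delete the marker so the upgrade
 *  runs only once. */
public class UpgradeSketch {
  /** Stand-in for the ZK-backed plugin store. */
  static Map<String, String> pluginStore = new HashMap<>();

  static boolean upgradeStore(Path upgradeFile) throws IOException {
    if (!Files.exists(upgradeFile)) {
      return false;               // No marker file: take no action.
    }
    // Each line stands in for one "name=config" plugin entry.
    for (String line : Files.readAllLines(upgradeFile)) {
      String[] parts = line.split("=", 2);
      pluginStore.put(parts[0], parts[1]);
    }
    Files.delete(upgradeFile);    // One-shot: never upgrade again.
    return true;
  }

  public static void main(String[] args) throws IOException {
    Path marker = Files.createTempFile("upgrade-plugins", ".conf");
    Files.write(marker, List.of("whizbang={\"enabled\":true}"));
    System.out.println(upgradeStore(marker));   // true: upgrade performed
    System.out.println(upgradeStore(marker));   // false: marker already deleted
  }
}
```

   The one-shot behavior hinges entirely on deleting that local file, which is exactly why the approach breaks down in the distributed and read-only cases described below.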
   
   Needless to say, this is a very unreliable, hacky solution. First, if Drill is reinstalled, plugins are again upgraded. Second, since Drill is (supposed to be) distributed, every Drillbit will find it has a local copy of the file and they will all compete to do the upgrade, perhaps overwriting one another's work. Third, if Drill is run embedded (not a job for which it was designed), then state is not persistent (unless the user plays tricks) and only the first run after install will try to do the upgrade. Fourth, if the Drill files are read-only (for security reasons), the file can't be deleted and the upgrade will happen on every restart.
   
   See `ClassicConnectorLocator.updatedPlugins()` which calls `PluginBootstrapLoaderImpl.updatedPlugins()`.
   
   The "Classic" bit, by the way, is meant to be Drill's current plugin mechanism. The idea was that we would add a Presto-like SPI mechanism as well. We didn't quite get that far...
   
   It is not clear, by the way, that this mechanism was ever actually tested, or that Drill developers know how to use it. It seems to have been quietly added to solve one specific use case. This is an example of why we need reviews, and tend to want a good solution: otherwise someone has to clean up the mess later.
   
   ### Proposed Solution
   
   The solution proposed in this PR is OK, but it is cumbersome and probably usable only in your one use case.
   
   Having a command-line option may work OK for an embedded Drill, but it is very awkward in a distributed environment, such as under K8s. In that case, we'd want exactly one Drillbit to take the option, and then only once. It is very difficult to configure that in either "stock" K8s or under Helm. The mechanism also won't work for any remaining MapR customers using the MapR management console.
   
   Further, this PR adds a second, incomplete mechanism alongside the existing one. Do we support both? Only one?
   
   ### Recommended Solution
   
   The proper solution is to adopt a distributed systems approach. Maintain a version in ZK for each plugin module. If the ZK version is lower than the code version, upgrade that plugin. If the code contains a plugin that is not listed in the ZK version table, add it. If the ZK version table has a plugin not in the code, remove it. If the ZK version is newer than the code, fail with an error, or skip that plugin. (Right now, Drill fails with a cryptic crash in this case. We've all spent fun-filled hours trying to track down this problem.)
   
   And, of course, back up the whole ZK before taking actions. The plugin versions would *NOT* be the Drill version, since third-party plugins would not follow the Drill release cycle, nor would Drill releases work during development. Instead, the version number would be a simple integer per plugin. For example, the MapRDB plugin might forever be at 1, while a more active plugin might be at 5.
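
   The reconciliation described above can be sketched as follows (an illustrative Java fragment, assuming per-plugin integer versions; none of these class or method names exist in Drill today):

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of per-plugin version reconciliation between the versions
 *  recorded in ZK and the versions shipped in the installed code. */
public class PluginReconciler {
  enum Action { UPGRADE, ADD, REMOVE, SKIP, NONE }

  static Map<String, Action> reconcile(Map<String, Integer> zkVersions,
                                       Map<String, Integer> codeVersions) {
    Map<String, Action> plan = new TreeMap<>();
    for (Map.Entry<String, Integer> e : codeVersions.entrySet()) {
      Integer zk = zkVersions.get(e.getKey());
      if (zk == null) {
        plan.put(e.getKey(), Action.ADD);        // new in this release
      } else if (zk < e.getValue()) {
        plan.put(e.getKey(), Action.UPGRADE);    // code is newer
      } else if (zk > e.getValue()) {
        plan.put(e.getKey(), Action.SKIP);       // ZK newer: skip (or fail)
      } else {
        plan.put(e.getKey(), Action.NONE);       // versions match
      }
    }
    for (String name : zkVersions.keySet()) {
      if (!codeVersions.containsKey(name)) {
        plan.put(name, Action.REMOVE);           // dropped from the code
      }
    }
    return plan;
  }

  public static void main(String[] args) {
    Map<String, Integer> zk   = Map.of("dfs", 1, "maprdb", 1, "old", 2);
    Map<String, Integer> code = Map.of("dfs", 2, "maprdb", 1, "cp", 1);
    // Prints: {cp=ADD, dfs=UPGRADE, maprdb=NONE, old=REMOVE}
    System.out.println(reconcile(zk, code));
  }
}
```

   Because each plugin carries its own integer version, third-party plugins can bump their version independently of the Drill release cycle, which is the key property the global-version compromise below lacks.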
   
   A compromise might be to have a version per plugin, but a single version number for the complete ZK persistent store. Such a solution works for Drill-provided plugins: each Drill release bumps the version. But, it won't work for third-party plugins because they won't be able to change the global version in a consistent way.
   
   Doing this was on the list of plugin enhancements I was working on when doing the plugin storage project. But, "Phase II" never came, so this is an open issue available for someone else to pick up.





[GitHub] [drill] paul-rogers commented on pull request #2215: DRILL-7916: Support new plugin installation on the running system

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on pull request #2215:
URL: https://github.com/apache/drill/pull/2215#issuecomment-1030531217


   @luocooong, here are a few more thoughts. First issues, then solutions.
   
   Installing a plugin on a running system is a bit hard, as you've discovered. Looks like I had some earlier comments. Adding a new plugin requires some kind of coordination: it can't be used on any node until it is available on all nodes.
   
   The plugin config system is still a bit brittle. The config can't go into ZK until the plugin is available on all nodes. Else, when a node tries to deserialize the config, it will fail because the plugin config class is not (yet) available.
   
   The situation for upgrades is worse: it is hard to remove the existing classes, and references. (Special design must be done to allow classes to be unloaded.) Such replacement can only be done when we know no queries are running that need the plugin.
   
   Plugins are not as easy to manage as UDFs, so the approach used for dynamic UDFs (which was already a bit complex), probably won't work for plugins. For one, UDFs have no ZK-based config file. On the other hand, the complexity of UDFs comes from preventing race conditions: from ensuring that the UDF is available to all nodes before allowing a query to reference the UDF.
   
   So, solutions. The simplest is to use Drill's graceful shutdown feature and simply restart each Drillbit. This process also works for a patch release, to change the memory allocated to Drill, etc. So, you should already have a rolling restart mechanism available anyway if you are running Drill in production. If so, then just use that mechanism for adding a new plugin.
   
   The process would be:
   
   * Install the plugin jar on every node. (See below.)
   * Use graceful shutdown to perform a rolling restart: shut down and restart the nodes one at a time. This is safe because there should be no configs in ZK for the new plugin yet, which means no query can reference the plugin while only some of the nodes have loaded it.
   * When all Drillbits are restarted, write the plugin config into ZK.
   * Once the plugin config is picked up by the nodes, queries can be issued against nodes.
   
   The config is needed only on the node acting as the Foreman: all other nodes get the required config handed to them in the physical plan. So, no worry about race conditions in plugin config rollout.
   
   One thing Drill is lacking is simple REST APIs for management. There should be an API to trigger graceful shutdown. Another to post a config. (There are a few APIs, but they are designed for use by the UI. Better than nothing.)
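
   As a sketch of what that automation could look like today, the config-posting step of the rolling-restart procedure above can be driven through the endpoint the web UI uses (a hypothetical helper; the endpoint path, port, and payload shape should be verified against your Drill version):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch: push a new plugin config to a Drillbit over REST once
 *  the plugin jar is installed on every node. POST /storage/{name}.json
 *  is the endpoint the Drill web UI uses; double-check it for your
 *  Drill version. Default web port 8047 assumed. */
public class PluginConfigPusher {
  static HttpRequest buildRequest(String drillbit, String name, String configJson) {
    return HttpRequest.newBuilder()
        .uri(URI.create("http://" + drillbit + ":8047/storage/" + name + ".json"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(configJson))
        .build();
  }

  public static void main(String[] args) throws Exception {
    // "whizbang" and its config body are illustrative only.
    String config = "{\"name\":\"whizbang\","
        + "\"config\":{\"type\":\"file\",\"enabled\":true}}";
    HttpRequest req = buildRequest("localhost", "whizbang", config);
    System.out.println(req.method() + " " + req.uri());
    // Uncomment to actually send once a Drillbit is running:
    // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
  }
}
```

   Note this only covers posting the config; a scriptable graceful-shutdown trigger would still be needed to automate the full rolling restart.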
   
   One other item: I notice that the proposed code change adds a plugin class path to the Drill config. That's not really needed. If you use the `--site <path>` option, then all your unique files (configs, UDFs, plugins) can reside in your custom config directory. Drill adds the `$DRILL_CONF/lib` directory to the class path. (Double check, I'm writing this from memory, being the guy who added this feature.) So, your custom plugins simply go into that custom config directory.
   
   Longer term, we should add classloader isolation for plugins. Without that, your new plugin could bring down the system if, say, you use a version of Guava that conflicts with Drill's. The same is true of the many other dependencies.





[GitHub] [drill] luocooong closed pull request #2215: DRILL-7916: Support new plugin installation on the running system

Posted by GitBox <gi...@apache.org>.
luocooong closed pull request #2215:
URL: https://github.com/apache/drill/pull/2215


   

