You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2021/06/04 15:00:05 UTC

[GitHub] [drill] vdiravka opened a new pull request #2251: DRILL-7871 StoragePluginStore instances for different users

vdiravka opened a new pull request #2251:
URL: https://github.com/apache/drill/pull/2251


   # [DRILL-7871](https://issues.apache.org/jira/browse/DRILL-7871): StoragePluginStore instances for different users
   
   ## Description
   
   This feature allows to have the separate plugin set per user. Profiles are restricted per user too.
   
   ## Documentation
   To enable the feature:
   `drill.exec.security.user.auth.separate_workspace`
   
   TBI
   
   ## Testing
   Unit tests. Manual testing of Drill WebUI and JDBC connection via SqlLine
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
vdiravka commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-882856859


   * **Motivation:**
   
   The goal is to allow multi-tenant operations. So it doesn't matter what API user uses: REST, JDBC ot ODBC. This is true that Drill initially wasn't designed for that, but we ca add it.
   
   * **Options:** 
   > The user has to be able to set the user options. We don't need a new option level. Simply add a new statement: ALTER USER > SET <name>=<value>. This would do two things:
   > Write the given key/value pair into the persistent store for the user.
   > Change the session option in the current session to that new value.
   
   This is also questionable, because it breaks `SessionOptionManager extends InMemoryOptionManager`:
   ```
   This is an OptionManager that holds options in memory rather than in a persistent store. Options stored in 
   SessionOptionManager, QueryOptionManager, and FragmentOptionManager are held in memory 
   (see options) whereas the SystemOptionManager stores options in a persistent store.
   ```
   Most likely all easy ways here are some sort of hacks. So let's do that in correct manner: adding the new `UserOptionManager` as a layer between `SystemOptionManager` and `SessionOptionManager`.
   
   > The next step is store the per-user plugins in separate persistent stores. This is the "cleaner" design, but now you have to 
   > create the means for an admin to see, change, and remove the user-level plugins. You also have to extend the plugin registry, > which is, as we noted, a complex beast. Any change requires extensive unit tests.
   
   Different PS can be chosen per Drill doc. This is independent from Plugins. We have `HBasePersistentStore`, `InMemoryStore`, `LocalPersistentStore`, `MongoPersistentStore`, `NoWriteLocalStore`, `ZookeeperPersistentStore` implementations for PS.
   If you mean different instances of the same`PS`, that is not needed. PSRegistry is just a code, which allows to access actual PS. We can manage access via the common PSRegistry. Or we can reconsider it in future.
   
   *  **Plugins:**
   
   About plugins it makes sense to add Global plugins and to use it in case User plugin is absent. It's like a plugins fallback.
   And system plugins can be used only in scope of Global plugins. I don't think user name in schema is a nice and elegant solution. I think plugins fallback is simpler and more convenient approach.
   
   * **Conclusion:**
   
   So I am going to add this functionality to current implementation and separate out `user options` part of this PR into the new one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-877870693


   This one is difficult. We are reviewing code under time pressure when the design itself raises many questions. Let's start with the design. First, it is too short, it leaves many issues unspecified. So, I have to infer them from reviewing the code. As noted before, the storage plugin registry is complex; so having to reverse-engineer a design at the code level makes the process slower than it might otherwise be.
   
   This PR adds two separate features: a new options level and changes to the plugin registry. Since these are independent concepts, they should appear as separate PRs: one for the options, another for plugins. That will reduce the cognitive load on us poor reviewers.
   
   So, the design of options:
   
   > But it appears different users have the common Storage Plugins and System Options.
   > * There is no possibility for Drillbit configuring individually for every user and persistent between sessions (Note: SessionOptions are removed after closing session)
   
   The above is by design. **System** options are just that: options for the system. They are meant to be set by the administrator to control the Drill cluster as a whole. Drill also has *Session* options: these are per user session. If we want per-user option settings, then we need a new level: *User* options.
   
   Options form a hierarchy. At present that hierarchy, from most to least specific, is:
   
   * Per-query options *or* session options
   * System options
   * Options configured in the Drill config system.
   
   It sounds like you want to add a new layer: user options:
   
   * Per-query options **or**
     * Session option
     * Per-user option
   * System options...
   
   Suppose we have per-user system options: I set my queue length to 10 items, you set your queue length to 100 items. But, the queue is a global resource: which is used? This is just one example of how intractable "per-user system options" is as a concept. So, we need actual user options.
   
   One obvious choice is simply to persist the session options. I login and change `foo` to 10. Drill simply persists that change so that next time I log in, `foo` is 10. Easy, but is that the right answer? Drill has a serious flaw that the behavior of any query depends on options: JSON modes, all-text mode and so on. That was done as a quick and dirty short-term solution to ship the product, but is still with us and still causing issues. If I set an option to "all text mode", and run my query, I get a result. If I share that query with you, but you don't set the all-text option, the query will produce different results (or, probably, fail.)
   
   So, to "fix" options, we need to push those options into the query (with hints, say), or we need options associated with a query, as having per-user options won't solve the "options associated with a query" problem.
   
   So, we probably do not want to just persist session options as they are used to control specific queries. Instead, we'd want to add a new system: `SET USER OPTION ...` so that the hierarchy is as above.
   
   The design is silent on many important questions:
   
   * What is the new hierarchy?
   * How does *the* (not an) administrator set truly-global options if all users can set their own?
   * How does the user set, review and clear the user options?
   * Suppose a user is logged in twice. In session 1 they change `foo` to 10. Do we expect session 2 (on a different Drillbit) to see that change? This is a **very** difficult concurrency question. Of course they do. But, when? In the middle of query planning? At the start of the next SQL statement? After a time period? Only at the next login? If concurrently, how will one Drillbit track changes made to the user options by another Drillbit? How can the use a "force resync" to keep options synchronized?
   * Can the admin see the per-user options? If so, how? `SHOW OPTIONS FOR USER 'bob'`? Remember, the admin is tasked with policing the system, they **must** be able to see all options.
   * How do we solve the current ambiguity of true system options (the query queuing options mentioned above) vs. user/session options? At present, the solution is ad-hoc and we just trust that most options are not changed. If a naive user can change any option, we've got issues. Do we need a permissions table to say which users can change which options? (And, categorize true system options as read-only at the user and session levels.)
   * You will need `USER` version of all of the option commands.
   * You will need complete tests that demonstrate that the hierarchy works as expected.
   
   The good news is that, in terms of implementation, Drill's option system is already hierarchical, so adding a new layer should be very easy.
   
   All of this is complex. It should be done as its own PR along with a *detailed* design.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-877884101


   Let's now move onto the primary topic of this PR: plugins. Again, this is a complex topic, especially if we think about how the solution would work at scale. I'm afraid that, since the design is so sparse and leaves out many details, that we're perhaps going down the wrong path. Let's take each topic in turn so we can see what's what.
   
   First, what is the structure we want? To answer this we need to separate the *code* (Java libraries, which I'll call a "connector", following Presto) and *configs* (the JSON things that users see, what we normally mean when we say "plugin".)
   
   Drill will only ever work with a fixed set of connectors: those available on the class path at runtime. AFAIK, there has never been interest in adding connectors at runtime (no "dynamic connectors") and, for security reasons, I'm not sure that doing so would even be a good idea.
   
   Connectors are presently used by any number of configs. And, every query creates an instance of the objects in the connector (the readers, the planner objects, etc.) So, the structure we have is:
   
   * Config (a JSON object in the persistent store)
   * Connector (a library)
     * Plugin (an instance of the plugin class from a connector, along with its config.)
       * Readers, planner objects, created per-query from a plugin
   
   The plugin registry binds connectors to configs to produce plugins (where by "plugin" I mean an instance of the plugin class initialized with a config.) This is some mighty complex code! It has to deal with a bunch of distributed system and concurrency-related issues. I'll omit the details for now.
   
   So, what does this mean? For one, it means we only need *one* copy of the connector. The "new" plugin registry is designed to handle this idea. My "class path plugin" and your "class path plugin" use the same plugin library. I'll assume that the answer above, that each user has their own copy of the code, represents a misunderstanding because of Drill's horribly ambiguous names in this area.
   
   Next, the plugin registry holds two kinds of "plugins". First are those configured in the UI and that can be resolved in queries. Familiar things like `cp`, `dfs` and so on. Second are ad-hoc plugins: those created on-the-fly based on table properties in a query, or plugin properties passed along in a query definition. This is done so that queries work even if, right after planning completes, someone deletes or changes the stored config: the next query sees the new config, the executing query uses the config stored in the query. (Else, disaster would result when some Drillbits see one config, others another.)
   
   There is a system plugin that offers the system tables. This one is meant to be shared by all users. It has no config options, but if it did, they would be set at the system level by the admin. It makes no sense for each user to have their own copy. (It might make sense to disallow certain system tables for some users, but that is a different question.)
   
   Now we have the "regular" configured plugins: a JSON config and an associated plugin instance. The goal here is to define a new level so that individual users have these items.
   
   First, I'll question if the requirements are correct. Do we really want plugins associated with each user? If you and I both use tale "foo", do I have to ask you to send me your JSON so I can create a new copy? What if you change something? What if we have 100s of users all with copies? Making copies is a "demo only" feature, it does not scale in production. Remember: reuse by copy and paste is for amateurs who will throw away their solution, not for professionals who want to minimize total costs.
   
   Let's think of a use case. A firm has five departments with 10 people per department. Some people in one department can see some data in another. For example, the VP of marketing is allowed to see Sales and Dev data. The CFO can see everything. Oh, and the employees change regularly.
   
   This is not a new idea. How do other tools handle this? Oracle? SQLServer? They operate at the level of schemas (databases) and tables. So, think of a plugin (config in storage, config + connector at runtime) as a shared, named object.
   
   So, what we want are groups of plugins. "Marketing", "Sales", "Ops". Within each are the configs for that set of tables. Users can be given permissions on the whole group, or on individual plugins. Administrators or DBAs (those who understand production systems) set up the configs. Users use those to which they are given permission. Or, if we want to be cheap, we have a huge global name space of plugins, and require all names to be globally unique, but you can only see those to which you have been given permission. (How will this work with 100 plugins and 1000 users? That results in 10,000 distinct permissions to maintain.)
   
   Users might want to create their own, individual configs not meant to be shared. Fine. Have a "user" group that holds configs for just that user. If I want to promote my new "foo" config, I ask an Admin to add it to the "Dev" group so you can see it.
   
   If the above is roughly in the ball park, we'd want to work out the details in a detailed design. There are likely a dozen "gotchas" that the above omitted. Who removes configs for users no longer with the organization? What connectors is a user allowed to create configs for? How can the admin see all configs to police users or track down odd behavior "some user did something silly that puts huge load on an external system"? 
   
   Think of security: who decides who gets permission on what? Is it done manually? With what UI? How does that work in an organization of 10 people? 100? 1000? Since we are dealing with permissions based on roles (role-based security), can/should we tie onto the corporate LDAP or similar system? What is the API for that?
   
   Configs should hold security keys, but those should not be in JSON, stored in the persistent store. (Drill has *always* done this wrong for systems other than MFS; another reason Drill fails in production.) How are configs connected with a "vault" to get keys, passwords, tickets and the like at runtime? On S3, the use of temporary tokens is the right way to do permissions, no one gives away S3 keys any more. So, how does a plugin obtain and refresh its S3 token?
   
   These are hard topics. Thanks for taking on this big challenge! Suggestion: pick a small subset to start with, but ensure that subset allows us to add the pieces needed when the solution moves into production.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-885898568


   @vdiravka, on the plugins side, let's please do list our requirements and design goals. Else, I worry that the comments I make get lost, and I will this have to repeat them over and over.
   
   Requirements as I extract them from your description:
   
   * Provide a bootstrap option to enable the new semantics. The default is off, meaning that Drill behavior will not change by default.
   * Retain global plugin configurations. (They allow multiple users to share the same configuration without copy/paste.)
   * Add per-user plugin configurations.
   * Users can add and change only their own per-user configurations. (No user can "share" a config with another user: either the config must be copied to a global config by the admin, or the second user must make their own copy.)
   * If a user configuration happens to have the same name as a global configuration, the per-user configuration takes precedence. (If the admin make a global version of a now-shared config, the user must delete his own copy in order to use the now-shared version.)
   * No non-admin user can see or change the configuration for another user. (That is, if two people work in the same group, and want to share a config, they must do so via copy/paste, or by asking the admin to make the plugin global to all users.)
   * No non-admin user can change a global configuration.
   * The admin user can see, add, modify or delete *any* plugin configuration, both global and per-user. (Without this, the system is not maintainable as the admin won't be able to manage per-user configurations.)
   
   Now, one can disagree with these requirements. For example, in a multi-tenant environment, we do not one *tenant* to see the plugins for another *tenant*. But, we might want all users in "Tenant A" to see a shared set of configs. Your description appears to assume that "Tenant" = "User": one user per tenant. However, a typical model is for each human in a tenant to have their own user account. Please spell out the desired semantics.
   
   *Aside*: Clarifying the desired tenant model may help with the options discussion. I've been equating "user" with "person". If your tenant model equates "user" with "tenant", so you are tying to define per-tenant defaults, then please clearly say so in the design. If you want per-tenant default, I would suggest a somewhat different design for options.
   
   Finally, if the goal is a true multi-tenant model, in which tenants are distinct business entities (rather than different departments within a single entity), then we must also ensure each tenant has access to only their own query profiles. Will there be another PR for this? Is there a roll-up Jira ticket for all the issues involved in true multi-tenant operation?
   
   Also, the Drill UI assumes a single organization. If the UI is to be exposed to multi-tenant users (so they can monitor queries, see query profiles, etc.), then the UI must change. Tenants should not be able to see details, or change the state of, Drillbits. Session options should reflect tenant values. Probably other changes. Will there be a design or PR for this?
   
   Further, each tenant must have guarantees on resources. That is, Tenant A should not be able to run a huge query that denies resources sold to Tenant B. This is a **very hard** problem. If you don't solve the hard problem, the options and plugins are somewhat moot. The option and plugin features are handy, but do not, by themselves, give you multi-tenant support in Drill.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on a change in pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on a change in pull request #2251:
URL: https://github.com/apache/drill/pull/2251#discussion_r662667413



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistry.java
##########
@@ -21,19 +21,18 @@
 import java.util.Map;
 import java.util.Set;
 
+import org.apache.calcite.schema.SchemaPlus;

Review comment:
       The plugin registry is a complex beast. Before this change, it supported built-in plugins and those provided in the plugins folder. The registry handles both the "connector" (code) and "instances" (code + a config.)
   
   This change adds per-user plugins. The overall bug description does not seem to provide answers to several questions. For example:
   
   * Are we controlling access to the "connectors" (code) or to the "instances" (configs?)
   * If we control the connectors, do we load new copies of the connector metadata per user?
   * Can a user see the common set of system-wide configs? Or, only their own private configs?
   * If system wide, how do we handle name conflicts? I add a "foo" plugin, later the admin adds a system-wide "foo" or visa-versa.
   * Can a private plugin be promoted to system level? Is that a copy/paste operation? With the user logged in as different windows? Or, is there a button to do the operation?
   * When is the per-user information created? As you know, on the web, the user session comes into existence for every single REST call. Plugin setup is pretty heavy-weight. Is the user config cached between requests? If so, when do we expire the user information if the user then becomes idle?
   * The registry handles updates. The current code is rather awful since there is no coordination among Drillbits: multiple drillbits could come up at the same time and they can all decide to do the upgrade. Adding users complicates the effort. Will a user session trigger a plugin upgrade? How do we handle races?
   * This registry handles a subtle, obscure case. Suppose (before this change), I run a query that users the "foo" plugin. Before the query starts running (but after planning), I rush in and delete the "foo" config. What happens? As it turns out, Drill saves the old config so that queries continue to work. This works because there is only one registry. If we add per-user registries, we'd need to pass the user info with the exec plan so that the readers grab the user's version of the config, including any cached, recently-deleted configs. (Distributed systems are complex!)
   * There are probably a few more issues that also deserve consideration. The comments in this part of the code try to explain the various issues.

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/prepare/PreparedStatementProvider.java
##########
@@ -138,7 +139,9 @@ public void run() {
         final QueryId limit0QueryId = userWorker.submitWork(wrapper, limit0Query);
 
         final long timeoutMillis =
-            userWorker.getSystemOptions().getOption(CREATE_PREPARE_STATEMENT_TIMEOUT_MILLIS).num_val;
+            userWorker.getDrillbitContext().getConfig().getBoolean(ExecConstants.SEPARATE_WORKSPACE)

Review comment:
       As noted above, this kind of code will lead to countless subtle errors. Suggestion: introduce yet another option manager whose job is to decide which of the two "system" option managers is needed. Option managers are designed to nest, so this should be not too hard.
   
   And, there is still the question: why do we need two *system* option managers? Do users get their own system options now? See questions above...

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/StatusResources.java
##########
@@ -183,9 +188,9 @@ public Viewable getSystemInternalOptions(@Context UriInfo uriInfo) {
   public Viewable updateSystemOption(@FormParam("name") String name,
                                    @FormParam("value") String value,
                                    @FormParam("kind") String kind) {
-    SystemOptionManager optionManager = work.getContext()
-      .getOptionManager();
-
+    SystemOptionManager optionManager = authEnabled.separateWorkspace()

Review comment:
       This seems to say that updating a *system* option works differently depending on whether the user has their own workspace. Shouldn't *system* options always update the (global) system options? Otherwise, do we need a "real system option" attribute to really update the global options?

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/StatusResources.java
##########
@@ -112,7 +115,9 @@ public String getMetrics(@PathParam("hostname") String hostname) throws Exceptio
   private List<OptionWrapper> getSystemOptionsJSONHelper(boolean internal)
   {
     List<OptionWrapper> options = new LinkedList<>();
-    OptionManager optionManager = work.getContext().getOptionManager();
+    OptionManager optionManager = authEnabled.separateWorkspace()

Review comment:
       This is a complex bit of code which is repeated several times. Can it be factored out somewhere?

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/StorageResources.java
##########
@@ -138,7 +141,9 @@ public Viewable getPlugins() {
   @Produces(MediaType.APPLICATION_JSON)
   public PluginConfigWrapper getPluginConfig(@PathParam("name") String name) {
     try {
-      return new PluginConfigWrapper(name, storage.getStoredConfig(name));
+      return authEnabled.separateWorkspace()

Review comment:
       On this, and similar lines, it would be better to factor out the logic into a method on `WebUserConnection` such as `getStorage()` or some such. Too much copy/paste and room for error otherwise.

##########
File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/TestUserPluginRegistry.java
##########
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import org.apache.drill.common.config.DrillProperties;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.rpc.user.UserServer;
+import org.apache.drill.exec.rpc.user.UserSession;
+import org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterFixtureBuilder;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl.TEST_USER_1;
+import static org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl.TEST_USER_1_PASSWORD;
+import static org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl.TEST_USER_2;
+import static org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl.TEST_USER_2_PASSWORD;
+import static org.apache.drill.exec.util.StoragePluginTestUtils.CP_PLUGIN_NAME;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertNotSame;
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertSame;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+public class TestUserPluginRegistry extends BasePluginRegistry {

Review comment:
       Thanks for providing these tests. Thanks for building on the tests added back when I refactored the plugin registry: there were many gaps in test coverage before that time.
   
   Do the test also test interaction of the per-user registry with the shared system registry? See the above comment for the kinds of things to test: duplicate names, etc.
   
   This area is highly complex (unfortunately) so we have to be very careful so we don't break stuff.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on a change in pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #2251:
URL: https://github.com/apache/drill/pull/2251#discussion_r663159582



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/StatusResources.java
##########
@@ -112,7 +115,9 @@ public String getMetrics(@PathParam("hostname") String hostname) throws Exceptio
   private List<OptionWrapper> getSystemOptionsJSONHelper(boolean internal)
   {
     List<OptionWrapper> options = new LinkedList<>();
-    OptionManager optionManager = work.getContext().getOptionManager();
+    OptionManager optionManager = authEnabled.separateWorkspace()

Review comment:
       Done.

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/StatusResources.java
##########
@@ -183,9 +188,9 @@ public Viewable getSystemInternalOptions(@Context UriInfo uriInfo) {
   public Viewable updateSystemOption(@FormParam("name") String name,
                                    @FormParam("value") String value,
                                    @FormParam("kind") String kind) {
-    SystemOptionManager optionManager = work.getContext()
-      .getOptionManager();
-
+    SystemOptionManager optionManager = authEnabled.separateWorkspace()

Review comment:
       > This seems to say that updating a system option works differently depending on whether the user has their own workspace.
   
   - Yes
   
   > Shouldn't system options always update the (global) system options? Otherwise, do we need a "real system option" attribute to really update the global options?
   
   - No, we don't care about real system options in case separate_workspace is enabled. The user and all his activity relies on his own SystemOptions.

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
##########
@@ -71,6 +73,87 @@
  */
 public class SystemOptionManager extends BaseOptionManager implements AutoCloseable {
   private static final Logger logger = LoggerFactory.getLogger(SystemOptionManager.class);
+  public static final String OPTIONS_PSTORE_NAME = "options";

Review comment:
       _Note:_ changes in `SystemOptionManager` is just refactoring (placing code blocks in right places)

##########
File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/TestUserPluginRegistry.java
##########
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import org.apache.drill.common.config.DrillProperties;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.rpc.user.UserServer;
+import org.apache.drill.exec.rpc.user.UserSession;
+import org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterFixtureBuilder;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl.TEST_USER_1;
+import static org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl.TEST_USER_1_PASSWORD;
+import static org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl.TEST_USER_2;
+import static org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl.TEST_USER_2_PASSWORD;
+import static org.apache.drill.exec.util.StoragePluginTestUtils.CP_PLUGIN_NAME;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertNotSame;
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertSame;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+public class TestUserPluginRegistry extends BasePluginRegistry {

Review comment:
       The per-user registry doesn't interact with the system one. Except that every per-user one has an initial state corresponding to the system one.
   
   Also we may want in future to enable this feature by default and then check all tests pass.
   

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/StorageResources.java
##########
@@ -138,7 +141,9 @@ public Viewable getPlugins() {
   @Produces(MediaType.APPLICATION_JSON)
   public PluginConfigWrapper getPluginConfig(@PathParam("name") String name) {
     try {
-      return new PluginConfigWrapper(name, storage.getStoredConfig(name));
+      return authEnabled.separateWorkspace()

Review comment:
       fully agree. Done

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/prepare/PreparedStatementProvider.java
##########
@@ -138,7 +139,9 @@ public void run() {
         final QueryId limit0QueryId = userWorker.submitWork(wrapper, limit0Query);
 
         final long timeoutMillis =
-            userWorker.getSystemOptions().getOption(CREATE_PREPARE_STATEMENT_TIMEOUT_MILLIS).num_val;
+            userWorker.getDrillbitContext().getConfig().getBoolean(ExecConstants.SEPARATE_WORKSPACE)

Review comment:
       Yes, every user starts to have the separate instance of `SystemOptions`. The original one is needed for consistency and to start Drillbit. We can chose in future one of the ways: 
   * removing the global SystemOptions 
   * leveraging this global SystemOptions for some purpose (as describe in 5.2 section of [doc](https://docs.google.com/document/d/1mFjDGPHID70SG5uwP8FZImI0eyEwjSmbTCaSzOU7sjc/edit#heading=h.iqtfxhbbot9c))
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-881151197


   Hi @vdiravka,
   
   Thanks for the note, you make good points. Let's see where we can compromise. Let's start with options.
   
   It makes no sense for users to have their own system options (my favorite example is the length of the global, shared query queue.) So, `SYSTEM` options can't be per-user. This means that the `SYSTEM` level has to remain as it is. We can cut corners: still allow users to change system options, we just ignore those values. (For example, IIRC, the query queue, the value is taken from the `SYSTEM` options; we ignore `SESSION` values.) That is, we just trust that user ignorance will prevent them from breaking things; they won't know about the system options.
   
   If we have any form of user-property persistence, then we have to worry about race conditions. I don't see how we can ignore that issue unless we are strict and we say that user options are read once: when the user starts a session. If I change `all-text-mode` in session 1, it will not affect session 2. This seems reasonable for the "toy" use cases where Drill is now used.
   
   That is, when a session starts, we set the session options to the saved per-user values from the persistent store. A nice result is that, at runtime, there is no difference between user and session options. This fact should save some work.
   
   The user has to be able to set the user options. We don't need a new option level. Simply add a new statement: `ALTER USER SET <name>=<value>`. This would do two things:
   
   1. Write the given key/value pair into the persistent store for the user.
   2. Change the session option in the current session to that new value.
   
   Again, we let the user change anything they want; we rely on the fact that users will only care about one or two options, and only when someone suggests a change.
   
   The above is simple, but limited. The limits are no worse than today. It seems we could extend the feature in the future without breaking anything. (For example, broadcast user option changes to all sessions. Apply security to system options. Etc.)
   
   Then, we simply ignore the issue that the options users care about are associated with queries (all the Parquet, JSON, and other options that affect reader behavior.) We simply decide to be surprised when the user encounters those issues. User: "Gee, I set the `all-text-mode` option and my queries run file. But, when Bob runs them, they fail...". Our answer: "have Bob change the same option" This may break Bob's own queries, so we have Bob change the option back. This is a horrible user experience, but is the same as what Drill offers today, so we're not breaking anything.
   
   Will this work?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-887041732


   @vdiravka, I had an opportunity to get a bit more background info for this project. I am going out of my way to try to facilitate this PR; normally we'd require that the PR author provide this information so that us reviews can simply review the code, and not have to reverse engineer requirements and design.
   
   Sounds like the requirements are for a very specific light-weight multi-tenant model: one that allows tenants to set options, create storage plugin configs, and run queries, but not access any other part of Drill. Tenants are to be trusted to not make mistakes. Specifically:
   
   * A *tenant* has a set of "system" options (maybe call them *tenant options*) available when that tenant creates a Drill session.
   * A tenant can define a set of storage plugin configs which are visible to *only* that tenant. Perhaps call these *tenant plugin configs*.
   * A tenant can run queries that use the tenant options and tenant plugin configs.
   
   This use case is limited compared to the normal multi-tenant requirements. The following appear to be restrictions for this project:
   
   * A tenant does not have access to the Drill Web Console or the Drill REST API and thus does not have access to query profiles.
   * A tenant does not have access to Zookeeper or the Drill native API. Queries sent by the tenant must go through an intermediate software layer provided by the service provider.
   * A tenant does not have access to Drill logs to diagnose failed queries.
   
   The above restrictions say that the feature is not useful for open source Drill users who use the Drill-provided UI and APIs. This makes the feature of very limited appeal to the Drill community. So, one of our challenges is to design the feature in a way that users of the "out-of-the-box" Drill can benefit.
   
   Additional restrictions for this one use case:
   
   * A tenant cannot start, stop or restart Drillbits, nor can they change startup properties.
   * A tenant cannot upload a UDF nor can a tenant provide custom *connectors* (storage plugin classes). (Note that [DRILL-7916](https://github.com/apache/drill/pull/2215) is working at cross-purposes to this PR.)
   * Tenants are trusted to not change system-wide performance-related options (queueing, resource allocation, etc.) The resulting behavior, if those options are changed, is undefined and must be dealt with by the service provider if they occur.
   * No provision for the Drill admin to view or modify tenant options or plugin configs. If such behavior is desired, a service provider must write tools that work with Drill's persistent storage.
   * Tenants are trusted to not consume excess resources, so no resource isolation between tenants. Tenant A might try to sort a trillion rows, which might deny resources to other tenants.
   * Tenants cannot (?) create views or a metadata store.
   * Parquet metadata caching is either unsupported (?) or must be written to the tenant's S3 bucket; Drill provides no storage for the metadata.
   
   The above limit the solution, but leave the door open to eventually providing more general multi-tenant support.
   
   A final question is the relation between *tenant* and *user*. This PR assumes that they are identical: that "fred" is either a normal Drill user in "normal mode", or a tenant in "tenant mode." That is, each tenant has a single Drill user (which works in this use case because of the intermediate software layer.) This explains why this PR is labeled as "instances for different *users*", the the discussion has revealed the goal to be "instances for different *tenants*."
   
   Since the "tenant = user" model applies to only this one use case, it again is not a general enough feature to add to the Apache Drill code. Instead, Apache Drill must provide a generally useful solution. 
   
   Prior notes have explained how a "per-user" model should work (it requires sharing between users). Specifically a, "config per user" solution must recognize that users work on a team, allow sharing, and permit admin abilities. Similarly, an "options per user" solution must persist only a subset of options (those which are neither system nor per-query options), and must solve the synchronization issue.
   
   Notes have also explained that the customary definition of "tenant" is an organization with multiple users. A true multi-tenant solution must allow multiple users per tenant, and provide a path toward full tenant isolation later.
   
   The challenge here is to find a design that balances the very specific, ad-hoc, unusual needs of this one use case, with something that can evolve to become of general use to the Drill community.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers edited a comment on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
paul-rogers edited a comment on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-885888891


   @vdiravka, thanks for the explanation. Let's be careful not to confuse existing implementation with new requirements. Yes, there is a comment that says that the system options are persistent and session options are not. Be careful to read this correctly. *At the time the comment was written*, system options were the only persistent options. System options, by definition, have to be persistent. The comment did not *forbid* session options from being persistent; it simply observes that, at that time, for the reasons we discussed, session options were not persistent. Nor does the comment say that, if you add persistent user-level options, that they have to be system options because only system options are persistent: you are free to add persistence to session options.
   
   Thanks splitting this PR into two: one for options, another for plugins. Else, discussion becomes overly complex.
   
   *Before* we worry about code details, we have to get the semantics right. After writing the comment on plugins, below, I realize that we are probably using the same words for entirely different ideas. Your goal is to add multi-tenant support to Drill. You uses the term "user options". I now realize you probably mean "tenant options."
   
   If we are doing per-user options (which is how I first read your description), then the requirements are one thing. If we want per-tenant options, the requirements are different. Let's list both scenarios: per-user options and per-tenant options. Please let us know which you are trying to build (or if you are tying to build something else entirely.)
   
   ## Per-User Persistent Options
   
   Suppose the goal is to let users (individual named humans) persist options. We assume that system-wide settings are set as system options. We are simply avoiding the need for the user to configure their own personal options on each login.
   
   Users will want to persist only *some* of their session options. We discussed that this may not work as one might think: we have to think through the issues. The point was that concurrency of user options will be a mess if they work like system options: if changes to user/session options are written to persistent storage immediately. This will screw up queries as I persist "all-text-mode" in one session, which breaks a query in another. Point is, user options *must* have semantics different from system options.
   
   * We need a concurrency guarantee. One choice is "options are propagated to all active sessions with unknown latency, just like system options." Another, more dependable, is "session options are initialized on session startup, after which they do not reflect changes made in other sessions." Whatever it is, the definition has to be stable and explainable.
   * Some options apply to queries: we've been using "all-text-mode" as an example. Such options will be set frequently. Since these options are per-query, we cannot set the option across all user sessions immediately, as that will break queries. Instead, the user has to *tell us* when they want to persist that setting. Otherwise the setting is per-session (that is, per-query.)
   
   Is it a bad thing that the behavior of queries depends on session options? Certainly. That was a poor design choice. But, it is how Drill works, and we have to design new features to reflect this choice.
   
   ## Per-Tenant Options
   
   Now, if the goal is to have "per-tenant" options, we have a much different problem. Each *tenant* is an organization which believes it has its own Drill cluster. That the cluster is shared is invisible to the tenant. The tenant reads in the docs that they can change system options, so they decide to do so. Those changes must persist only for that one *tenant* and all that tenant's users.
   
   I now see that this is probably what you are trying to implement. As a result, the semantics are closer to system options. For now, let's assume that you do not, in fact, need per-user (per-named human) persistent options.
   
   Let's list some possible requirements:
   
   * A bootstrap option enables multi-tenant support. The option is off by default, meaning that existing Drill behavior is unchanged by default.
   * A tenant is an organization with (a single user? multiple users?)
   * Each tenant has a *tenant admin* user who may set its own (subset of) system options. This subset is called *tenant preferences*. Such options persist, like session options, but apply only to sessions started for that tenant. Values set by one tenant cannot be seen by any other tenant.
   * Both server options and tenant preferences are system options: changes take effect immediately (after some latency) in all active sessions. Both are changed with the `ALTER SESSION` and related commands.
   * A new class of user, "tenant admin", is given permission to change tenant preferences. The "tenant admin" cannot change server options. When a tenant admin issues an `ALTER SESSION` command, that command alters the tenant preferences.
   * The full set of system options include options outside of the tenant preferences set. These are called *server options* and may only be set by a *server admin* user who belongs to the organization that administers the physical Drill cluster.
   * In multi-tenant mode, the Drill "admin" user becomes the *server admin* who can modify server options. The server admin can also view and change tenant preferences.When the server admin issues an `ALTER SESSION` command, that command alters the server options.
   * When not run in multi-tenant mode, server options and tenant preferences are identical, they are just *system options*. In multi-tenant mode, the concepts are distinct: each tenant has their own persistent set of tenant preferences separate from server options.
   * Normal users can change neither server options nor tenant preferences; such user can only change session options using the `ALTER SESSION` command.
   * The server options, tenant preferences, and session options form a hierarchy, as in present Drill. The server admin can alter any option which "flows down" to all other levels. The tenant admin can alter tenant preference which "flow down" to sessions.
   
   Code wise, you will be adding a new system options layer. You should mark options as "server", "tenant" or "user" so the above requirements can be enforced.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers edited a comment on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
paul-rogers edited a comment on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-885888891


   @vdiravka, thanks for the explanation. Let's be careful not to confuse existing implementation with new requirements. Yes, there is a comment that says that the system options are persistent and session options are not. Be careful to read this correctly. *At the time the comment was written*, system options were the only persistent options. System options, by definition, have to be persistent. The comment did not *forbid* session options from being persistent; it simply observes that, at that time, for the reasons we discussed, session options were not persistent. Nor does the comment say that, if you add persistent user-level options, that they have to be system options because only system options are persistent: you are free to add persistence to session options.
   
   I strongly recommend splitting this PR into two: one for options, another for plugins. Else, discussion becomes overly complex.
   
   *Before* we worry about code details, we have to get the semantics right. After writing the comment on plugins, below, I realize that we are probably using the same words for entirely different ideas. Your goal is to add multi-tenant support to Drill. You uses the term "user options". I now realize you probably mean "tenant options."
   
   If we are doing per-user options (which is how I first read your description), then the requirements are one thing. If we want per-tenant options, the requirements are different. Let's list both scenarios: per-user options and per-tenant options. Please let us know which you are trying to build (or if you are tying to build something else entirely.)
   
   ## Per-User Persistent Options
   
   Suppose the goal is to let users (individual named humans) persist options. We assume that system-wide settings are set as system options. We are simply avoiding the need for the user to configure their own personal options on each login.
   
   Users will want to persist only *some* of their session options. We discussed that this may not work as one might think: we have to think through the issues. The point was that concurrency of user options will be a mess if they work like system options: if changes to user/session options are written to persistent storage immediately. This will screw up queries as I persist "all-text-mode" in one session, which breaks a query in another. Point is, user options *must* have semantics different from system options.
   
   * We need a concurrency guarantee. One choice is "options are propagated to all active sessions with unknown latency, just like system options." Another, more dependable, is "session options are initialized on session startup, after which they do not reflect changes made in other sessions." Whatever it is, the definition has to be stable and explainable.
   * Some options apply to queries: we've been using "all-text-mode" as an example. Such options will be set frequently. Since these options are per-query, we cannot set the option across all user sessions immediately, as that will break queries. Instead, the user has to *tell us* when they want to persist that setting. Otherwise the setting is per-session (that is, per-query.)
   
   Is it a bad thing that the behavior of queries depends on session options? Certainly. That was a poor design choice. But, it is how Drill works, and we have to design new features to reflect this choice.
   
   ## Per-Tenant Options
   
   Now, if the goal is to have "per-tenant" options, we have a much different problem. Each *tenant* is an organization which believes it has its own Drill cluster. That the cluster is shared is invisible to the tenant. The tenant reads in the docs that they can change system options, so they decide to do so. Those changes must persist only for that one *tenant* and all that tenant's users.
   
   I now see that this is probably what you are trying to implement. As a result, the semantics are closer to system options. For now, let's assume that you do not, in fact, need per-user (per-named human) persistent options.
   
   Let's list some possible requirements:
   
   * A bootstrap option enables multi-tenant support. The option is off by default, meaning that existing Drill behavior is unchanged by default.
   * A tenant is an organization with (a single user? multiple users?)
   * Each tenant has a *tenant admin* user who may set its own (subset of) system options. This subset is called *tenant preferences*. Such options persist, like session options, but apply only to sessions started for that tenant. Values set by one tenant cannot be seen by any other tenant.
   * Both server options and tenant preferences are system options: changes take effect immediately (after some latency) in all active sessions. Both are changed with the `ALTER SESSION` and related commands.
   * A new class of user, "tenant admin", is given permission to change tenant preferences. The "tenant admin" cannot change server options. When a tenant admin issues an `ALTER SESSION` command, that command alters the tenant preferences.
   * The full set of system options include options outside of the tenant preferences set. These are called *server options* and may only be set by a *server admin* user who belongs to the organization that administers the physical Drill cluster.
   * In multi-tenant mode, the Drill "admin" user becomes the *server admin* who can modify server options. The server admin can also view and change tenant preferences.When the server admin issues an `ALTER SESSION` command, that command alters the server options.
   * When not run in multi-tenant mode, server options and tenant preferences are identical, they are just *system options*. In multi-tenant mode, the concepts are distinct: each tenant has their own persistent set of tenant preferences separate from server options.
   * Normal users can change neither server options nor tenant preferences; such user can only change session options using the `ALTER SESSION` command.
   * The server options, tenant preferences, and session options form a hierarchy, as in present Drill. The server admin can alter any option which "flows down" to all other levels. The tenant admin can alter tenant preference which "flow down" to sessions.
   
   Code wise, you will be adding a new system options layer. You should mark options as "server", "tenant" or "user" so the above requirements can be enforced.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
vdiravka commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-879480858


   Hi @paul-rogers,
   
   I started design for system options from adding the new additional layer. But this approach is much more complex and we should cover all the cases you described.
   The design for the feature proposed by me is other: we just don't use global system options at all (in case `separate_workspace` is enabled). Every user has it's own system options. It gives a great functionality, which drill required for a long time, I think. We used session options for this purpose, but it can't be serialized. I also abandon the idea to Store Session options for the same reason, it has other concept and couples with a Query.
   
   About plugins the same thing, I tried to keep things as simple as it is possible (it corresponds to your last sentence).
   * I agree we can factor out system plugins and use that shared plugins for all users. It is minor improvement, it allows to keep less things in memory. This is not a blocker for this PR. I will create a ticket and proceed working on it after finishing this PR.
   * Also I agree with it is nice to have Group/Organization level of plugins and `System Options`. I noted about it in Design document. But it is much more complex improvement, requires introducing a lot of new rules and grants. I thought to consider working on it after finishing on this PR. Because now separate plugins and options can provide all things needed for users, but setting up is not so comfortable as it can be with Group/Organization level of these instances. And there are more similar instances are placed in memory now (separate instances for Groups/Organizations will improve this too)
   * About concurrency and distributed systems issues, they are all the same as it is now. Since event now one user can run query and other can edit the plugin config. Now the picture is even better, since two user can't sue common plugins registry or system options. Whey their own ones. They don't impact each other.
   
   About dividing the PR I also thought about it initially, because leverage of System Options is other than plugins, so a lot of things for options is other. But options and plugins are combined with a common idea - to have the separate workspace for users: plugin configs, system options and to see their own query profiles only. And the approach for implementation is very similar - to replace the global instances with their own user-session ones.
   So I can here to divide the PR in two parts or just in two commits for easier review. But I think we can also go with one PR and commit.
   
   So let me know do you like the above approach for this feature with a two (or more) further improvements or possibly you want to bring mail discussion on this topic with other devs?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-881155695


   Next, let's discuss plugins. Let's start with three goals:
   
   1. System plugins work for everyone (we can add security later).
   2. By default, plugins are available to everyone. That is, the Drill out-of-the-box plugins are available so that, say, `cp` works and people can try examples from the documentation.
   3. There is an optional way to create per-user plugin (configs).
   
   What is the simplest way to do this? Maybe by changing the plugin resolution rules. When the planner looks for a plugin, it tries two names: `$$<user>$$<plugin>` and `<plugin>`. We use the resulting name in the query plan. So, if I have (unwisely) created my own `cp` plugin, when I ask for `cp`, that's what I get:  `$$paul$$cp`. If I do not have my own `dfs`, then when I ask for it, I get the (global) `dfs`. (The `$$<user>$$` is something I just made up; it can be anything as long as the marker is not likely to ever appear in a real plugin name. Maybe `..<user>..` or `\<user>\` would be better since they are not legal SQL.)
   
   When a user stores a plugin, and the per-user config feature is enabled, then the UI appends the user prefix. The UI filters out all plugins owned by other users. When the admin works with plugins, there is no prefix. This allows the admin to see, and change, both `cp` and `$$paul$$cp`. If I leave the organization, the admin can remove all the `$$paul$$...` plugins.
   
   I suspect you'll find that, with this solution, *no* change is needed in the plugin registry. Plans will resolve to the correct reader configs. The reader configs in the query plan will just work (as they would in your design.)
   
   How does this help sharing? Simply ask the admin to change the name of the plugin from `$$paul$$whizbang` to just `whizbang`. Now, everyone can see it and my queries work without change. Super simple. We have only two security levels (everyone or single user), but that is **far** better than reuse by copy/paste. The design could be extended to the role-based one without breaking anything.
   
   The next step is store the per-user plugins in separate persistent stores. This is the "cleaner" design, but now you have to create the means for an admin to see, change, and remove the user-level plugins. You also have to extend the plugin registry, which is, as we noted, a complex beast. Any change requires *extensive* unit tests.
   
   By the way, when I revised the plugin registry a while back, I fixed dozens of bugs. Most were simply because I added tests and found all kinds of broken stuff. Concurrency was broken, upgrades were broken, expiring of old plugins was broken. and so on. I still have scars. So, I'm trying to save you from similar pain.
   
   Will something like this work and avoid the hard work we discussed earlier?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on a change in pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #2251:
URL: https://github.com/apache/drill/pull/2251#discussion_r664845699



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePluginRegistry.java
##########
@@ -21,19 +21,18 @@
 import java.util.Map;
 import java.util.Set;
 
+import org.apache.calcite.schema.SchemaPlus;

Review comment:
       > Are we controlling access to the "connectors" (code) or to the "instances" (configs?)
   * connectors and configs
   > If we control the connectors, do we load new copies of the connector metadata per user?
   * yes, we do
   > Can a user see the common set of system-wide configs? Or, only their own private configs?
   * only their private
   > If system wide, how do we handle name conflicts? I add a "foo" plugin, later the admin adds a system-wide "foo" or visa-versa.
   * answered above
   > Can a private plugin be promoted to system level? Is that a copy/paste operation? With the user logged in as different windows? Or, is there a button to do the operation?
   * there is no such functionality yet
   > When is the per-user information created? As you know, on the web, the user session comes into existence for every single REST call. Plugin setup is pretty heavy-weight. Is the user config cached between requests? If so, when do we expire the user information if the user then becomes idle?
   * the users plugins is created and placed within the user session. The caching mechanism is absolutely the same as for for the system plugins registry (`ephemeralPlugins` is kept within every session `UserStoragePluginRegistry` inheritor of the `StoragePluginRegistryImpl`). If the UserSession is closed, the `ephemeralPlugins` is persisted to ZK similar to the global plugins approach.
   > The registry handles updates. The current code is rather awful since there is no coordination among Drillbits: multiple drillbits could come up at the same time and they can all decide to do the upgrade. Adding users complicates the effort. Will a user session trigger a plugin upgrade? How do we handle races?
   * Any plugins update or any query run is performed via the client session. A new User Session can handle only own `UserStoragePluginRegistry` instance. Handling races mechanism is the same as for `StoragePluginRegistryImpl`, but it is event simpler - the user within one session can't update plugins and run query simultaneously.
   > This registry handles a subtle, obscure case. Suppose (before this change), I run a query that users the "foo" plugin. Before the query starts running (but after planning), I rush in and delete the "foo" config. What happens? As it turns out, Drill saves the old config so that queries continue to work. This works because there is only one registry. If we add per-user registries, we'd need to pass the user info with the exec plan so that the readers grab the user's version of the config, including any cached, recently-deleted configs. (Distributed systems are complex!)
   * for this case the query can fail indeed. But do we expect it to be finished?! I don't think so. as the config was deleted and it is fine that the query based on this config is failed.
   
   The global idea is to have separate `StoragePluginRegistryImpl` per UserSession and to keep all other functionality without changes. It allows to avoid any unexpected behavior. Absolutely the same story with `SystemOptionManager`
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
vdiravka commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-882856859


   * **Motivation:**
   
   The goal is to allow multi-tenant operations. So it doesn't matter what API user uses: REST, JDBC ot ODBC. This is true that Drill initially wasn't designed for that, but we ca add it.
   
   * **Options:** 
   > The user has to be able to set the user options. We don't need a new option level. Simply add a new statement: ALTER USER > SET <name>=<value>. This would do two things:
   > Write the given key/value pair into the persistent store for the user.
   > Change the session option in the current session to that new value.
   
   This is also questionable, because it breaks `SessionOptionManager extends InMemoryOptionManager`:
   ```
   This is an OptionManager that holds options in memory rather than in a persistent store. Options stored in 
   SessionOptionManager, QueryOptionManager, and FragmentOptionManager are held in memory 
   (see options) whereas the SystemOptionManager stores options in a persistent store.
   ```
   Most likely all easy ways here are some sort of hacks. So let's do that in correct manner: adding the new `UserOptionManager` as a layer between `SystemOptionManager` and `SessionOptionManager`.
   
   > The next step is store the per-user plugins in separate persistent stores. This is the "cleaner" design, but now you have to 
   > create the means for an admin to see, change, and remove the user-level plugins. You also have to extend the plugin registry, > which is, as we noted, a complex beast. Any change requires extensive unit tests.
   
   Different PS can be chosen per Drill doc. This is independent from Plugins. We have `HBasePersistentStore`, `InMemoryStore`, `LocalPersistentStore`, `MongoPersistentStore`, `NoWriteLocalStore`, `ZookeeperPersistentStore` implementations for PS.
   If you mean different instances of the same`PS`, that is not needed. PSRegistry is just a code, which allows to access actual PS. We can manage access via the common PSRegistry. Or we can reconsider it in future.
   
   *  **Plugins:**
   
   About plugins it makes sense to add Global plugins and to use it in case User plugin is absent. It's like a plugins fallback.
   And system plugins can be used only in scope of Global plugins. I don't think user name in schema is a nice and elegant solution. I think plugins fallback is simpler and more convenient approach.
   
   * **Conclusion:**
   
   So I am going to add this functionality to current implementation and separate out `user options` part of this PR into the new one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-885888891


   @vdiravka, thanks for the explanation. Let's be careful not to confuse existing implementation with new requirements. Yes, there is a comment that says that the system options are persistent and session options are not. Be careful to read this correctly. *At the time the comment was written*, system options were the only persistent options. System options, by definition, have to be persistent. The comment did not *forbid* session options from being persistent; it simply observes that, at that time, for the reasons we discussed, session options were not persistent. Nor does the comment say that, if you add persistent user-level options, that they have to be system options because only system options are persistent: you are free to add persistence to session options.
   
   The gist of your requirement is to let users persist *some* of their session options. We discussed that this may not work as one might think: we have to think through the issues. The point was that concurrency of user options will be a mess if they work like system options: if changes to user/session options are written to persistent storage immediately. This will screw up queries as I persist "all-text-mode" in one session, which breaks a query in another. Point is, user options *must* have semantics different from system options.
   
   *Before* we worry about code details, we have to get the semantics right. To repeat:
   
   * We need a concurrency guarantee. One choice is "options are propagated to all active sessions with unknown latency, just like system options." Another, more dependable, is "session options are initialized on session startup, after which they do not reflect changes made in other sessions." Whatever it is, the definition has to be stable and explainable.
   * Some options apply to queries: we've been using "all-text-mode" as an example. Such options will be set frequently. Since these options are per-query, we cannot set the option across all user sessions immediately, as that will break queries. Instead, the user has to *tell us* when they want to persist that setting. Otherwise the setting is per-session (that is, per-query.)
   
   Is it a bad thing that the behavior of queries depends on session options? Certainly. That was a poor design choice. But, it is how Drill works, and we have to design new features to reflect this choice.
   
   The proposal I outlined is the simplest way to achieve with concurrency semantics we could actually support. Feel free to suggest a better design; but let's discuss that design *before* we reduce it to code. Either way, the design has to reflect our requirements and constraints, not just follow existing code.
   
   I strongly recommend splitting this PR into two: one for options, another for plugins. Else, discussion becomes overly complex.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-881179890


   One more thought: it is not clear from the PR or design document *why* you need these changes, so I'm having to guess. I may be guessing wrong. Can you spell out the use case?
   
   For example, I could imagine that you need the persistent user options because you want to use the REST API, and that API does not have a persistent session, so options don't persist. As it turns out, someone kindly recently added the ability to pass along options on each query request. This is, in fact, how Impala works to avoid the need for session state. So, you can achieve the same result by keeping options on the client.
   
   Or, you can achieve the same result by adding a session concept to the REST API. Log in and get a session token back. Send that token on subsequent requests to identify the user and to key to a cached session object. The session ends after a timeout or on an explicit logout call.
   
   If the per-user plugins are to allow multi-tenant operation, then the problem is considerably more complex as much of Drill is not designed for that. (You'd want to secure query profiles, police resource usage, etc.) If it is to allow different users in the same organization to define their own plugins, then you have the sharing issue we discussed.
   
   In short, if you can explain your goals, we can perhaps help you find a workable solution that achieves the goals. Without knowing the goals, then I'm free to make up my own understanding of what you want...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-887063203


   @vdiravka, here I will suggest the broad outline of a design which meets the very limited goals of this PR while also providing a path forward for a generally useful set of Apache Drill features.
   
   We introduce one new concept, the *tenant*. Key concepts:
   
   * A tenant is a *name space* for options and plugin configs. The tenant level is optional; by default, no tenants exist.
   * Each user has an optional associated *tenant ID*. A user with no tenant ID works at the system level. (Example: the Drill admin, or a user who sets up demo configs shared by all users.) A user with an associated tenant ID gives the tenant name space to use. There may be 0, 1 or many users associated with a tenant.
   * A persistent *tenant options* name space sits between the system options and the session options in the option hierarchy. Users "inherit" options from the system layer, unless overridden at the tenant layer, unless overridden at the session layer.
   * Queries and plugin config-related API calls resolve plugin configs within the *tenant plugin config name space*. On lookups, if the config is not found in the tenant name space, repeat the search in the global name space. (Allows system-wide configs for such things as demo data.)
   
   Now we can decide how much of the above to implement in this PR, leaving the rest to future PRs. Given your goals, it would seem the minimum is:
   
   * Provide an API or other means to define tenants and to associate tenants with users.
   * Provide documentation for how to inspect or remove per-tenant name spaces (e.g. if the tenant cancels service or is moved to another cluster.)
   * Tenant membership is enforced for the native and REST query APIs.
   * Tenant users can see the Drill Console, but features that are not "tenant-ready" should be disabled. That is, a tenant user might only be able to run a query, and see their options, but not see query profiles, etc.
   
   SQL semantics:
   
   * The `ALTER SYSTEM` command affects the tenant option store given by the user's tenant ID, if any. *System* (e.g. server-wide) options are set using the same command issued by a user with no tenant ID.
   * System tables are not available to users associated with a tenant ID. (Later, this can be modified to filter system table contents to include only information for that tenant.)
   
   An possible code design might be:
   
   * Add a tenant registry which can be as simple as the known, valid tenant IDs. This "registry" can simply be an API, with a default implementation that does nothing.
   * Add a field to the user to hold the tenant ID. When resolving a user name, obtain the tenant ID along with the admin/normal-user access level.
   * Unfortunately, Drill does not have a `User` object, we assume the only information for a user is the user name. Define a new `User` object which holds the user ID and tenant ID. (Ideally, it would also hold the security level, etc.)
   * We know that the system must determine if the user is an admin. Extend that mechanism to also provide the tenant ID. Modify that code to provide a new method which returns the `User` object. A storage plugin manager client can either use an existing `User` object, or obtain a new one by calling to this new `User` factory method.
   * Introduce the tenant option manager which acts like the system option manager, and persists to a location that includes the tenant ID in the name. Persistence is to a location which includes the tenant ID in the name.
   * When a session starts, if the user has a tenant ID, add the tenant option manager to the session option manager stack. This one change should ensure that the option solution "just works" everywhere else.
   * Modify the storage plugin manager to require a `User` object for every config CRUD method. The `User` says which tenant store to read from or write to.
   * The storage plugin registry currently uses keys to locate configs. This key is extended with the tenant ID. Modify config resolution to use the tenant ID. If the config is not found in the tenant store, or the tenant ID is not set for the user, repeat the search with the default "system" config store. When modifying configs, write to the persistent store given by the tenant ID.
   * Modify all clients of the storage plugin manager to pass along the `User` object, creating one if needed.
   * System tables do not have an associated config. For system tables (only), enforce the rule above: if a plugin lookup resolves to the system plugin, and the user has an associated tenant ID, act as if the plugin is not found.
   
   This proposed design accomplishes two goals. First, it is limited in scope to only the limited needs of the PR author. Second, it is constructed in a way that others can extend the solution toward a full multi-tenant solution in the future without breaking backward compatibility.
   
   Note that this design is a *multi-tenant* solution, not a *per-user* solution. I would suggest changing the title of the PR to match the intent to (partially) solve the multi-tenant problem.
   
   The good news is that this is more-or-less in line with some of the code in this PR.
   
   @vdiravka, please review these suggestions, and the need to offer a solution that is of general interest to the Drill community. Feel free to use the above, or to modify them, to propose a revised set of requirements and designs. Once we're on the same page at the design level, attention can return to review of the code itself. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
vdiravka commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-882856859


   * **Motivation:**
   
   The goal is to allow multi-tenant operations. So it doesn't matter what API user uses: REST, JDBC ot ODBC. This is true that Drill initially wasn't designed for that, but we ca add it.
   
   * **Options:** 
   > The user has to be able to set the user options. We don't need a new option level. Simply add a new statement: ALTER USER > SET <name>=<value>. This would do two things:
   > Write the given key/value pair into the persistent store for the user.
   > Change the session option in the current session to that new value.
   
   This is also questionable, because it breaks `SessionOptionManager extends InMemoryOptionManager`:
   ```
   This is an OptionManager that holds options in memory rather than in a persistent store. Options stored in 
   SessionOptionManager, QueryOptionManager, and FragmentOptionManager are held in memory 
   (see options) whereas the SystemOptionManager stores options in a persistent store.
   ```
   Most likely all easy ways here are some sort of hacks. So let's do that in correct manner: adding the new `UserOptionManager` as a layer between `SystemOptionManager` and `SessionOptionManager`.
   
   > The next step is store the per-user plugins in separate persistent stores. This is the "cleaner" design, but now you have to 
   > create the means for an admin to see, change, and remove the user-level plugins. You also have to extend the plugin registry, > which is, as we noted, a complex beast. Any change requires extensive unit tests.
   
   Different PS can be chosen per Drill doc. This is independent from Plugins. We have `HBasePersistentStore`, `InMemoryStore`, `LocalPersistentStore`, `MongoPersistentStore`, `NoWriteLocalStore`, `ZookeeperPersistentStore` implementations for PS.
   If you mean different instances of the same`PS`, that is not needed. PSRegistry is just a code, which allows to access actual PS. We can manage access via the common PSRegistry. Or we can reconsider it in future.
   
   *  **Plugins:**
   
   About plugins it makes sense to add Global plugins and to use it in case User plugin is absent. It's like a plugins fallback.
   And system plugins can be used only in scope of Global plugins. I don't think user name in schema is a nice and elegant solution. I think plugins fallback is simpler and more convenient approach.
   
   * **Conclusion:**
   
   So I am going to add this functionality to current implementation and separate out `user options` part of this PR into the new one.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] jnturton commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
jnturton commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-1048739204


   Converting to draft while there is still discussion about how we approach the design.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-885928024


   @vdiravka, let's think about your larger goal.  If the goal is a true multi-tenant model, in which tenants are distinct business entities (rather than different departments within a single entity), then we must also ensure each tenant has access to only their own query profiles. Will there be another PR for this? Is there a roll-up Jira ticket for all the issues involved in true multi-tenant operation?
   
   Also, the Drill UI assumes a single organization. If the UI is to be exposed to multi-tenant users (so they can monitor queries, see query profiles, etc.), then the UI must change. Tenants should not be able to see details, or change the state of, Drillbits. Session options should reflect tenant values. Probably other changes. Will there be a design or PR for this?
   
   Further, each tenant must have guarantees on resources. That is, Tenant A should not be able to run a huge query that denies resources sold to Tenant B. This is a **very hard** problem. If you don't solve the hard problem, the options and plugins are somewhat moot. The option and plugin features are handy, but do not, by themselves, give you multi-tenant support in Drill.
   
   System tables could "leak" information between tenants. Should system tables be disabled? What changes would be needed to ensure that no per-tenant information "leaks" to another tenant?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vdiravka commented on a change in pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #2251:
URL: https://github.com/apache/drill/pull/2251#discussion_r663159582



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/StatusResources.java
##########
@@ -112,7 +115,9 @@ public String getMetrics(@PathParam("hostname") String hostname) throws Exceptio
   private List<OptionWrapper> getSystemOptionsJSONHelper(boolean internal)
   {
     List<OptionWrapper> options = new LinkedList<>();
-    OptionManager optionManager = work.getContext().getOptionManager();
+    OptionManager optionManager = authEnabled.separateWorkspace()

Review comment:
       Done.

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/StatusResources.java
##########
@@ -183,9 +188,9 @@ public Viewable getSystemInternalOptions(@Context UriInfo uriInfo) {
   public Viewable updateSystemOption(@FormParam("name") String name,
                                    @FormParam("value") String value,
                                    @FormParam("kind") String kind) {
-    SystemOptionManager optionManager = work.getContext()
-      .getOptionManager();
-
+    SystemOptionManager optionManager = authEnabled.separateWorkspace()

Review comment:
       > This seems to say that updating a system option works differently depending on whether the user has their own workspace.
   
   - Yes
   
   > Shouldn't system options always update the (global) system options? Otherwise, do we need a "real system option" attribute to really update the global options?
   
   - No, we don't care about real system options in case separate_workspace is enabled. The user and all his activity relies on his own SystemOptions.

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
##########
@@ -71,6 +73,87 @@
  */
 public class SystemOptionManager extends BaseOptionManager implements AutoCloseable {
   private static final Logger logger = LoggerFactory.getLogger(SystemOptionManager.class);
+  public static final String OPTIONS_PSTORE_NAME = "options";

Review comment:
       _Note:_ changes in `SystemOptionManager` is just refactoring (placing code blocks in right places)

##########
File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/TestUserPluginRegistry.java
##########
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store;
+
+import org.apache.drill.common.config.DrillProperties;
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.rpc.user.UserServer;
+import org.apache.drill.exec.rpc.user.UserSession;
+import org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl;
+import org.apache.drill.exec.store.dfs.FileSystemConfig;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.test.ClientFixture;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterFixtureBuilder;
+import org.junit.Test;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl.TEST_USER_1;
+import static org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl.TEST_USER_1_PASSWORD;
+import static org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl.TEST_USER_2;
+import static org.apache.drill.exec.rpc.user.security.testing.UserAuthenticatorTestImpl.TEST_USER_2_PASSWORD;
+import static org.apache.drill.exec.util.StoragePluginTestUtils.CP_PLUGIN_NAME;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertNotSame;
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertSame;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+public class TestUserPluginRegistry extends BasePluginRegistry {

Review comment:
       The per-user registry doesn't interact with the system one. Except that every per-user one has an initial state corresponding to the system one.
   
   Also we may want in future to enable this feature by default and then check all tests pass.
   

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/StorageResources.java
##########
@@ -138,7 +141,9 @@ public Viewable getPlugins() {
   @Produces(MediaType.APPLICATION_JSON)
   public PluginConfigWrapper getPluginConfig(@PathParam("name") String name) {
     try {
-      return new PluginConfigWrapper(name, storage.getStoredConfig(name));
+      return authEnabled.separateWorkspace()

Review comment:
       fully agree. Done

##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/prepare/PreparedStatementProvider.java
##########
@@ -138,7 +139,9 @@ public void run() {
         final QueryId limit0QueryId = userWorker.submitWork(wrapper, limit0Query);
 
         final long timeoutMillis =
-            userWorker.getSystemOptions().getOption(CREATE_PREPARE_STATEMENT_TIMEOUT_MILLIS).num_val;
+            userWorker.getDrillbitContext().getConfig().getBoolean(ExecConstants.SEPARATE_WORKSPACE)

Review comment:
       Yes, every user starts to have the separate instance of `SystemOptions`. The original one is needed for consistency and to start Drillbit. We can chose in future one of the ways: 
   * removing the global SystemOptions 
   * leveraging this global SystemOptions for some purpose (as describe in 5.2 section of [doc](https://docs.google.com/document/d/1mFjDGPHID70SG5uwP8FZImI0eyEwjSmbTCaSzOU7sjc/edit#heading=h.iqtfxhbbot9c))
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] paul-rogers edited a comment on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
paul-rogers edited a comment on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-885898568


   @vdiravka, on the plugins side, I think the confusion over "user" and "tenant" has us talking about different designs. Again, let's discuss per-user and per-tenant designs. Tell me which you are trying to build.
   
   Clarifying the desired tenant model may help with the options discussion. I've been equating "user" with "person". If your tenant model equates "user" with "tenant", you are actually trying to define per-tenant defaults. Perhaps update the docs to clearly say so. Let's use the term "tenant" for a business entity and "user" for a human, perhaps within that tenant.
   
   ## Per-User Plugin Configs
   
   The goal here is to allow users (humans) to create their own private plugin configs. Since the users work in a common organization, users may wish to share configs without copy/paste reuse.
   
   Requirements for this case:
   
   * Provide a bootstrap option to enable the new semantics. The default is off, meaning that Drill behavior will not change by default.
   * Retain global plugin configurations. (They allow multiple users to share the same configuration without copy/paste.)
   * Add per-user plugin configurations.
   * Users can add and change only their own per-user configurations. (No user can "share" a config with another user: either the config must be copied to a global config by the admin, or the second user must make their own copy.)
   * If a user configuration happens to have the same name as a global configuration, the per-user configuration takes precedence. (If the admin make a global version of a now-shared config, the user must delete his own copy in order to use the now-shared version.)
   * No non-admin user can see or change the configuration for another user. (That is, if two people work in the same group, and want to share a config, they must do so via copy/paste, or by asking the admin to make the plugin global to all users.)
   * No non-admin user can change a global configuration.
   * The admin user can see, add, modify or delete *any* plugin configuration, both global and per-user. (Without this, the system is not maintainable as the admin won't be able to manage per-user configurations.)
   
   ## Per-Tenant Plugin Configs
   
   The other way to read your comments is that you want plugin configs per *tenant* (business entity). In this case, as we said earlier, Tenant A should not know that there is a Tenant B on the same cluster. There is likely no shared storage or configs between tenants. The only exception would be system-level configs offered by the organization that owns the Drill cluster, such as the system tables. Maybe some demo data.
   
   Requirements in this case:
   
   * Provide a bootstrap option to enable multi-tenant support. The default is off, meaning that Drill behavior will not change by default.
   * A set of global plugin configuration is provided by, and can only be changed by, the server admin (as defined in the earlier note.) If no global plugins are to be provided, then delete all global plugin configs.
   * The system table is not controlled by a plugin config. If tenants are to be prohibited from using system tables, provide an option to disable system tables. (Or, a per a later note, ensure that system tables do not leak tenant information.)
   * Tenants see only their own configs. To the tenant, the per-tenant configs *are* the plugin configs.
   * UI and API operations that operate on configs, when run in multi-tenant mode, implicitly apply to the set only for that one tenant. That is, there is no new UI or API; the existing operations simply change meaning to apply to only the active tenant.
   * Every user (except the server admin) is associated with a tenant. If the user has permission to modify plugins, then all changes apply to only that one tenant. (Possible refinement: only the tenant admin, as defined earlier, can change tenant plugin configs.)
   * A tenant may (may not?) have multiple users. All tenant users see the same set of tenant plugins. There are no "per-user" plugins, only per-tenant plugins.
   * Tenants can add and change only their own per-tenant configurations. No tenant can "share" a config with another tenant.
   * If a tenant configuration happens to have the same name as a global configuration, the per-tenant configuration takes precedence.
   * No tenant can change a global configuration.
   * The server admin user can see, add, modify or delete *any* plugin configuration, both global and per-tenant. (Without this, the system is not maintainable as the admin won't be able to manage per-tenant configurations.)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] vvysotskyi commented on pull request #2251: DRILL-7871 StoragePluginStore instances for different users

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on pull request #2251:
URL: https://github.com/apache/drill/pull/2251#issuecomment-866817374


   @vdiravka, looks like the test you have added is failing in CI, could you please take a look?
   
   Could you please share some functional-level doc with the description use cases and behavior, so it will be easier to do the code review?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org