Posted to dev@drill.apache.org by chunhui-shi <gi...@git.apache.org> on 2017/11/10 06:22:38 UTC

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

GitHub user chunhui-shi opened a pull request:

    https://github.com/apache/drill/pull/1032

    DRILL-5089: Dynamically load schema of storage plugin only when neede…

    …d for every query
    
    For each query, there is no need to load all storage plugins and all workspaces under the file system plugins.
    
    This patch uses DynamicRootSchema as the root schema for Drill, which loads the corresponding storage plugin only when it is needed.
    
    INFORMATION_SCHEMA queries read the full schema information and load second-level schemas accordingly.
    
    For workspaces under the same file system, there is no need to create a FileSystem for each workspace.
    
    Use the fs.access API (available since HDFS 2.6) to check permissions, except in the Windows + local file system case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chunhui-shi/drill DRILL-5089-pull

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1032.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1032
    
----
commit a381677c59a7371733bae12ad4896b7cc927da5e
Author: chunhui-shi <cs...@maprtech.com>
Date:   2017-11-03T00:06:25Z

    DRILL-5089: Dynamically load schema of storage plugin only when needed for every query
    
    For each query, there is no need to load all storage plugins and all workspaces under the file system plugins.
    
    This patch uses DynamicRootSchema as the root schema for Drill, which loads the corresponding storage plugin only when it is needed.
    
    INFORMATION_SCHEMA queries read the full schema information and load second-level schemas accordingly.
    
    For workspaces under the same file system, there is no need to create a FileSystem for each workspace.
    
    Use the fs.access API (available since HDFS 2.6) to check permissions, except in the Windows + local file system case.

----


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150684114
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java ---
    @@ -52,9 +55,20 @@
     
       private List<WorkspaceSchemaFactory> factories;
       private String schemaName;
    +  protected FileSystemPlugin plugin;
     
       public FileSystemSchemaFactory(String schemaName, List<WorkspaceSchemaFactory> factories) {
         super();
    +    if (factories.size() > 0 ) {
    +      this.plugin = factories.get(0).getPlugin();
    +    }
    +    this.schemaName = schemaName;
    +    this.factories = factories;
    +  }
    +
    +  public FileSystemSchemaFactory(FileSystemPlugin plugin, String schemaName, List<WorkspaceSchemaFactory> factories) {
    +    super();
    --- End diff --
    
    Omit; Java does this by default.
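    
    For illustration, a minimal sketch of that constructor with the redundant call removed; the assignments in the body are an assumption, since the diff hunk ends at `super()`:
    
    ```
    public FileSystemSchemaFactory(FileSystemPlugin plugin, String schemaName,
                                   List<WorkspaceSchemaFactory> factories) {
      // No explicit super() needed; the compiler inserts the implicit no-arg call.
      this.plugin = plugin;          // assumed body -- the hunk ends before these lines
      this.schemaName = schemaName;
      this.factories = factories;
    }
    ```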


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150680454
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +
    +    if (retSchema == null) {
    +      loadSchemaFactory(schemaName, caseSensitive);
    +    }
    +
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    +  }
    +
    +  @Override
    +  public NavigableSet<String> getTableNames() {
    +    Set<String> pluginNames = Sets.newHashSet();
    +    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
    +      pluginNames.add(storageEntry.getKey());
    +    }
    +    return Compatible.INSTANCE.navigableSet(
    +        ImmutableSortedSet.copyOf(
    +            Sets.union(pluginNames, getSubSchemaMap().keySet())));
    +  }
    +
    +  /**
    +   * load schema factory(storage plugin) for schemaName
    +   * @param schemaName
    +   * @param caseSensitive
    +   */
    +  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
    +    try {
    +      SchemaPlus thisPlus = this.plus();
    +      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
    +      if (plugin != null) {
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    +      }
    +      else {
    +        //this schema name could be `dfs.tmp`, a 2nd level schema under 'dfs'
    +        String[] paths = schemaName.split("\\.");
    +        if (paths.length == 2) {
    +          plugin = getSchemaFactories().getPlugin(paths[0]);
    +          if (plugin != null) {
    +            plugin.registerSchemas(schemaConfig, thisPlus);
    +          }
    +
    +          final CalciteSchema secondLevelSchema = getSubSchemaMap().get(paths[0]).getSubSchema(paths[1], caseSensitive);
    +          if (secondLevelSchema != null) {
    +            SchemaPlus secondlevel = secondLevelSchema.plus();
    +            org.apache.drill.exec.store.AbstractSchema drillSchema =
    +                secondlevel.unwrap(org.apache.drill.exec.store.AbstractSchema.class);
    +            SubSchemaWrapper wrapper = new SubSchemaWrapper(drillSchema);
    +            thisPlus.add(wrapper.getName(), wrapper);
    +          }
    +        }
    +      }
    +    } catch(ExecutionSetupException | IOException ex) {
    --- End diff --
    
    Really? Just ignore errors? Maybe add a comment inside the catch block to explain why.
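    
    For reference, the revised file later in this thread addresses this by at least logging the failure; a sketch of that catch block:
    
    ```
    } catch (ExecutionSetupException | IOException ex) {
      // Don't fail schema resolution outright for one bad plugin,
      // but surface the problem in the log.
      logger.warn("Failed to load schema for \"" + schemaName + "\"!", ex);
    }
    ```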


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150684042
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java ---
    @@ -52,9 +55,20 @@
     
       private List<WorkspaceSchemaFactory> factories;
       private String schemaName;
    +  protected FileSystemPlugin plugin;
     
       public FileSystemSchemaFactory(String schemaName, List<WorkspaceSchemaFactory> factories) {
         super();
    +    if (factories.size() > 0 ) {
    +      this.plugin = factories.get(0).getPlugin();
    --- End diff --
    
    A comment would be helpful: what is special about plugin 0 that it should be the default? Is plugin 0 the default of some sort?
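    
    If every WorkspaceSchemaFactory of a file-system plugin points back to the same FileSystemPlugin (which seems to be what the question is probing), a comment along these lines would answer it; the shared-plugin assumption is the author's to confirm:
    
    ```
    public FileSystemSchemaFactory(String schemaName, List<WorkspaceSchemaFactory> factories) {
      // Assumption: all workspace factories of this plugin share the same
      // FileSystemPlugin, so the first factory is as good a source as any.
      if (!factories.isEmpty()) {
        this.plugin = factories.get(0).getPlugin();
      }
      this.schemaName = schemaName;
      this.factories = factories;
    }
    ```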


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/1032


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151713734
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/mock/MockBreakageStorage.java ---
    @@ -0,0 +1,46 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.store.mock;
    +
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.drill.exec.server.DrillbitContext;
    +import org.apache.drill.exec.store.SchemaConfig;
    +
    +import java.io.IOException;
    +
    +public class MockBreakageStorage extends MockStorageEngine {
    +
    +  boolean breakRegister;
    --- End diff --
    
    private


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150685946
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -532,7 +572,10 @@ public boolean isMutable() {
         }
     
         public DrillFileSystem getFS() {
    -      return fs;
    +      if (this.fs == null) {
    +        this.fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
    +      }
    +      return this.fs;
    --- End diff --
    
    No need for `this.fs`, just `fs` will do.
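    
    A minimal sketch of the lazy getter with the qualifier dropped:
    
    ```
    public DrillFileSystem getFS() {
      // Lazily create the per-user DrillFileSystem on first use.
      if (fs == null) {
        fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
      }
      return fs;
    }
    ```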


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by chunhui-shi <gi...@git.apache.org>.
Github user chunhui-shi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r152073873
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,140 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +/**
    + * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)}
    + * is called.
    + */
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +    if (retSchema != null) {
    +      return retSchema;
    +    }
    +
    +    loadSchemaFactory(schemaName, caseSensitive);
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    +  }
    +
    +  @Override
    +  public NavigableSet<String> getTableNames() {
    +    Set<String> pluginNames = Sets.newHashSet();
    +    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
    +      pluginNames.add(storageEntry.getKey());
    +    }
    +    return Compatible.INSTANCE.navigableSet(
    +        ImmutableSortedSet.copyOf(
    +            Sets.union(pluginNames, getSubSchemaMap().keySet())));
    +  }
    +
    +  /**
    +   * load schema factory(storage plugin) for schemaName
    +   * @param schemaName
    +   * @param caseSensitive
    +   */
    +  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
    +    try {
    +      SchemaPlus thisPlus = this.plus();
    +      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
    +      if (plugin != null) {
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    +        return;
    +      }
    +
    +      // we could not find the plugin, the schemaName could be `dfs.tmp`, a 2nd level schema under 'dfs'
    +      String[] paths = schemaName.split("\\.");
    +      if (paths.length == 2) {
    +        plugin = getSchemaFactories().getPlugin(paths[0]);
    +        if (plugin == null) {
    +          return;
    +        }
    +
    +        // we could find storage plugin for first part(e.g. 'dfs') of schemaName (e.g. 'dfs.tmp')
    +        // register schema for this storage plugin to 'this'.
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    --- End diff --
    
    We get here only when the split produced an array of length 2 and after we have checked that the plugin is not null, so 'plugin' on this line should not be null, if that is your concern.


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150680951
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +public class DynamicRootSchema extends DynamicSchema
    --- End diff --
    
    Class would benefit from general comments: purpose, references, structure.
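    
    For example, a class-level comment in the spirit of the one that appears in the later revision of this file:
    
    ```
    /**
     * Root schema for Drill that defers loading a storage plugin's schemas until
     * {@link #getSubSchema(String, boolean)} actually asks for them, so a query
     * only pays for the plugins it touches.
     */
    public class DynamicRootSchema extends DynamicSchema
        implements CalciteRootSchema {
    ```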


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151713266
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java ---
    @@ -0,0 +1,84 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.impl;
    +
    +import org.apache.drill.test.ClientFixture;
    +import org.apache.drill.test.ClusterFixture;
    +import org.apache.drill.test.ClusterMockStorageFixture;
    +import org.apache.drill.test.DrillTest;
    +
    +import org.junit.AfterClass;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +import static org.junit.Assert.assertTrue;
    +
    +public class TestSchema extends DrillTest {
    +
    +  private static ClusterMockStorageFixture cluster;
    +  private static ClientFixture client;
    +
    +  @BeforeClass
    +  public static void setup() throws Exception {
    +    cluster = ClusterFixture.builder().buildCustomMockStorage();
    +    boolean breakRegisterSchema = true;
    +    cluster.insertMockStorage("mock_broken", breakRegisterSchema);
    +    cluster.insertMockStorage("mock_good", !breakRegisterSchema);
    +    client = cluster.clientFixture();
    +  }
    +
    +  @Test
    +  public void testQueryBrokenStorage() {
    +    String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`";
    +    try {
    +      client.queryBuilder().sql(sql).printCsv();
    +    } catch (Exception ex) {
    +      assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema"));
    +    }
    +  }
    +
    +  @Test
    +  public void testQueryGoodStorage() {
    +    String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`";
    +    client.queryBuilder().sql(sql).printCsv();
    +  }
    +
    +  @Test
    +  public void testQueryGoodStorageWithDefaultSchema() throws Exception {
    +    String use_dfs = "use dfs.tmp";
    +    client.queryBuilder().sql(use_dfs).run();
    +    String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`";
    +    client.queryBuilder().sql(sql).printCsv();
    --- End diff --
    
    Do we actually want to print CSV here? I suggest we produce no output here.
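    
    A possible silent version of this test, assuming the summary returned by `run()` exposes a record count (as other ClusterFixture-based tests do) and that the mock table name `employees_5` encodes a 5-row result:
    
    ```
    @Test
    public void testQueryGoodStorageWithDefaultSchema() throws Exception {
      client.queryBuilder().sql("use dfs.tmp").run();
      // Run the query without printing; assert on the summary instead.
      // (Requires a static import of org.junit.Assert.assertEquals.)
      long recordCount = client.queryBuilder()
          .sql("SELECT id_i, name_s10 FROM `mock_good`.`employees_5`")
          .run()
          .recordCount();
      assertEquals(5, recordCount);
    }
    ```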


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151732407
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,140 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +/**
    + * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)}
    + * is called.
    + */
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +    if (retSchema != null) {
    +      return retSchema;
    +    }
    +
    +    loadSchemaFactory(schemaName, caseSensitive);
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    +  }
    +
    +  @Override
    +  public NavigableSet<String> getTableNames() {
    +    Set<String> pluginNames = Sets.newHashSet();
    +    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
    +      pluginNames.add(storageEntry.getKey());
    +    }
    +    return Compatible.INSTANCE.navigableSet(
    +        ImmutableSortedSet.copyOf(
    +            Sets.union(pluginNames, getSubSchemaMap().keySet())));
    +  }
    +
    +  /**
    +   * load schema factory(storage plugin) for schemaName
    +   * @param schemaName
    +   * @param caseSensitive
    +   */
    +  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
    +    try {
    +      SchemaPlus thisPlus = this.plus();
    +      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
    +      if (plugin != null) {
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    +        return;
    +      }
    +
    +      // we could not find the plugin, the schemaName could be `dfs.tmp`, a 2nd level schema under 'dfs'
    +      String[] paths = schemaName.split("\\.");
    +      if (paths.length == 2) {
    +        plugin = getSchemaFactories().getPlugin(paths[0]);
    +        if (plugin == null) {
    +          return;
    +        }
    +
    +        // we could find storage plugin for first part(e.g. 'dfs') of schemaName (e.g. 'dfs.tmp')
    +        // register schema for this storage plugin to 'this'.
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    +
    +        // we load second level schemas for this storage plugin
    +        final SchemaPlus firstlevelSchema = thisPlus.getSubSchema(paths[0]);
    +        final List<SchemaPlus> secondLevelSchemas = Lists.newArrayList();
    +        for (String secondLevelSchemaName : firstlevelSchema.getSubSchemaNames()) {
    +          secondLevelSchemas.add(firstlevelSchema.getSubSchema(secondLevelSchemaName));
    +        }
    +
    +        for (SchemaPlus schema : secondLevelSchemas) {
    +          org.apache.drill.exec.store.AbstractSchema drillSchema;
    +          try {
    +            drillSchema = schema.unwrap(org.apache.drill.exec.store.AbstractSchema.class);
    +          } catch (ClassCastException e) {
    +            throw new RuntimeException(String.format("Schema '%s' is not expected under root schema", schema.getName()));
    +          }
    +          SubSchemaWrapper wrapper = new SubSchemaWrapper(drillSchema);
    +          thisPlus.add(wrapper.getName(), wrapper);
    +        }
    +      }
    +    } catch(ExecutionSetupException | IOException ex) {
    +      logger.warn("Failed to load schema for \"" + schemaName + "\"!", ex);
    +    }
    +  }
    +
    +  static class RootSchema extends AbstractSchema {
    +    @Override public Expression getExpression(SchemaPlus parentSchema,
    --- End diff --
    
    Could you please explain why we override the `getExpression` method?


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by chunhui-shi <gi...@git.apache.org>.
Github user chunhui-shi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151301607
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -150,14 +152,30 @@ public WorkspaceSchemaFactory(
        * @return True if the user has access. False otherwise.
        */
       public boolean accessible(final String userName) throws IOException {
    -    final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
    +    final DrillFileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
    +    return accessible(fs);
    +  }
    +
    +  /**
    +   * Checks whether a FileSystem object has the permission to list/read workspace directory
    +   * @param fs a DrillFileSystem object that was created with certain user privilege
    +   * @return True if the user has access. False otherwise.
    +   * @throws IOException
    +   */
    +  public boolean accessible(DrillFileSystem fs) throws IOException {
         try {
    -      // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style
    -      // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem
    -      // has a limited private API (FileSystem.access) to check the permissions directly
    -      // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of
    -      // FileClient. TODO: Update this when DRILL-3749 is fixed.
    -      fs.listStatus(wsPath);
    +      /**
    +       * For Windows local file system, fs.access ends up using DeprecatedRawLocalFileStatus which has
    +       * TrustedInstaller as owner, and a member of Administrators group could not satisfy the permission.
    +       * In this case, we will still use method listStatus.
    +       * In other cases, we use access method since it is cheaper.
    +       */
    +      if (SystemUtils.IS_OS_WINDOWS && fs.getUri().getScheme().equalsIgnoreCase("file")) {
    --- End diff --
    
    FileSystem in HDFS has a constant DEFAULT_FS ("file:///"); for now I will define our own constant.
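    
    A sketch of what "define our own" might look like, in line with the LOCAL_FS_SCHEME constant that shows up in the later revision; the FsAction passed to access() is an assumption:
    
    ```
    // In FileSystemSchemaFactory:
    public static final String LOCAL_FS_SCHEME = "file";
    
    // In WorkspaceSchemaFactory.accessible(DrillFileSystem fs), inside the try block:
    if (SystemUtils.IS_OS_WINDOWS
        && fs.getUri().getScheme().equalsIgnoreCase(FileSystemSchemaFactory.LOCAL_FS_SCHEME)) {
      // Windows local FS: fall back to listStatus (see the comment in the diff).
      fs.listStatus(wsPath);
    } else {
      // fs.access() (Hadoop 2.6+) is cheaper than listing the directory.
      fs.access(wsPath, FsAction.READ);
    }
    ```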


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150685414
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -150,14 +152,30 @@ public WorkspaceSchemaFactory(
        * @return True if the user has access. False otherwise.
        */
       public boolean accessible(final String userName) throws IOException {
    -    final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
    +    final DrillFileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
    +    return accessible(fs);
    +  }
    +
    +  /**
    +   * Checks whether a FileSystem object has the permission to list/read workspace directory
    +   * @param fs a DrillFileSystem object that was created with certain user privilege
    +   * @return True if the user has access. False otherwise.
    +   * @throws IOException
    +   */
    +  public boolean accessible(DrillFileSystem fs) throws IOException {
         try {
    -      // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style
    -      // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem
    -      // has a limited private API (FileSystem.access) to check the permissions directly
    -      // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of
    -      // FileClient. TODO: Update this when DRILL-3749 is fixed.
    -      fs.listStatus(wsPath);
    +      /**
    +       * For Windows local file system, fs.access ends up using DeprecatedRawLocalFileStatus which has
    +       * TrustedInstaller as owner, and a member of Administrators group could not satisfy the permission.
    +       * In this case, we will still use method listStatus.
    +       * In other cases, we use access method since it is cheaper.
    +       */
    +      if (SystemUtils.IS_OS_WINDOWS && fs.getUri().getScheme().equalsIgnoreCase("file")) {
    --- End diff --
    
    HDFS probably defines a constant for "file". Should we reference that?


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by chunhui-shi <gi...@git.apache.org>.
Github user chunhui-shi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151793647
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -373,12 +402,12 @@ public String toString() {
       public class WorkspaceSchema extends AbstractSchema implements ExpandingConcurrentMap.MapValueFactory<TableInstance, DrillTable> {
         private final ExpandingConcurrentMap<TableInstance, DrillTable> tables = new ExpandingConcurrentMap<>(this);
         private final SchemaConfig schemaConfig;
    -    private final DrillFileSystem fs;
    +    private DrillFileSystem fs;
     
    -    public WorkspaceSchema(List<String> parentSchemaPath, String wsName, SchemaConfig schemaConfig) throws IOException {
    +    public WorkspaceSchema(List<String> parentSchemaPath, String wsName, SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException {
           super(parentSchemaPath, wsName);
           this.schemaConfig = schemaConfig;
    -      this.fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
    +      this.fs = fs;
    --- End diff --
    
    Now we pass in fs instead of creating it inside WorkspaceSchema.
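    
    A hypothetical caller sketch of that change: one DrillFileSystem is created per plugin and user, then shared by every workspace (names such as `parent` and `fsConf` are illustrative, not taken from the patch):
    
    ```
    // Create the file system once for this user and this plugin's configuration ...
    DrillFileSystem fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
    // ... and hand the same instance to every workspace of the plugin.
    for (WorkspaceSchemaFactory factory : factories) {
      WorkspaceSchema workspace = factory.createSchema(parentSchemaPath, schemaConfig, fs);
      if (workspace != null) {          // null means the user has no access to it
        parent.add(workspace.getName(), workspace);
      }
    }
    ```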


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by chunhui-shi <gi...@git.apache.org>.
Github user chunhui-shi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151302799
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +
    +    if (retSchema == null) {
    +      loadSchemaFactory(schemaName, caseSensitive);
    +    }
    +
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    +  }
    +
    +  @Override
    +  public NavigableSet<String> getTableNames() {
    +    Set<String> pluginNames = Sets.newHashSet();
    --- End diff --
    
    Plugin names in Drill are case-sensitive.


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150682787
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicSchema.java ---
    @@ -0,0 +1,65 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import org.apache.calcite.jdbc.CalciteSchema;
    +import org.apache.calcite.jdbc.SimpleCalciteSchema;
    +import org.apache.calcite.schema.Schema;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +
    +
    +/**
    + * Unlike SimpleCalciteSchema, DynamicSchema could have an empty or partial schemaMap, but it could maintain a map of
    + * name->SchemaFactory, and only register schema when the corresponding name is requested.
    + */
    +public class DynamicSchema extends SimpleCalciteSchema {
    +
    +  protected SchemaConfig schemaConfig;
    --- End diff --
    
    Seems the schemaConfig is never set or used.


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151714217
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -150,14 +152,30 @@ public WorkspaceSchemaFactory(
        * @return True if the user has access. False otherwise.
        */
       public boolean accessible(final String userName) throws IOException {
    -    final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
    +    final DrillFileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
    +    return accessible(fs);
    +  }
    +
    +  /**
    +   * Checks whether a FileSystem object has the permission to list/read workspace directory
    +   * @param fs a DrillFileSystem object that was created with certain user privilege
    +   * @return True if the user has access. False otherwise.
    +   * @throws IOException
    +   */
    +  public boolean accessible(DrillFileSystem fs) throws IOException {
         try {
    -      // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style
    -      // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem
    -      // has a limited private API (FileSystem.access) to check the permissions directly
    -      // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of
    -      // FileClient. TODO: Update this when DRILL-3749 is fixed.
    -      fs.listStatus(wsPath);
    +      /**
    +       * For Windows local file system, fs.access ends up using DeprecatedRawLocalFileStatus which has
    +       * TrustedInstaller as owner, and a member of Administrators group could not satisfy the permission.
    +       * In this case, we will still use method listStatus.
    +       * In other cases, we use access method since it is cheaper.
    +       */
    +      if (SystemUtils.IS_OS_WINDOWS && fs.getUri().getScheme().equalsIgnoreCase(FileSystemSchemaFactory.LOCAL_FS_SCHEME)) {
    --- End diff --
    
    Just in case, did you check that everything works on Windows?


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by chunhui-shi <gi...@git.apache.org>.
Github user chunhui-shi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151798147
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,140 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +/**
    + * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)}
    + * is called.
    + */
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +    if (retSchema != null) {
    +      return retSchema;
    +    }
    +
    +    loadSchemaFactory(schemaName, caseSensitive);
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    +  }
    +
    +  @Override
    +  public NavigableSet<String> getTableNames() {
    +    Set<String> pluginNames = Sets.newHashSet();
    +    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
    +      pluginNames.add(storageEntry.getKey());
    +    }
    +    return Compatible.INSTANCE.navigableSet(
    +        ImmutableSortedSet.copyOf(
    +            Sets.union(pluginNames, getSubSchemaMap().keySet())));
    +  }
    +
    +  /**
    +   * load schema factory(storage plugin) for schemaName
    +   * @param schemaName
    +   * @param caseSensitive
    +   */
    +  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
    +    try {
    +      SchemaPlus thisPlus = this.plus();
    +      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
    +      if (plugin != null) {
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    +        return;
    +      }
    +
    +      // we could not find the plugin, the schemaName could be `dfs.tmp`, a 2nd level schema under 'dfs'
    +      String[] paths = schemaName.split("\\.");
    +      if (paths.length == 2) {
    +        plugin = getSchemaFactories().getPlugin(paths[0]);
    +        if (plugin == null) {
    +          return;
    +        }
    +
    +        // we could find storage plugin for first part(e.g. 'dfs') of schemaName (e.g. 'dfs.tmp')
    +        // register schema for this storage plugin to 'this'.
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    +
    +        // we load second level schemas for this storage plugin
    +        final SchemaPlus firstlevelSchema = thisPlus.getSubSchema(paths[0]);
    +        final List<SchemaPlus> secondLevelSchemas = Lists.newArrayList();
    +        for (String secondLevelSchemaName : firstlevelSchema.getSubSchemaNames()) {
    +          secondLevelSchemas.add(firstlevelSchema.getSubSchema(secondLevelSchemaName));
    +        }
    +
    +        for (SchemaPlus schema : secondLevelSchemas) {
    +          org.apache.drill.exec.store.AbstractSchema drillSchema;
    +          try {
    +            drillSchema = schema.unwrap(org.apache.drill.exec.store.AbstractSchema.class);
    +          } catch (ClassCastException e) {
    +            throw new RuntimeException(String.format("Schema '%s' is not expected under root schema", schema.getName()));
    +          }
    +          SubSchemaWrapper wrapper = new SubSchemaWrapper(drillSchema);
    +          thisPlus.add(wrapper.getName(), wrapper);
    +        }
    +      }
    +    } catch(ExecutionSetupException | IOException ex) {
    +      logger.warn("Failed to load schema for \"" + schemaName + "\"!", ex);
    +    }
    +  }
    +
    +  static class RootSchema extends AbstractSchema {
    +    @Override public Expression getExpression(SchemaPlus parentSchema,
    --- End diff --
    
    This is copied from the RootSchema used in SimpleCalciteSchema, which is not a public class. getExpression is used in Calcite code, not in our code.
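    
    Based on the imports in this file and the Calcite class it was copied from, the override most likely looks like this (shown only for context; the diff hunk cuts it off):
    
    ```
    static class RootSchema extends AbstractSchema {
      @Override public Expression getExpression(SchemaPlus parentSchema, String name) {
        // Same as Calcite's package-private SimpleCalciteSchema.RootSchema:
        // at runtime the root schema is fetched from the DataContext.
        return Expressions.call(
            DataContext.ROOT,
            BuiltInMethod.DATA_CONTEXT_GET_ROOT_SCHEMA.method);
      }
    }
    ```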


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150678680
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +
    +    if (retSchema == null) {
    +      loadSchemaFactory(schemaName, caseSensitive);
    +    }
    +
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    --- End diff --
    
    if the original call returns non-null, we make the same call a second time. Better:
    ```
    retSchema = ...
    if (retSchema != null) { return retSchema; }
    loadSchemaFactory(...)
    return getSubSchemaMap()...
    ```


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151732963
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -373,12 +402,12 @@ public String toString() {
       public class WorkspaceSchema extends AbstractSchema implements ExpandingConcurrentMap.MapValueFactory<TableInstance, DrillTable> {
         private final ExpandingConcurrentMap<TableInstance, DrillTable> tables = new ExpandingConcurrentMap<>(this);
         private final SchemaConfig schemaConfig;
    -    private final DrillFileSystem fs;
    +    private DrillFileSystem fs;
     
    -    public WorkspaceSchema(List<String> parentSchemaPath, String wsName, SchemaConfig schemaConfig) throws IOException {
    +    public WorkspaceSchema(List<String> parentSchemaPath, String wsName, SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException {
           super(parentSchemaPath, wsName);
           this.schemaConfig = schemaConfig;
    -      this.fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
    +      this.fs = fs;
    --- End diff --
    
    Why do we no longer need to create fs using `ImpersonationUtil`, when it was needed before?


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150679636
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +
    +    if (retSchema == null) {
    +      loadSchemaFactory(schemaName, caseSensitive);
    +    }
    +
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    +  }
    +
    +  @Override
    +  public NavigableSet<String> getTableNames() {
    +    Set<String> pluginNames = Sets.newHashSet();
    +    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
    +      pluginNames.add(storageEntry.getKey());
    +    }
    +    return Compatible.INSTANCE.navigableSet(
    +        ImmutableSortedSet.copyOf(
    +            Sets.union(pluginNames, getSubSchemaMap().keySet())));
    +  }
    +
    +  /**
    +   * load schema factory(storage plugin) for schemaName
    +   * @param schemaName
    +   * @param caseSensitive
    +   */
    +  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
    +    try {
    +      SchemaPlus thisPlus = this.plus();
    +      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
    +      if (plugin != null) {
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    +      }
    --- End diff --
    
    If the name is `dfs.test`, we first look up the compound name, then the parts? Why? Do we put the compound names in the map? Or can we have one schema named "dfs.test" and another called `dfs`.`test`? Or, can this code be restructured a bit?
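    
    Just to make the last point concrete, here is a rough sketch of a flatter structure, based only on the code quoted above. The early returns, the length check, the null guard, and the logging in the catch block (which assumes a class-level slf4j logger) are my additions, not part of the patch:
    
    ```java
    public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
      try {
        SchemaPlus thisPlus = this.plus();

        // 1. Try the name as a plain plugin name, e.g. "dfs".
        StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
        if (plugin != null) {
          plugin.registerSchemas(schemaConfig, thisPlus);
          return;
        }

        // 2. Otherwise treat it as a compound "plugin.workspace" name, e.g. "dfs.tmp".
        String[] paths = schemaName.split("\\.");
        if (paths.length != 2) {
          return;
        }
        plugin = getSchemaFactories().getPlugin(paths[0]);
        if (plugin == null) {
          return;
        }
        plugin.registerSchemas(schemaConfig, thisPlus);

        // 3. Also expose the second-level schema under the compound name,
        //    so the next lookup of "dfs.tmp" hits the sub-schema map directly.
        CalciteSchema firstLevel = getSubSchemaMap().get(paths[0]);
        CalciteSchema secondLevel =
            firstLevel == null ? null : firstLevel.getSubSchema(paths[1], caseSensitive);
        if (secondLevel != null) {
          org.apache.drill.exec.store.AbstractSchema drillSchema =
              secondLevel.plus().unwrap(org.apache.drill.exec.store.AbstractSchema.class);
          SubSchemaWrapper wrapper = new SubSchemaWrapper(drillSchema);
          thisPlus.add(wrapper.getName(), wrapper);
        }
      } catch (ExecutionSetupException | IOException ex) {
        logger.warn("Failed to load schema factory for '{}'", schemaName, ex);
      }
    }
    ```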


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by chunhui-shi <gi...@git.apache.org>.
Github user chunhui-shi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151493351
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -175,6 +193,21 @@ public WorkspaceSchema createSchema(List<String> parentSchemaPath, SchemaConfig
         return new WorkspaceSchema(parentSchemaPath, schemaName, schemaConfig);
       }
     
    +  public WorkspaceSchema createSchema(List<String> parentSchemaPath, SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException {
    +    if (!accessible(fs)) {
    --- End diff --
    
    Returning null means the user cannot even list this workspace, so they do not learn of its existence at all. I think that is good access-control practice.
    
    If users expect to see a workspace but cannot, they will need to figure out why on their own.


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by chunhui-shi <gi...@git.apache.org>.
Github user chunhui-shi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151298428
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java ---
    @@ -73,9 +87,10 @@ public void registerSchemas(SchemaConfig schemaConfig, SchemaPlus parent) throws
     
         public FileSystemSchema(String name, SchemaConfig schemaConfig) throws IOException {
           super(ImmutableList.<String>of(), name);
    +      final DrillFileSystem fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), plugin.getFsConf());
           for(WorkspaceSchemaFactory f :  factories){
    -        if (f.accessible(schemaConfig.getUserName())) {
    -          WorkspaceSchema s = f.createSchema(getSchemaPath(), schemaConfig);
    +        WorkspaceSchema s = f.createSchema(getSchemaPath(), schemaConfig, fs);
    +        if ( s != null) {
    --- End diff --
    
    `factories` comes from the storage plugin, so it stays identical as long as the storage plugin is not updated. This FileSystemSchema constructor is called only once per query, and only if this FileSystem storage plugin is needed by that query.



---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151713624
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java ---
    @@ -0,0 +1,84 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.impl;
    +
    +import org.apache.drill.test.ClientFixture;
    +import org.apache.drill.test.ClusterFixture;
    +import org.apache.drill.test.ClusterMockStorageFixture;
    +import org.apache.drill.test.DrillTest;
    +
    +import org.junit.AfterClass;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +import static org.junit.Assert.assertTrue;
    +
    +public class TestSchema extends DrillTest {
    +
    +  private static ClusterMockStorageFixture cluster;
    +  private static ClientFixture client;
    +
    +  @BeforeClass
    +  public static void setup() throws Exception {
    +    cluster = ClusterFixture.builder().buildCustomMockStorage();
    +    boolean breakRegisterSchema = true;
    +    cluster.insertMockStorage("mock_broken", breakRegisterSchema);
    +    cluster.insertMockStorage("mock_good", !breakRegisterSchema);
    +    client = cluster.clientFixture();
    +  }
    +
    +  @Test
    +  public void testQueryBrokenStorage() {
    +    String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`";
    +    try {
    +      client.queryBuilder().sql(sql).printCsv();
    +    } catch (Exception ex) {
    +      assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema"));
    +    }
    +  }
    +
    +  @Test
    +  public void testQueryGoodStorage() {
    +    String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`";
    +    client.queryBuilder().sql(sql).printCsv();
    +  }
    +
    +  @Test
    +  public void testQueryGoodStorageWithDefaultSchema() throws Exception {
    +    String use_dfs = "use dfs.tmp";
    +    client.queryBuilder().sql(use_dfs).run();
    +    String sql = "SELECT id_i, name_s10 FROM `mock_good`.`employees_5`";
    +    client.queryBuilder().sql(sql).printCsv();
    +  }
    +
    +  @Test
    +  public void testUseBrokenStorage() throws Exception {
    +    try {
    +      String use_dfs = "use mock_broken";
    +      client.queryBuilder().sql(use_dfs).run();
    +    } catch(Exception ex) {
    +      assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema"));
    +    }
    +  }
    +
    +  @AfterClass
    --- End diff --
    
    Can be removed.


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151713558
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/TestSchema.java ---
    @@ -0,0 +1,84 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.physical.impl;
    +
    +import org.apache.drill.test.ClientFixture;
    +import org.apache.drill.test.ClusterFixture;
    +import org.apache.drill.test.ClusterMockStorageFixture;
    +import org.apache.drill.test.DrillTest;
    +
    +import org.junit.AfterClass;
    +import org.junit.BeforeClass;
    +import org.junit.Test;
    +
    +import static org.junit.Assert.assertTrue;
    +
    +public class TestSchema extends DrillTest {
    +
    +  private static ClusterMockStorageFixture cluster;
    +  private static ClientFixture client;
    +
    +  @BeforeClass
    +  public static void setup() throws Exception {
    +    cluster = ClusterFixture.builder().buildCustomMockStorage();
    +    boolean breakRegisterSchema = true;
    +    cluster.insertMockStorage("mock_broken", breakRegisterSchema);
    +    cluster.insertMockStorage("mock_good", !breakRegisterSchema);
    +    client = cluster.clientFixture();
    +  }
    +
    +  @Test
    +  public void testQueryBrokenStorage() {
    +    String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`";
    +    try {
    +      client.queryBuilder().sql(sql).printCsv();
    +    } catch (Exception ex) {
    +      assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema"));
    --- End diff --
    
    This test can give a false positive when no exception is thrown at all. Please re-throw the exception after the check and add `@Test(expected = Exception.class)`.
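    
    For example, the pattern being suggested would look roughly like this (a sketch only, reusing the query from the test quoted above):
    
    ```java
    @Test(expected = Exception.class)
    public void testQueryBrokenStorage() throws Exception {
      String sql = "SELECT id_i, name_s10 FROM `mock_broken`.`employees_5`";
      try {
        client.queryBuilder().sql(sql).printCsv();
      } catch (Exception ex) {
        // check that the query failed for the expected reason, then re-throw so the
        // test still fails when no exception is raised at all
        assertTrue(ex.getMessage().contains("VALIDATION ERROR: Schema"));
        throw ex;
      }
    }
    ```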


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150682963
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java ---
    @@ -20,12 +20,13 @@
     import java.io.IOException;
     import java.util.List;
     
    -import org.apache.calcite.jdbc.SimpleCalciteSchema;
    +//import org.apache.calcite.jdbc.SimpleCalciteSchema;
    --- End diff --
    
    Remove


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150682252
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +
    +    if (retSchema == null) {
    +      loadSchemaFactory(schemaName, caseSensitive);
    +    }
    +
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    +  }
    +
    +  @Override
    +  public NavigableSet<String> getTableNames() {
    +    Set<String> pluginNames = Sets.newHashSet();
    +    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
    +      pluginNames.add(storageEntry.getKey());
    +    }
    +    return Compatible.INSTANCE.navigableSet(
    +        ImmutableSortedSet.copyOf(
    +            Sets.union(pluginNames, getSubSchemaMap().keySet())));
    +  }
    +
    +  /**
    +   * load schema factory(storage plugin) for schemaName
    +   * @param schemaName
    +   * @param caseSensitive
    +   */
    +  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
    +    try {
    +      SchemaPlus thisPlus = this.plus();
    +      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
    +      if (plugin != null) {
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    +      }
    +      else {
    +        //this schema name could be `dfs.tmp`, a 2nd level schema under 'dfs'
    +        String[] paths = schemaName.split("\\.");
    +        if (paths.length == 2) {
    +          plugin = getSchemaFactories().getPlugin(paths[0]);
    +          if (plugin != null) {
    +            plugin.registerSchemas(schemaConfig, thisPlus);
    +          }
    +
    +          final CalciteSchema secondLevelSchema = getSubSchemaMap().get(paths[0]).getSubSchema(paths[1], caseSensitive);
    +          if (secondLevelSchema != null) {
    +            SchemaPlus secondlevel = secondLevelSchema.plus();
    +            org.apache.drill.exec.store.AbstractSchema drillSchema =
    --- End diff --
    
    Fully qualified name needed? Or, will an import work?


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150686157
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -532,7 +572,10 @@ public boolean isMutable() {
         }
     
         public DrillFileSystem getFS() {
    -      return fs;
    +      if (this.fs == null) {
    +        this.fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), fsConf);
    +      }
    +      return this.fs;
    --- End diff --
    
    This class caches the file system, which is good. The other classes in this PR do not; they create the fs as needed.
    
    Does Calcite allow some kind of session state in which we can cache the fs for the query (plan) rather than creating it on the fly each time we need it?


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150682074
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +
    +    if (retSchema == null) {
    +      loadSchemaFactory(schemaName, caseSensitive);
    +    }
    +
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    +  }
    +
    +  @Override
    +  public NavigableSet<String> getTableNames() {
    +    Set<String> pluginNames = Sets.newHashSet();
    +    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
    +      pluginNames.add(storageEntry.getKey());
    +    }
    +    return Compatible.INSTANCE.navigableSet(
    +        ImmutableSortedSet.copyOf(
    +            Sets.union(pluginNames, getSubSchemaMap().keySet())));
    +  }
    +
    +  /**
    +   * load schema factory(storage plugin) for schemaName
    +   * @param schemaName
    +   * @param caseSensitive
    +   */
    +  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
    +    try {
    +      SchemaPlus thisPlus = this.plus();
    +      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
    +      if (plugin != null) {
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    +      }
    +      else {
    +        //this schema name could be `dfs.tmp`, a 2nd level schema under 'dfs'
    +        String[] paths = schemaName.split("\\.");
    +        if (paths.length == 2) {
    +          plugin = getSchemaFactories().getPlugin(paths[0]);
    +          if (plugin != null) {
    +            plugin.registerSchemas(schemaConfig, thisPlus);
    +          }
    --- End diff --
    
    What if the plugin is null? Should we fail with an error, or just keep going?


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151714418
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java ---
    @@ -50,11 +53,23 @@
     
       public static final String DEFAULT_WS_NAME = "default";
     
    +  public static final String LOCAL_FS_SCHEME = "file";
    +
       private List<WorkspaceSchemaFactory> factories;
       private String schemaName;
    +  protected FileSystemPlugin plugin;
     
       public FileSystemSchemaFactory(String schemaName, List<WorkspaceSchemaFactory> factories) {
    -    super();
    +    // when the correspondent FileSystemPlugin is not passed in, we dig into ANY workspace factory to get it.
    +    if (factories.size() > 0 ) {
    --- End diff --
    
    Please remove space `if (factories.size() > 0) {`.


---

[GitHub] drill issue #1032: DRILL-5089: Dynamically load schema of storage plugin onl...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/1032
  
    +1


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150685113
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java ---
    @@ -73,9 +87,10 @@ public void registerSchemas(SchemaConfig schemaConfig, SchemaPlus parent) throws
     
         public FileSystemSchema(String name, SchemaConfig schemaConfig) throws IOException {
           super(ImmutableList.<String>of(), name);
    +      final DrillFileSystem fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), plugin.getFsConf());
           for(WorkspaceSchemaFactory f :  factories){
    -        if (f.accessible(schemaConfig.getUserName())) {
    -          WorkspaceSchema s = f.createSchema(getSchemaPath(), schemaConfig);
    +        WorkspaceSchema s = f.createSchema(getSchemaPath(), schemaConfig, fs);
    +        if ( s != null) {
    --- End diff --
    
    Here we iterate over a list of workspace schema factories. For each, we resolve a schemaConfig against the file system.
    
    Under what situations would we have multiple factories? Selecting from two distinct storage plugins?
    
    Calcite tends to resolve the same things over and over. Will this method be called multiple times?


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by chunhui-shi <gi...@git.apache.org>.
Github user chunhui-shi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151299650
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java ---
    @@ -105,12 +106,36 @@ public SchemaPlus createRootSchema(final String userName, final SchemaConfigInfo
        * @return
        */
       public SchemaPlus createRootSchema(SchemaConfig schemaConfig) {
    +      final SchemaPlus rootSchema = DynamicSchema.createRootSchema(dContext.getStorage(), schemaConfig);
    +      schemaTreesToClose.add(rootSchema);
    +      return rootSchema;
    +  }
    +
    +  /**
    +   * Return full root schema with schema owner as the given user.
    +   *
    +   * @param userName Name of the user who is accessing the storage sources.
    +   * @param provider {@link SchemaConfigInfoProvider} instance
    +   * @return Root of the schema tree.
    +   */
    +  public SchemaPlus createFullRootSchema(final String userName, final SchemaConfigInfoProvider provider) {
    +    final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName();
    --- End diff --
    
    There are not that many places, and we would need to pass in `isImpersonationEnabled` and `userName` if this line became a standalone method. I will keep it as is for now.


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150678803
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +
    +    if (retSchema == null) {
    +      loadSchemaFactory(schemaName, caseSensitive);
    +    }
    +
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    +  }
    +
    +  @Override
    +  public NavigableSet<String> getTableNames() {
    +    Set<String> pluginNames = Sets.newHashSet();
    --- End diff --
    
    Should this be case-insensitive?


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150686404
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/FragmentContext.java ---
    @@ -229,7 +229,7 @@ public DrillbitContext getDrillbitContext() {
         return context;
       }
     
    -  public SchemaPlus getRootSchema() {
    +  public SchemaPlus getFullRootSchema() {
    --- End diff --
    
    Comment to explain what a "full root schema" is? Apparently, this is both the plugin config name and workspace combined?


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151716880
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,140 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +/**
    + * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)}
    + * is called.
    + */
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +    if (retSchema != null) {
    +      return retSchema;
    +    }
    +
    +    loadSchemaFactory(schemaName, caseSensitive);
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    +  }
    +
    +  @Override
    +  public NavigableSet<String> getTableNames() {
    +    Set<String> pluginNames = Sets.newHashSet();
    +    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
    +      pluginNames.add(storageEntry.getKey());
    +    }
    +    return Compatible.INSTANCE.navigableSet(
    +        ImmutableSortedSet.copyOf(
    +            Sets.union(pluginNames, getSubSchemaMap().keySet())));
    --- End diff --
    
    Could you please explain what this method actually returns? According to its name it should return table names, but it seems to return different things...


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150679941
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +
    +    if (retSchema == null) {
    +      loadSchemaFactory(schemaName, caseSensitive);
    +    }
    +
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    +  }
    +
    +  @Override
    +  public NavigableSet<String> getTableNames() {
    +    Set<String> pluginNames = Sets.newHashSet();
    +    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
    +      pluginNames.add(storageEntry.getKey());
    +    }
    +    return Compatible.INSTANCE.navigableSet(
    +        ImmutableSortedSet.copyOf(
    +            Sets.union(pluginNames, getSubSchemaMap().keySet())));
    +  }
    +
    +  /**
    +   * load schema factory(storage plugin) for schemaName
    +   * @param schemaName
    +   * @param caseSensitive
    +   */
    +  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
    +    try {
    +      SchemaPlus thisPlus = this.plus();
    +      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
    +      if (plugin != null) {
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    +      }
    +      else {
    +        //this schema name could be `dfs.tmp`, a 2nd level schema under 'dfs'
    +        String[] paths = schemaName.split("\\.");
    --- End diff --
    
    Should this be done here in this simple way? How many other places do we do the same thing? Or, should we have a common function to split schema names so we can handle, say, escapes and other special cases that might come along?
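    
    As an illustration, such a helper might look like the sketch below; the class and method names are hypothetical, just to show the shape, and escape handling could then be added in this one place:
    
    ```java
    import java.util.Arrays;
    import java.util.List;

    /** Hypothetical shared helper for splitting a compound schema path such as "dfs.tmp". */
    public final class SchemaPathUtils {
      private static final String SEPARATOR_REGEX = "\\.";

      private SchemaPathUtils() { }

      /** Splits "dfs.tmp" into ["dfs", "tmp"]; a single-part name yields a one-element list. */
      public static List<String> split(String schemaName) {
        return Arrays.asList(schemaName.split(SEPARATOR_REGEX));
      }
    }
    ```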


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150685672
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -175,6 +193,21 @@ public WorkspaceSchema createSchema(List<String> parentSchemaPath, SchemaConfig
         return new WorkspaceSchema(parentSchemaPath, schemaName, schemaConfig);
       }
     
    +  public WorkspaceSchema createSchema(List<String> parentSchemaPath, SchemaConfig schemaConfig, DrillFileSystem fs) throws IOException {
    +    if (!accessible(fs)) {
    --- End diff --
    
    Is returning null sufficient to tell the user that they don't have permission to do this operation?


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by chunhui-shi <gi...@git.apache.org>.
Github user chunhui-shi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151793708
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java ---
    @@ -150,14 +152,30 @@ public WorkspaceSchemaFactory(
        * @return True if the user has access. False otherwise.
        */
       public boolean accessible(final String userName) throws IOException {
    -    final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
    +    final DrillFileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
    +    return accessible(fs);
    +  }
    +
    +  /**
    +   * Checks whether a FileSystem object has the permission to list/read workspace directory
    +   * @param fs a DrillFileSystem object that was created with certain user privilege
    +   * @return True if the user has access. False otherwise.
    +   * @throws IOException
    +   */
    +  public boolean accessible(DrillFileSystem fs) throws IOException {
         try {
    -      // We have to rely on the listStatus as a FileSystem can have complicated controls such as regular unix style
    -      // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE). Hadoop 2.7 version of FileSystem
    -      // has a limited private API (FileSystem.access) to check the permissions directly
    -      // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies on Hadoop 2.5.0 version of
    -      // FileClient. TODO: Update this when DRILL-3749 is fixed.
    -      fs.listStatus(wsPath);
    +      /**
    +       * For Windows local file system, fs.access ends up using DeprecatedRawLocalFileStatus which has
    +       * TrustedInstaller as owner, and a member of Administrators group could not satisfy the permission.
    +       * In this case, we will still use method listStatus.
    +       * In other cases, we use access method since it is cheaper.
    +       */
    +      if (SystemUtils.IS_OS_WINDOWS && fs.getUri().getScheme().equalsIgnoreCase(FileSystemSchemaFactory.LOCAL_FS_SCHEME)) {
    --- End diff --
    
    Yes, it was tested in Windows unit tests.


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150683680
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/SchemaTreeProvider.java ---
    @@ -105,12 +106,36 @@ public SchemaPlus createRootSchema(final String userName, final SchemaConfigInfo
        * @return
        */
       public SchemaPlus createRootSchema(SchemaConfig schemaConfig) {
    +      final SchemaPlus rootSchema = DynamicSchema.createRootSchema(dContext.getStorage(), schemaConfig);
    +      schemaTreesToClose.add(rootSchema);
    +      return rootSchema;
    +  }
    +
    +  /**
    +   * Return full root schema with schema owner as the given user.
    +   *
    +   * @param userName Name of the user who is accessing the storage sources.
    +   * @param provider {@link SchemaConfigInfoProvider} instance
    +   * @return Root of the schema tree.
    +   */
    +  public SchemaPlus createFullRootSchema(final String userName, final SchemaConfigInfoProvider provider) {
    +    final String schemaUser = isImpersonationEnabled ? userName : ImpersonationUtil.getProcessUserName();
    --- End diff --
    
    Should this be factored out somewhere? It seems this magic stanza will be needed in many places: better to have one copy than several. Maybe a method in `ImpersonationUtil`?
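    
    Something along these lines in `ImpersonationUtil` would keep the logic in one place (a sketch; the method name `resolveUserName` is made up):
    
    ```java
    /**
     * Hypothetical helper: returns the given user when impersonation is enabled,
     * otherwise falls back to the Drillbit process user.
     */
    public static String resolveUserName(String userName, boolean isImpersonationEnabled) {
      return isImpersonationEnabled ? userName : getProcessUserName();
    }
    ```
    
    Callers would then reduce to `final String schemaUser = ImpersonationUtil.resolveUserName(userName, isImpersonationEnabled);`.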


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150680095
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +
    +    if (retSchema == null) {
    +      loadSchemaFactory(schemaName, caseSensitive);
    +    }
    +
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    +  }
    +
    +  @Override
    +  public NavigableSet<String> getTableNames() {
    +    Set<String> pluginNames = Sets.newHashSet();
    +    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
    +      pluginNames.add(storageEntry.getKey());
    +    }
    +    return Compatible.INSTANCE.navigableSet(
    +        ImmutableSortedSet.copyOf(
    +            Sets.union(pluginNames, getSubSchemaMap().keySet())));
    +  }
    +
    +  /**
    +   * load schema factory(storage plugin) for schemaName
    +   * @param schemaName
    +   * @param caseSensitive
    +   */
    +  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
    +    try {
    +      SchemaPlus thisPlus = this.plus();
    +      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
    +      if (plugin != null) {
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    +      }
    +      else {
    +        //this schema name could be `dfs.tmp`, a 2nd level schema under 'dfs'
    +        String[] paths = schemaName.split("\\.");
    +        if (paths.length == 2) {
    --- End diff --
    
    Do we support only 1- and 2-part names? Should we assert that the length <= 2?


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r151717700
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,140 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +/**
    + * This class is to allow us loading schemas from storage plugins later when {@link #getSubSchema(String, boolean)}
    + * is called.
    + */
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(DynamicRootSchema.class);
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +    if (retSchema != null) {
    +      return retSchema;
    +    }
    +
    +    loadSchemaFactory(schemaName, caseSensitive);
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    +  }
    +
    +  @Override
    +  public NavigableSet<String> getTableNames() {
    +    Set<String> pluginNames = Sets.newHashSet();
    +    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
    +      pluginNames.add(storageEntry.getKey());
    +    }
    +    return Compatible.INSTANCE.navigableSet(
    +        ImmutableSortedSet.copyOf(
    +            Sets.union(pluginNames, getSubSchemaMap().keySet())));
    +  }
    +
    +  /**
    +   * load schema factory(storage plugin) for schemaName
    +   * @param schemaName
    +   * @param caseSensitive
    +   */
    +  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
    +    try {
    +      SchemaPlus thisPlus = this.plus();
    +      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
    +      if (plugin != null) {
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    +        return;
    +      }
    +
    +      // we could not find the plugin, the schemaName could be `dfs.tmp`, a 2nd level schema under 'dfs'
    +      String[] paths = schemaName.split("\\.");
    +      if (paths.length == 2) {
    +        plugin = getSchemaFactories().getPlugin(paths[0]);
    +        if (plugin == null) {
    +          return;
    +        }
    +
    +        // we could find storage plugin for first part(e.g. 'dfs') of schemaName (e.g. 'dfs.tmp')
    +        // register schema for this storage plugin to 'this'.
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    --- End diff --
    
    Can we get an NPE here? Say, if after the split on line 97 the length is 1 or 3?
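    
    With length 1 or 3 the `paths.length == 2` branch is skipped entirely, so the split itself should be safe; the lookup of the first-level schema after `registerSchemas` is where a guard may help. A defensive sketch of those following lines (not part of the patch):
    
    ```java
    // Defensive variant of the second-level lookup (sketch only):
    CalciteSchema firstLevel = getSubSchemaMap().get(paths[0]);
    if (firstLevel == null) {
      return;   // the plugin registered nothing under that name
    }
    CalciteSchema secondLevel = firstLevel.getSubSchema(paths[1], caseSensitive);
    if (secondLevel == null) {
      return;   // no such workspace under the plugin
    }
    SchemaPlus secondlevel = secondLevel.plus();
    ```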


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150680632
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicRootSchema.java ---
    @@ -0,0 +1,125 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import com.google.common.collect.ImmutableSortedSet;
    +import com.google.common.collect.Lists;
    +import com.google.common.collect.Sets;
    +import org.apache.calcite.DataContext;
    +import org.apache.calcite.jdbc.CalciteRootSchema;
    +import org.apache.calcite.jdbc.CalciteSchema;
    +
    +import org.apache.calcite.linq4j.tree.Expression;
    +import org.apache.calcite.linq4j.tree.Expressions;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.calcite.schema.impl.AbstractSchema;
    +import org.apache.calcite.util.BuiltInMethod;
    +import org.apache.calcite.util.Compatible;
    +import org.apache.drill.common.exceptions.ExecutionSetupException;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePlugin;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +import org.apache.drill.exec.store.SubSchemaWrapper;
    +
    +import java.io.IOException;
    +import java.util.Map;
    +import java.util.NavigableSet;
    +import java.util.Set;
    +
    +public class DynamicRootSchema extends DynamicSchema
    +    implements CalciteRootSchema {
    +
    +  /** Creates a root schema. */
    +  DynamicRootSchema(StoragePluginRegistry storages, SchemaConfig schemaConfig) {
    +    super(null, new RootSchema(), "");
    +    this.schemaConfig = schemaConfig;
    +    this.storages = storages;
    +  }
    +
    +  @Override
    +  public CalciteSchema getSubSchema(String schemaName, boolean caseSensitive) {
    +    CalciteSchema retSchema = getSubSchemaMap().get(schemaName);
    +
    +    if (retSchema == null) {
    +      loadSchemaFactory(schemaName, caseSensitive);
    +    }
    +
    +    retSchema = getSubSchemaMap().get(schemaName);
    +    return retSchema;
    +  }
    +
    +  @Override
    +  public NavigableSet<String> getTableNames() {
    +    Set<String> pluginNames = Sets.newHashSet();
    +    for (Map.Entry<String, StoragePlugin> storageEntry : getSchemaFactories()) {
    +      pluginNames.add(storageEntry.getKey());
    +    }
    +    return Compatible.INSTANCE.navigableSet(
    +        ImmutableSortedSet.copyOf(
    +            Sets.union(pluginNames, getSubSchemaMap().keySet())));
    +  }
    +
    +  /**
    +   * load schema factory(storage plugin) for schemaName
    +   * @param schemaName
    +   * @param caseSensitive
    +   */
    +  public void loadSchemaFactory(String schemaName, boolean caseSensitive) {
    +    try {
    +      SchemaPlus thisPlus = this.plus();
    +      StoragePlugin plugin = getSchemaFactories().getPlugin(schemaName);
    +      if (plugin != null) {
    +        plugin.registerSchemas(schemaConfig, thisPlus);
    +      }
    +      else {
    +        //this schema name could be `dfs.tmp`, a 2nd level schema under 'dfs'
    +        String[] paths = schemaName.split("\\.");
    +        if (paths.length == 2) {
    +          plugin = getSchemaFactories().getPlugin(paths[0]);
    +          if (plugin != null) {
    +            plugin.registerSchemas(schemaConfig, thisPlus);
    +          }
    +
    +          final CalciteSchema secondLevelSchema = getSubSchemaMap().get(paths[0]).getSubSchema(paths[1], caseSensitive);
    +          if (secondLevelSchema != null) {
    +            SchemaPlus secondlevel = secondLevelSchema.plus();
    +            org.apache.drill.exec.store.AbstractSchema drillSchema =
    +                secondlevel.unwrap(org.apache.drill.exec.store.AbstractSchema.class);
    +            SubSchemaWrapper wrapper = new SubSchemaWrapper(drillSchema);
    +            thisPlus.add(wrapper.getName(), wrapper);
    +          }
    +        }
    +      }
    +    } catch(ExecutionSetupException | IOException ex) {
    +    }
    +  }
    +
    +  static class RootSchema extends AbstractSchema {
    +    RootSchema() {
    +      super();
    +    }
    --- End diff --
    
    Omit; this is a degenerate ctor that spells out what Java already does.
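    
    For reference, a minimal sketch of the omission being suggested, assuming AbstractSchema's no-arg constructor is all that is needed here:
    
        static class RootSchema extends AbstractSchema {
          // No explicit constructor: the compiler-generated default
          // constructor already invokes AbstractSchema's no-arg super().
        }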


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150684557
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemSchemaFactory.java ---
    @@ -73,9 +87,10 @@ public void registerSchemas(SchemaConfig schemaConfig, SchemaPlus parent) throws
     
         public FileSystemSchema(String name, SchemaConfig schemaConfig) throws IOException {
           super(ImmutableList.<String>of(), name);
    +      final DrillFileSystem fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), plugin.getFsConf());
    --- End diff --
    
    Comment to explain what's happening here?
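    
    For example, something along these lines, a sketch only (the stated intent is my reading of the ImpersonationUtil call, not taken from the patch):
    
          // Create a DrillFileSystem on behalf of the query user
          // (schemaConfig.getUserName()) so that later workspace
          // accessibility checks run with that user's permissions.
          final DrillFileSystem fs = ImpersonationUtil.createFileSystem(schemaConfig.getUserName(), plugin.getFsConf());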


---

[GitHub] drill pull request #1032: DRILL-5089: Dynamically load schema of storage plu...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1032#discussion_r150682678
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DynamicSchema.java ---
    @@ -0,0 +1,65 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.drill.exec.planner.sql;
    +
    +import org.apache.calcite.jdbc.CalciteSchema;
    +import org.apache.calcite.jdbc.SimpleCalciteSchema;
    +import org.apache.calcite.schema.Schema;
    +import org.apache.calcite.schema.SchemaPlus;
    +import org.apache.drill.exec.store.SchemaConfig;
    +import org.apache.drill.exec.store.StoragePluginRegistry;
    +
    +
    +/**
    + * Unlike SimpleCalciteSchema, DynamicSchema may have an empty or partial schemaMap; it maintains a map of
    + * name->SchemaFactory and only registers a schema when the corresponding name is requested.
    + */
    +public class DynamicSchema extends SimpleCalciteSchema {
    +
    +  protected SchemaConfig schemaConfig;
    +  protected StoragePluginRegistry storages;
    --- End diff --
    
    Seems that storages is never initialized or used except in the accessor.
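    
    A sketch of one possible cleanup, assuming the field is meant to be set by subclasses such as DynamicRootSchema (which assigns it in its constructor): inject both fields through a protected constructor so they can be declared final. The extra constructor parameters here are hypothetical, not part of the patch.
    
        public class DynamicSchema extends SimpleCalciteSchema {
        
          protected final SchemaConfig schemaConfig;
          protected final StoragePluginRegistry storages;
        
          // Hypothetical constructor: forwards to SimpleCalciteSchema's
          // (parent, schema, name) constructor and initializes the fields
          // that DynamicRootSchema currently assigns itself.
          protected DynamicSchema(CalciteSchema parent, Schema schema, String name,
                                  StoragePluginRegistry storages, SchemaConfig schemaConfig) {
            super(parent, schema, name);
            this.storages = storages;
            this.schemaConfig = schemaConfig;
          }
        }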


---