You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by chunhui-shi <gi...@git.apache.org> on 2017/03/25 00:14:29 UTC

[GitHub] drill pull request #796: DRILL-5365: DrillFileSystem setConf in constructor....

GitHub user chunhui-shi opened a pull request:

    https://github.com/apache/drill/pull/796

    DRILL-5365: DrillFileSystem setConf in constructor. DrillFileSystem c\u2026

    \u2026ould be created based on provided URI.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chunhui-shi/drill DRILL_5365

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/796.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #796
    
----
commit e91755a4668346636dc1a33b6b0a86ad6a5654e2
Author: chunhui-shi <cs...@maprtech.com>
Date:   2017-03-02T00:53:23Z

    DRILL-5365: DrillFileSystem setConf in constructor. DrillFileSystem could be created based on provided URI.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #796: DRILL-5365: DrillFileSystem setConf in constructor....

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/796#discussion_r108049271
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java ---
    @@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws IOException {
       }
     
       public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) throws IOException {
    -    this.underlyingFs = FileSystem.get(fsConf);
    +    this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), operatorStats);
    +  }
    +
    +  public DrillFileSystem(Configuration fsConf, URI Uri, OperatorStats operatorStats) throws IOException {
    +    this.underlyingFs = FileSystem.get(Uri, fsConf);
    +    logger.trace("Configuration for the DrillFileSystem " + fsConf.getRaw(FS_DEFAULT_NAME_KEY) +
    +        ", underlyingFs: " + this.underlyingFs.getUri());
         this.codecFactory = new CompressionCodecFactory(fsConf);
         this.operatorStats = operatorStats;
    +    setConf(fsConf);
       }
     
       @Override
       public void setConf(Configuration conf) {
    --- End diff --
    
    This is existing code. But, the existence of this method is probably very poor design. This method says that we create a file system using one config, then change the config later. Does the class work if we start with, say, "file:///" then change to HDFS? The `setConf()` method should probably be removed and code in this class should say that the config is immutable once the `DrillFileSystem` is created. Else, all bets are off on semantics.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #796: DRILL-5365: DrillFileSystem setConf in constructor....

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/796#discussion_r108049374
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java ---
    @@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws IOException {
       }
     
       public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) throws IOException {
    -    this.underlyingFs = FileSystem.get(fsConf);
    +    this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), operatorStats);
    +  }
    +
    +  public DrillFileSystem(Configuration fsConf, URI Uri, OperatorStats operatorStats) throws IOException {
    +    this.underlyingFs = FileSystem.get(Uri, fsConf);
    +    logger.trace("Configuration for the DrillFileSystem " + fsConf.getRaw(FS_DEFAULT_NAME_KEY) +
    +        ", underlyingFs: " + this.underlyingFs.getUri());
         this.codecFactory = new CompressionCodecFactory(fsConf);
         this.operatorStats = operatorStats;
    +    setConf(fsConf);
    --- End diff --
    
    I think this whole code block needs more thought. DrillFileSystem derives from the Hadoop `FileSystem` class, but we never invoke the `FileSystem` constructor. Why? Without proper init, can we guarantee that the methods defined in the base class actually work?
    
    On the other hand, it seems we've reimplemented all base class methods, basically masking the base class implementation. If that is true, why set the config on the base class?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #796: DRILL-5365: DrillFileSystem setConf in constructor. DrillF...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/796
  
    Is this PR ready for final review? Or, on hold waiting for something?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #796: DRILL-5365: DrillFileSystem setConf in constructor....

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/796#discussion_r108049239
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java ---
    @@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws IOException {
       }
     
       public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) throws IOException {
    -    this.underlyingFs = FileSystem.get(fsConf);
    +    this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), operatorStats);
    --- End diff --
    
    Actually, why do we need to do this step? The original code seems to do the right thing:
    
    ```
    class FileSystem ...
      public static FileSystem get(Configuration conf) throws IOException {
        return get(getDefaultUri(conf), conf);
      }
      public static URI getDefaultUri(Configuration conf) {
        return URI.create(fixName(conf.get(FS_DEFAULT_NAME_KEY, DEFAULT_FS)));
      }
    ```
    
    That is, the original code gets the file system using the URI stored in the config. Standard practice is that the caller must have set the file system property: that is how we tell a "file:///" system from an HDFS system, etc.
    
    So, isn't the problem here with the caller?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #796: DRILL-5365: DrillFileSystem setConf in constructor....

Posted by chunhui-shi <gi...@git.apache.org>.
Github user chunhui-shi commented on a diff in the pull request:

    https://github.com/apache/drill/pull/796#discussion_r115107663
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java ---
    @@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws IOException {
       }
     
       public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) throws IOException {
    -    this.underlyingFs = FileSystem.get(fsConf);
    +    this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), operatorStats);
    +  }
    +
    +  public DrillFileSystem(Configuration fsConf, URI Uri, OperatorStats operatorStats) throws IOException {
    --- End diff --
    
    Yes, this can be removed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill issue #796: DRILL-5365: DrillFileSystem setConf in constructor. DrillF...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/796
  
    Would be incredibly useful to provide a description of the problem and fix: either here in the PR or as a comment in the JIRA. Saves having the reviewer having to duplicate your debugging and diagnostic efforts. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #796: DRILL-5365: DrillFileSystem setConf in constructor....

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/796#discussion_r108048339
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java ---
    @@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws IOException {
       }
     
       public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) throws IOException {
    -    this.underlyingFs = FileSystem.get(fsConf);
    +    this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), operatorStats);
    --- End diff --
    
    Not sure that `getRaw()` is what we want:
    
    > Get the value of the name property, without doing variable expansion.
    
    I suspect we want `getTrimmed()`.
    
    Also, can the value be null? If so, should we fall back to `FileSystem.DEFAULT_FS`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #796: DRILL-5365: DrillFileSystem setConf in constructor....

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/796#discussion_r108049396
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java ---
    @@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws IOException {
       }
     
       public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) throws IOException {
    -    this.underlyingFs = FileSystem.get(fsConf);
    +    this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), operatorStats);
    +  }
    +
    +  public DrillFileSystem(Configuration fsConf, URI Uri, OperatorStats operatorStats) throws IOException {
    +    this.underlyingFs = FileSystem.get(Uri, fsConf);
    +    logger.trace("Configuration for the DrillFileSystem " + fsConf.getRaw(FS_DEFAULT_NAME_KEY) +
    +        ", underlyingFs: " + this.underlyingFs.getUri());
         this.codecFactory = new CompressionCodecFactory(fsConf);
         this.operatorStats = operatorStats;
    +    setConf(fsConf);
       }
     
       @Override
       public void setConf(Configuration conf) {
         // Guard against setConf(null) call that is called as part of superclass constructor (Configured) of the
         // DrillFileSystem, at which point underlyingFs is null.
    -    if (conf != null && underlyingFs != null) {
    -      underlyingFs.setConf(conf);
    +    if(conf != null) {
    +      super.setConf(conf);
    +      if (underlyingFs != null && underlyingFs.getConf() == null) {
    +        underlyingFs.setConf(conf);
    +      }
    +
         }
       }
     
       @Override
       public Configuration getConf() {
    +    if (super.getConf() != null) {
    --- End diff --
    
    Shouldn't the base class config and the underlying FS config be the same? If they differ, how do we have any idea which is the correct one? In what situation would we want the base class config but not the underlying FS config?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #796: DRILL-5365: DrillFileSystem setConf in constructor....

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/796#discussion_r108049241
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/DrillFileSystem.java ---
    @@ -89,22 +89,36 @@ public DrillFileSystem(Configuration fsConf) throws IOException {
       }
     
       public DrillFileSystem(Configuration fsConf, OperatorStats operatorStats) throws IOException {
    -    this.underlyingFs = FileSystem.get(fsConf);
    +    this(fsConf, URI.create(fsConf.getRaw(FS_DEFAULT_NAME_KEY)), operatorStats);
    +  }
    +
    +  public DrillFileSystem(Configuration fsConf, URI Uri, OperatorStats operatorStats) throws IOException {
    --- End diff --
    
    This constructor is added to allow passing a URI. But, the rules for `FileSystem` (see above), is that the URI is stored in the config as a property. So, is this constructor needed? (One argument for using it is that the base class, `FileSystem`, does provide this form.)
    
    Also, this constructor is new. But there is no code in this PR that calls this constructor. So, how does this new (unused) constructor fix the bug? Perhaps a bit of explanation would clear up the mystery...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] drill pull request #796: DRILL-5365: DrillFileSystem setConf in constructor....

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/796#discussion_r108049425
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPlugin.java ---
    @@ -74,6 +74,7 @@ public FileSystemPlugin(FileSystemConfig config, DrillbitContext context, String
               fsConf.set(s, config.config.get(s));
             }
           }
    +      fsConf.set("fs.default.name", config.connection);
    --- End diff --
    
    Please use `FileSystem.FS_DEFAULT_NAME_KEY`. It seems that `"fs.default.name"` is deprecated.
    
    But, note, that is exactly what the next line of code does. So, perhaps there is more going on here than a first read suggests? Or, are we confusing ourselves about the meaning of the two properties?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---