Posted to dev@drill.apache.org by paul-rogers <gi...@git.apache.org> on 2017/10/26 16:47:30 UTC

[GitHub] drill pull request #1011: Drill 1170: Drill-on-YARN

GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/1011

    Drill 1170: Drill-on-YARN

    Provides Drill integration with YARN. Runs Drill as a long-running task under YARN. Monitors the Drill cluster, restarting failed Drillbits. Provides a command-line UI to start, stop and resize the cluster. Provides a web-based UI to monitor the cluster.
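    The monitor-and-restart behavior can be pictured as a reconciliation loop. The loop compares the number of running Drillbits with the target size, then starts or stops Drillbits to close the gap. The Java sketch below is a hypothetical, in-memory illustration only. `ClusterController` and its methods are invented names. The real application master requests containers from the YARN ResourceManager rather than adding strings to a list. See DRILL-1170 and `README.md` for the actual design.
    
    ```java
    import java.util.ArrayList;
    import java.util.List;
    
    /** Hypothetical sketch of a "keep N Drillbits running" control loop (not DoY code). */
    public class ClusterController {
      private int targetSize;
      private final List<String> running = new ArrayList<>();
      private int nextId = 1;
    
      public ClusterController(int targetSize) {
        this.targetSize = targetSize;
      }
    
      /** A resize request only changes the target; reconcile() does the work. */
      public void resize(int newSize) {
        if (newSize < 0) {
          throw new IllegalArgumentException("size must be >= 0");
        }
        targetSize = newSize;
      }
    
      /** Called when a Drillbit (container) is reported as exited. */
      public void onDrillbitFailed(String id) {
        running.remove(id);
      }
    
      /** One reconciliation pass: start or stop Drillbits until running == target. */
      public void reconcile() {
        while (running.size() < targetSize) {
          // Stands in for requesting a container and launching a Drillbit in it.
          running.add("drillbit-" + nextId++);
        }
        while (running.size() > targetSize) {
          // Stands in for stopping the most recently started Drillbit.
          running.remove(running.size() - 1);
        }
      }
    
      public List<String> running() {
        return new ArrayList<>(running);
      }
    
      public static void main(String[] args) {
        ClusterController c = new ClusterController(3);
        c.reconcile();                    // starts drillbit-1..3
        c.onDrillbitFailed("drillbit-2"); // simulate a failure
        c.reconcile();                    // replaces it with drillbit-4
        c.resize(2);
        c.reconcile();                    // shrinks the cluster to two
        System.out.println(c.running());
      }
    }
    ```
    
    Running `main` starts three Drillbits and replaces the simulated failure with `drillbit-4`. It then shrinks the cluster to two and prints `[drillbit-1, drillbit-3]`.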
    
    The Drill-on-YARN (DoY) code has been in use by commercial users for over a year, since Drill 1.8, and has proven quite stable. Usage has been on MapR's version of YARN; we seek feedback from users of the Apache and other versions of YARN.
    
    See [DRILL-1170](https://issues.apache.org/jira/browse/DRILL-1170) for design information. See the included `README.md` for internals information and `USAGE.md` for a detailed user guide.
    
    This is a large PR; it will take time to review. The key goal at this moment is to allow interested users to download the PR, build DoY, and try it out in their environments. The DoY code is mostly independent of Drill itself. The DoY code can be used to launch any version of Drill since 1.8. See the usage guide for information.
    
    It has been suggested that the code move to the `contrib` directory. That change will be made. However, since the code works in its current location, we'll leave it there for now to ensure users are successful if they choose to try it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-1170

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1011.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1011
    
----
commit e26012bc56cad3bf2819dff3bbdf70664de34955
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-10-26T07:24:00Z

    DRILL-1170: YARN integration for Drill
    
    This commit includes documentation files.

commit 3a6ffe78d9fe0e9a5beacd100e2e0ee6b40c7f34
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-10-26T07:25:34Z

    Client app

commit 509410c9710fc1ff23a4a13f51320a4c153f9328
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-10-26T07:26:50Z

    Files common to several modules

commit 36b8d323118077115ef1097ba8b23f6fc4a5390a
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-10-26T07:42:17Z

    Application master

commit 7fbc387634bb1acaf7807b02348b40364c81d282
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-10-26T07:44:33Z

    App Master web UI

commit 21fb93792290625b719899a4742573b8c3d4a7ce
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-10-26T07:45:44Z

    Distribution and project files

commit 567d36787b9ada60dd2141077e629158c53fc0c4
Author: Paul Rogers <pr...@maprtech.com>
Date:   2017-10-26T07:47:54Z

    Test files

----


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by Agirish <gi...@git.apache.org>.
Github user Agirish commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @paul-rogers, I'll give it a try and update with my findings. 


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by kr-arjun <gi...@git.apache.org>.
Github user kr-arjun commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @paul-rogers I was able to resolve this issue with a workaround: setting `yarn.timeline-service.enabled` to false (I copied a yarn-site.xml with this property set into the $DRILL_SITE directory).
    
    This issue is specific to environments where the Timeline server is enabled. Initially, it failed with 'java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig'. After I copied the required jars to the Drill classpath, it failed with the exception I shared in the previous attachment. The same issue has been reported in Spark as well (https://issues.apache.org/jira/browse/SPARK-15343). To find the error stack trace, I had to modify DrillOnYarn.java to print it. I thought it would be useful if the stack trace were logged for troubleshooting purposes.


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    Failing in Travis, apparently due to a test-framework issue:
    ```
    Caused by: java.lang.ClassNotFoundException: org.apache.drill.categories.SecurityTest
    ```
    
    @ilooner, any idea what's going on? 


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @kr-arjun, thanks for your note on error handling. Were you using the `start` command? There is exactly one place where the error "Failed to start Drill application master" is thrown: it is when Drill-on-YARN fails to start the application master. There are lots of other messages for other issues, such as "Error: AM already running as Application ID: 1234" or "Failed to allocate Drill application master."
    
    When writing the client, I made an explicit decision not to create a log file to avoid cluttering up things. There is no good place to put a client log since Drill does not actually run on the client machine. We could add a log, but it would be messy.
    
    What we can do, however, is include the text of the message we got from YARN when we tried to start the AM process. When an error occurs, the client will now print something like the following:
    
    ```
    Failed to start Drill application master.
      Caused by: Some YARN error
    ```
    
    The other thing we can add is a full stack dump, but only when requested with the `-v` (verbose) option:
    
    ```
    > drill-on-yarn.sh -v start
    Failed to start Drill application master.
      Caused by: Some YARN error
    Full stack trace:
    (stack trace here)
    ```
    
    I can't easily test this code. Please grab the latest sources and rerun your test case to ensure that it now prints out more information: whatever YARN tells us about why it would not start the AM.
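    Here is a minimal sketch of the message format above. It assumes the client keeps YARN's error as the exception's cause. `ClientErrors.format` is a hypothetical helper written for illustration, not the code in `DrillOnYarn.java`:
    
    ```java
    import java.io.PrintWriter;
    import java.io.StringWriter;
    import java.util.Collections;
    import java.util.IdentityHashMap;
    import java.util.Set;
    
    /** Hypothetical helper for formatting client-side failures. */
    public class ClientErrors {
    
      private static String text(Throwable t) {
        // Fall back to the class name when an exception carries no message.
        return t.getMessage() != null ? t.getMessage() : t.getClass().getName();
      }
    
      /**
       * Returns the top-level message, one "Caused by:" line per nested cause,
       * and (only when verbose, as with -v) the full stack trace.
       */
      public static String format(Throwable e, boolean verbose) {
        StringBuilder out = new StringBuilder();
        out.append(text(e)).append('\n');
        // Track visited causes by identity so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        seen.add(e);
        for (Throwable c = e.getCause(); c != null && seen.add(c); c = c.getCause()) {
          out.append("  Caused by: ").append(text(c)).append('\n');
        }
        if (verbose) {
          StringWriter sw = new StringWriter();
          PrintWriter pw = new PrintWriter(sw);
          e.printStackTrace(pw);
          pw.flush();
          out.append("Full stack trace:\n").append(sw);
        }
        return out.toString();
      }
    
      public static void main(String[] args) {
        Exception yarnError = new RuntimeException("Some YARN error");
        Exception e = new Exception("Failed to start Drill application master.", yarnError);
        System.out.print(format(e, false));
      }
    }
    ```
    
    When `verbose` is `false`, `main` prints the two lines shown in the first example. Passing `true` (as `-v` would) appends the full stack trace.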


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @paul-rogers You need to add this dependency to your drill-yarn pom.xml:
    
    ```
        <dependency>
          <groupId>org.apache.drill</groupId>
          <artifactId>drill-common</artifactId>
          <version>${project.version}</version>
          <classifier>tests</classifier>
          <scope>test</scope>
        </dependency>
    ```


---

[GitHub] drill pull request #1011: Drill 1170: Drill-on-YARN

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1011#discussion_r172052984
  
    --- Diff: distribution/src/assemble/bin.xml ---
    @@ -323,6 +333,21 @@
           <source>src/resources/sqlline.bat</source>
           <outputDirectory>bin</outputDirectory>
         </file>
    +    <file>
    --- End diff --
    
    Sure, the others will be fixed in PR-1139.


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    Rebased on latest master and resolved merge conflicts.
    
    Some ZK-related classes changed. Would be good if Abhishek could do a quick sanity test on his test cluster to make sure things still work.
    
    This is a "minimum viable product" (MVP). It omits many nice-to-haves such as security, graceful shutdown, recovery from YARN RM failures and so on. Folks should feel free to file JIRAs for these enhancements as they find the need for them.


---

[GitHub] drill pull request #1011: Drill 1170: Drill-on-YARN

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1011#discussion_r172039445
  
    --- Diff: distribution/src/assemble/bin.xml ---
    @@ -323,6 +333,21 @@
           <source>src/resources/sqlline.bat</source>
           <outputDirectory>bin</outputDirectory>
         </file>
    +    <file>
    --- End diff --
    
    @arina-ielchiieva, I grabbed the latest master and rebased. However, I see that other scripts still use the old permissions:
    
    ```
        <file>
          <source>src/resources/drillbit.sh</source>
          <fileMode>0755</fileMode>
          <outputDirectory>bin</outputDirectory>
        </file>
    ``` 
    
    Is this a change that someone is making in another branch?
    
    Also, I noticed that we set the execute bit on scripts that are only ever sourced:
    
    ```
        <file>
          <source>src/resources/drill-conf</source>
          <fileMode>0755</fileMode>
          <outputDirectory>bin</outputDirectory>
        </file>
    ```
    
    These should probably change to 0640 also.
    
    I've gone ahead and changed the DoY files; please have someone change the others.


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @arina-ielchiieva, thanks much for your help with this PR. Glad to see it is finally in Drill master after all this time! 


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    Fixed the drill-common dependency as @ilooner requested.


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by Agirish <gi...@git.apache.org>.
Github user Agirish commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    Looks good!
    
    +1. Getting this into Apache Drill 1.13.0 would be great for users.


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    Rebased onto latest master.


---

[GitHub] drill pull request #1011: Drill 1170: Drill-on-YARN

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1011#discussion_r172023820
  
    --- Diff: distribution/src/assemble/bin.xml ---
    @@ -323,6 +333,21 @@
           <source>src/resources/sqlline.bat</source>
           <outputDirectory>bin</outputDirectory>
         </file>
    +    <file>
    --- End diff --
    
    Per the security recommendations we are now adopting in Drill, permissions should be as follows:
    - conf files: 0640
    - sh files: 0750
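    For reference, each octal digit of a mode covers owner, group, and others in turn, with read = 4, write = 2, and execute = 1. So 0640 gives the owner read/write, the group read-only, and others nothing. 0750 gives the owner full access, the group read/execute, and others nothing. The small Java helper below prints the `ls -l` style form of each mode. `FileModes` is an illustrative name, not part of Drill.
    
    ```java
    /** Illustrative helper: renders an octal file mode the way `ls -l` shows it. */
    public class FileModes {
    
      public static String toRwx(int mode) {
        StringBuilder sb = new StringBuilder(9);
        // Owner, group, other: each is one octal digit of 3 bits (r=4, w=2, x=1).
        for (int shift = 6; shift >= 0; shift -= 3) {
          int bits = (mode >> shift) & 7;
          sb.append((bits & 4) != 0 ? 'r' : '-');
          sb.append((bits & 2) != 0 ? 'w' : '-');
          sb.append((bits & 1) != 0 ? 'x' : '-');
        }
        return sb.toString();
      }
    
      public static void main(String[] args) {
        // Java integer literals with a leading 0 are octal.
        System.out.println("0640 -> " + toRwx(0640)); // conf files
        System.out.println("0750 -> " + toRwx(0750)); // sh files
        System.out.println("0755 -> " + toRwx(0755)); // previous script mode
      }
    }
    ```
    
    `main` prints `rw-r-----`, `rwxr-x---`, and `rwxr-xr-x` for 0640, 0750, and 0755 respectively. Moving from 0755 to 0750 removes read and execute access for others.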



---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by Agirish <gi...@git.apache.org>.
Github user Agirish commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @arina-ielchiieva, sorry, I was held up with something. I've just started on this and will get back shortly.


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by sachouche <gi...@git.apache.org>.
Github user sachouche commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    +1
    I have reviewed the code and overall it looks good. My main feedback is that the current implementation doesn't support secure clusters (at least, I didn't see any logic associated with that). YARN applications have trouble staying up for a long time because of ticket renewal limitations. We might want to create another enhancement JIRA to support such use cases.


---

[GitHub] drill pull request #1011: Drill 1170: Drill-on-YARN

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/1011


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @paul-rogers, when running unit tests with the mapr profile, they fail because this commit brings in a banned dependency:
    ```
    [INFO] --- maven-enforcer-plugin:1.3.1:enforce (avoid_bad_dependencies) @ drill-java-exec ---
    [WARNING] Rule 0: org.apache.maven.plugins.enforcer.BannedDependencies failed with message:
    Found Banned Dependency: org.json:json:jar:20080701
    Use 'mvn dependency:tree' to locate the source of the banned dependencies.
    ```
    
    Please use `mvn dependency:tree -Dincludes=org.json:json -Pmapr` to see the results.


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @arina-ielchiieva, it turned out that there were unneeded dependencies in the DoY additions to the drill-root pom.xml file. I removed these, and the json.org warnings went away.
    
    Please take a look at the new commits. If all looks good, I'll squash commits to prepare for merging into master. 


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @kr-arjun, I think logging the full stack trace is a good idea. Let's address it in a new Jira.
    +1, LGTM.


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    It would be good to add this feature in the upcoming 1.13.0 Drill release. To do so we need to ensure the following:
    
    @paul-rogers
    1. Could you please fix the failures on Travis? Tim has added a comment regarding a possible fix.
    2. Also, it would be great if you could file a Jira listing the possible enhancements. This will help identify the main areas of improvement for Drill-on-YARN in the future.
    
    @Agirish 
    Did you have a chance to do sanity checks?


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by kr-arjun <gi...@git.apache.org>.
Github user kr-arjun commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @paul-rogers  
    Currently, the client exception is output as `ClientContext.err.println(e.getMessage())` in DrillOnYarn.java. For most application master launch failures, the only message available is 'Failed to start Drill application master'. Do you think it would help in troubleshooting Drill-on-YARN client failures if the exception stack trace were logged?



---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by priteshm <gi...@git.apache.org>.
Github user priteshm commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @sachouche @vrozov @arina-ielchiieva please review


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by kr-arjun <gi...@git.apache.org>.
Github user kr-arjun commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @paul-rogers The client error message changes look good. I did a quick test with the client error message changes and verified that the error messages are logged.
    
    > Were you using the start command?
    
    Yes, I was trying to start DoY in a YARN environment with the timeline server enabled. It failed to start the Drill application master due to a timeline-client error. Since it failed within the DoY client process, there was no stack trace available.
    
    Attaching test scenarios of the changes for your reference.
    
    [DOY-client-error-logging.txt](https://github.com/apache/drill/files/1778349/DOY-client-error-logging.txt)



---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @kr-arjun, thanks for the text file. The error is related to security. DoY, in its current form, is an "MVP": it works, but leaves out advanced features. One of those missing features is support for secure clusters.
    
    Please file a JIRA asking for DoY to support a secure cluster. While at it, please look at the internal JIRA and locate all DoY enhancements or bugs. Now that DoY is part of Drill, those tickets should be moved to the public Apache Drill Jira. (I can't do it because I don't have access to the internal tickets.) 


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @arina-ielchiieva, do you want to give this one a committer +1? Then I'll mark it ready-to-commit. Thanks! 


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    Rebased onto latest master.


---

[GitHub] drill issue #1011: Drill 1170: Drill-on-YARN

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/1011
  
    @paul-rogers, based on @sachouche's feedback, could you please create a Jira for the enhancement and also resolve the conflicts in the bin.xml file? Thank you in advance!


---