Posted to issues@hive.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2018/05/05 17:35:00 UTC
[jira] [Commented] (HIVE-19429) Investigate alternative technologies like docker containers to increase parallelism

    [ https://issues.apache.org/jira/browse/HIVE-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464846#comment-16464846 ] 

Alan Gates commented on HIVE-19429:
-----------------------------------

I've been working on the side on a tool to run all the Hive tests using docker.  You can take a look at it at [https://github.com/alanfgates/dtest].  It might be useful for reworking ptest or as a base for a new tool.

It works by first building a docker image from a git repo and branch.  If the build succeeds, it then runs the tests in containers.  The output is analyzed for failures, errors, and timeouts.  At the end the user is presented with a list of tests that failed or resulted in an error.
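
As an illustration of the analysis step only (this is not code from dtest), the sketch below folds one container's log into a report of failed, errored, and timed-out tests.  The summary-line format is an assumption loosely modeled on Maven Surefire output, and the names `Report` and `analyze` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Assumed summary-line format, loosely modeled on Maven Surefire output.
# The parser in the real tool may look for something different.
SUMMARY = re.compile(
    r"Tests run: (\d+), Failures: (\d+), Errors: (\d+), Skipped: (\d+)"
    r".* - in (\S+)"
)


@dataclass
class Report:
    failed: list = field(default_factory=list)
    errored: list = field(default_factory=list)
    timed_out: list = field(default_factory=list)


def analyze(container_name, log_text, timed_out, report):
    """Fold one container's log into the overall report."""
    if timed_out:
        # Hypothetical: the runner hit its deadline for this container.
        report.timed_out.append(container_name)
    for line in log_text.splitlines():
        match = SUMMARY.search(line)
        if not match:
            continue
        _run, failures, errors, _skipped, test_class = match.groups()
        if int(failures) > 0:
            report.failed.append(test_class)
        if int(errors) > 0:
            report.errored.append(test_class)
    return report
```

Calling `analyze` once per container with a shared `Report` yields the final list to show the user.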

Currently it uses docker directly, so it is confined to a single host.  It should be straightforward to rework it to use YARN, Kubernetes, or another container manager so it can run in a cluster.  I've been running it on a 32-core box with 10 simultaneous containers and it finishes in about 2 hours 20 minutes (of which the first 20 minutes is the build).
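
A minimal sketch of the single-host approach, assuming each batch of tests is started with the docker CLI and that at most 10 containers run at once.  This is not dtest's implementation; `run_in_container`, `run_all`, and the batch layout are hypothetical names for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_in_container(image, name, command, timeout_s):
    """Run one test batch via the docker CLI; returns (name, status, output)."""
    try:
        proc = subprocess.run(
            ["docker", "run", "--name", name, image, "/bin/sh", "-c", command],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Only the docker client process is killed here; a real tool would
        # also have to stop the container itself.
        return name, "timeout", ""
    status = "ok" if proc.returncode == 0 else "failed"
    return name, status, proc.stdout


def run_all(batches, run_one, max_parallel=10):
    """Run every batch with at most max_parallel in flight at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map preserves the order of the input batches.
        return list(pool.map(run_one, batches))
```

Because `run_all` takes the per-batch function as a parameter, replacing `run_in_container` with a function that submits to a cluster manager would leave the scheduling loop unchanged.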

Limitations:
 * Some tests fail in it that don't fail in ptest.  So far the ones I have looked at fail on the box I'm using whether from the command line or in the container, so I do not think the failures are related to the tool.  At least some of these are ordering issues with queries that don't use order by.  I haven't examined all of them.
 * I have not analyzed whether every test run by ptest is also run by this.  The numbers are in the ballpark.  Following the logic of ptest has been challenging.  It would be very nice if 'mvn install' did the right thing for all these tests, rather than requiring reading multiple other config files to figure out which qfiles to use.
 * I don't have the Spark itests running in it yet.  When I tried to run them earlier, they failed.  I haven't gotten around to diagnosing the issue.
 * It doesn't clean up after itself.  It creates about 150 docker containers and an image for every build.  I've been leaving these around after the builds for debugging.  There is a separate tool (dtest-cleanup) that will clean up old images and containers.  Eventually this should be integrated into the tool.
 * There's also a Jenkins launch script.  I have it running on an internal machine at Hortonworks.
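
To illustrate the ordering issue in the first limitation: a query without order by may return the same rows in a different order on a different machine, so a line-by-line comparison against expected output fails even though the result is correct.  The sketch below compares rows as multisets instead.  It is only an illustration of the problem, not how the Hive test framework or dtest compares results, and `same_rows_any_order` is a hypothetical helper.

```python
from collections import Counter


def same_rows_any_order(expected, actual):
    """True if both outputs contain the same rows, ignoring row order.

    Counter keeps duplicate counts, so a row that appears twice in one
    output must appear twice in the other.
    """
    return Counter(expected.splitlines()) == Counter(actual.splitlines())
```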

Let me know if you want to use parts of this, or have me contribute it back to Hive in a patch.  Originally I was working on it inside Hive (as evidenced by the package names) but then I pulled it into a separate repo because it was easier than keeping it on a separate Hive branch.

> Investigate alternative technologies like docker containers to increase parallelism
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-19429
>                 URL: https://issues.apache.org/jira/browse/HIVE-19429
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Vihang Karajgaonkar
>            Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)