Posted to dev@hive.apache.org by "Alan Gates (Commented) (JIRA)" <ji...@apache.org> on 2011/12/21 19:45:30 UTC

[jira] [Commented] (HIVE-2670) A cluster test utility for Hive

    [ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174292#comment-13174292 ] 

Alan Gates commented on HIVE-2670:
----------------------------------

Attached a first patch.  This is not ready for inclusion yet; I'm just putting it up here to start getting feedback.  The following will need to be resolved before it is checked in:
# Currently it just has the base harness code included as a tar file.  This really should be externed from the Pig code base, as HCatalog does.
# I don't know if this is the right place in SVN or not.  I put it all in a test-e2e directory right under trunk.  I need feedback on whether this is a good spot or somewhere else would be preferred.
# Connect the top level build.xml to this so it is possible to invoke the tests from the top level directory.  I was waiting to do this until I had feedback on the proper directory structure.

How to use it:

After applying the patch you will need to copy the harness.tar file (attached) to test-e2e, since that is not done for you by the patch tool.

First you need an existing Hadoop cluster (it can be very small, just a few nodes) and a MySQL database.  I ran my tests against Hadoop 0.20.205.0, but this should run against any 0.20.x version of Hadoop.  Then:
# Run the script test-e2e/scripts/create_test_db.sql against your MySQL database as a user that can create users and databases and grant privileges to users (root is a good choice)
# Run "ant package" in the top level Hive directory
# cd test-e2e
# ant -Dharness.hadoop.home=<path_to_hadoop_home> -Dharness.hive.home=<path_to_hive_you_want_to_test> deploy
# ant -Dharness.hadoop.home=<path_to_hadoop_home> -Dharness.hive.home=<path_to_hive_you_want_to_test> test

Usually <path_to_hive_you_want_to_test> will be $CWD/../build/dist

The basic design of this test harness is that each test consists of three phases:  run_test, generate_benchmark, and compare_results.  run_test runs a particular test.  generate_benchmark runs the same or a similar test against a known source of truth.  compare_results then compares the two sets of results and declares the test to have succeeded, failed, or aborted.  The harness delegates each of these three functions to drivers that are specific to different types of tests.
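
To make the three phases concrete, here is a minimal sketch in Python.  It is not the harness's actual code or API: the Driver interface, run_one, and the toy UppercaseDriver are hypothetical names invented for this illustration, and treating an exception in either of the first two phases as "aborted" is an assumption.

```python
from typing import Dict, Protocol


class Driver(Protocol):
    """Hypothetical driver interface; the real harness defines its own."""

    def run_test(self, test: Dict[str, str]) -> str: ...

    def generate_benchmark(self, test: Dict[str, str]) -> str: ...

    def compare_results(self, actual: str, expected: str) -> bool: ...


def run_one(driver: Driver, test: Dict[str, str]) -> str:
    """Run the three phases for one test and return its status."""
    try:
        actual = driver.run_test(test)               # phase 1: system under test
        expected = driver.generate_benchmark(test)   # phase 2: source of truth
    except Exception:
        # Assumption: a phase that cannot complete aborts the test.
        return "aborted"
    # phase 3: the driver decides whether the two results agree
    return "succeeded" if driver.compare_results(actual, expected) else "failed"


class UppercaseDriver:
    """Toy driver: the 'test' uppercases the input, the 'benchmark' is a stored value."""

    def run_test(self, test: Dict[str, str]) -> str:
        return test["input"].upper()

    def generate_benchmark(self, test: Dict[str, str]) -> str:
        return test["expected"]

    def compare_results(self, actual: str, expected: str) -> bool:
        return actual == expected


if __name__ == "__main__":
    print(run_one(UppercaseDriver(), {"input": "abc", "expected": "ABC"}))
```

Keeping the phase logic in the driver means a new kind of test only needs a new driver; the loop that sequences the phases stays the same.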

This patch includes two drivers, a Hive driver and a Hive command line driver.  The Hive driver uses the MySQL database as a source of truth.  Each SQL script is run against Hive and against MySQL, and the results are compared using the Unix cksum tool.
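
The checksum comparison can be sketched roughly as follows.  This is an illustration only, not the driver's code: it uses Python's zlib.crc32 as a stand-in for cksum (the two produce different checksum values, so the numbers are not interchangeable), and the function names checksum and results_match are hypothetical.

```python
import zlib
from typing import Iterable, Tuple


def checksum(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (crc32, byte_count) over the lines.

    Loosely mirrors the checksum/size pair that cksum prints, but uses
    zlib's CRC-32, which is NOT the same algorithm as POSIX cksum.
    """
    crc = 0
    size = 0
    for line in lines:
        # Normalize line endings so a trailing newline does not matter.
        data = (line.rstrip("\n") + "\n").encode("utf-8")
        crc = zlib.crc32(data, crc)
        size += len(data)
    return crc, size


def results_match(hive_lines: Iterable[str], mysql_lines: Iterable[str]) -> bool:
    """Declare the outputs equal when both checksum and size agree."""
    return checksum(hive_lines) == checksum(mysql_lines)


if __name__ == "__main__":
    hive_out = ["1\talice", "2\tbob"]
    mysql_out = ["1\talice\n", "2\tbob\n"]
    print(results_match(hive_out, mysql_out))
```

A checksum over the raw output is sensitive to row order, so in this sketch two result sets with the same rows in a different order would not match.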

For more information on the test harness, including how to add tests to it, see https://cwiki.apache.org/confluence/display/PIG/HowToTest.  The Hive driver does not yet support running alternate SQL for benchmarking or using an older version of Hive for the benchmarks, though both should be added at some point.

                
> A cluster test utility for Hive
> -------------------------------
>
>                 Key: HIVE-2670
>                 URL: https://issues.apache.org/jira/browse/HIVE-2670
>             Project: Hive
>          Issue Type: New Feature
>          Components: Testing Infrastructure
>            Reporter: Alan Gates
>         Attachments: harness.tar, hive_cluster_test.patch
>
>
> Hive has an extensive set of unit tests, but it does not have an infrastructure for testing in a cluster environment.  Pig and HCatalog have been using a test harness for cluster testing for some time.  We have written Hive drivers and tests to run in this harness.
