Posted to dev@hive.apache.org by "Alan Gates (Created) (JIRA)" <ji...@apache.org> on 2011/12/21 19:15:30 UTC

[jira] [Created] (HIVE-2670) A cluster test utility for Hive

A cluster test utility for Hive
-------------------------------

                 Key: HIVE-2670
                 URL: https://issues.apache.org/jira/browse/HIVE-2670
             Project: Hive
          Issue Type: New Feature
          Components: Testing Infrastructure
            Reporter: Alan Gates


Hive has an extensive set of unit tests, but it does not have an infrastructure for testing in a cluster environment.  Pig and HCatalog have been using a test harness for cluster testing for some time.  We have written Hive drivers and tests to run in this harness.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-2670) A cluster test utility for Hive

Posted by "Alan Gates (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated HIVE-2670:
-----------------------------

    Attachment: harness.tar
                hive_cluster_test.patch
    


[jira] [Commented] (HIVE-2670) A cluster test utility for Hive

Posted by "Zhenxiao Luo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279174#comment-13279174 ] 

Zhenxiao Luo commented on HIVE-2670:
------------------------------------

@Alan

1. Step #4 and Step #5 are the same. Is there anything special to set up?
2. Is there a script to restore MySQL to its initial state? Otherwise, if anything goes wrong while running the patch, the database always has to be restored manually.
3. Any hints on setting permissions in the MySQL script? I always run into problems when loading data using the MySQL script.
                


[jira] [Commented] (HIVE-2670) A cluster test utility for Hive

Posted by "Johnny Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481899#comment-13481899 ] 

Johnny Zhang commented on HIVE-2670:
------------------------------------

I just applied the patch, successfully deployed the data, and kicked off a full run. I will post an update when the test is complete.
{noformat}
deploy-base:
     [exec] ================================================================================================
     [exec] LOGGING RESULTS TO /root/hive/test-e2e/testdist/./out/log/test_harnesss_1350946336
     [exec] ================================================================================================
     [exec] Generating data for studenttab10k
     [exec] Loading data into Hive for studenttab10k
     [exec] Loading data into MySQL for studenttab10k
     [exec] Generating data for votertab10k
     [exec] Loading data into Hive for votertab10k
     [exec] Loading data into MySQL for votertab10k
     [exec] Generating data for studentparttab30k
     [exec] Loading data into Hive for studentparttab30k
     [exec] Loading data into MySQL for studentparttab30k
     [exec] Generating data for studentnull10k
     [exec] Loading data into Hive for studentnull10k
     [exec] Loading data into MySQL for studentnull10k
     [exec] Generating data for all100k
     [exec] Loading data into Hive for all100k
     [exec] Loading data into MySQL for all100k
     [exec] Final results , PASSED: 0 FAILED: 0 SKIPPED: 0 ABORTED: 0 FAILED DEPENDENCY: 0

BUILD SUCCESSFUL
......
test-base:
     [exec] ================================================================================================
     [exec] LOGGING RESULTS TO /root/hive/test-e2e/testdist/./out/log/test_harnesss_1350946471
     [exec] ================================================================================================
     [exec] Results so far, PASSED: 1 FAILED: 0 SKIPPED: 0 ABORTED: 0 FAILED DEPENDENCY: 0
     [exec] Results so far, PASSED: 1 FAILED: 0 SKIPPED: 0 ABORTED: 1 FAILED DEPENDENCY: 0
     [exec] Results so far, PASSED: 1 FAILED: 0 SKIPPED: 0 ABORTED: 2 FAILED DEPENDENCY: 0
     [exec] Results so far, PASSED: 1 FAILED: 0 SKIPPED: 0 ABORTED: 3 FAILED DEPENDENCY: 0
     [exec] Results so far, PASSED: 1 FAILED: 0 SKIPPED: 0 ABORTED: 4 FAILED DEPENDENCY: 0
     [exec] Results so far, PASSED: 1 FAILED: 0 SKIPPED: 0 ABORTED: 5 FAILED DEPENDENCY: 0
     [exec] Results so far, PASSED: 1 FAILED: 0 SKIPPED: 0 ABORTED: 6 FAILED DEPENDENCY: 0
     [exec] Results so far, PASSED: 1 FAILED: 0 SKIPPED: 0 ABORTED: 7 FAILED DEPENDENCY: 0
     [exec] Results so far, PASSED: 1 FAILED: 0 SKIPPED: 0 ABORTED: 8 FAILED DEPENDENCY: 0
     [exec] Results so far, PASSED: 1 FAILED: 0 SKIPPED: 0 ABORTED: 9 FAILED DEPENDENCY: 0
{noformat}
                


[jira] [Commented] (HIVE-2670) A cluster test utility for Hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268595#comment-13268595 ] 

Namit Jain commented on HIVE-2670:
----------------------------------

https://cwiki.apache.org/confluence/display/Hive/End2EndTests
                


[jira] [Updated] (HIVE-2670) A cluster test utility for Hive

Posted by "Alan Gates (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated HIVE-2670:
-----------------------------

    Attachment: hive_cluster_test_2.patch

Here's a second version of the patch, with changes to run tests for HIVE-2616.  The new tests are included in cmdline.conf and are called HCat_sudo.  These two new tests work better if you set up a Hive server and point the test harness at it by adding:

-Dharness.metastore.host=<hostname> -Dharness.metastore.port=<port> -Dharness.metastore.passwd=<passwd> -Dharness.metastore.thrift=1

You also need to define the user to sudo to and the password to use for sudo.  Add these on the command line as:
-Dharness.sudo.to=<user_to_sudo_to> -Dharness.sudo.pass=<passwd>
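
As a rough illustration of how these properties combine with the earlier ones, here is a minimal Python sketch that assembles an ant command line.  The function name build_ant_command and all host, path, and password values are hypothetical; only the -Dharness.* property names and the deploy target come from this thread.

```python
def build_ant_command(target, hadoop_home, hive_home, metastore=None, sudo=None):
    """Assemble the ant argument list for a harness run.

    metastore: optional (host, port, passwd) tuple for a running server.
    sudo: optional (user, passwd) tuple for the HCat_sudo tests.
    """
    cmd = [
        "ant",
        f"-Dharness.hadoop.home={hadoop_home}",
        f"-Dharness.hive.home={hive_home}",
    ]
    if metastore is not None:
        host, port, passwd = metastore
        cmd += [
            f"-Dharness.metastore.host={host}",
            f"-Dharness.metastore.port={port}",
            f"-Dharness.metastore.passwd={passwd}",
            "-Dharness.metastore.thrift=1",
        ]
    if sudo is not None:
        user, passwd = sudo
        cmd += [f"-Dharness.sudo.to={user}", f"-Dharness.sudo.pass={passwd}"]
    cmd.append(target)  # the ant target always goes last
    return cmd


# Hypothetical values, for illustration only.
example = build_ant_command(
    "deploy",
    "/opt/hadoop",
    "/opt/hive/build/dist",
    metastore=("metastore.example.com", 9083, "secret"),
    sudo=("hcat", "sudo-secret"),
)
print(" ".join(example))
```

Passing the list to subprocess.run from the test-e2e directory would launch the run; building a list rather than a single string avoids shell quoting problems with passwords.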
                


[jira] [Updated] (HIVE-2670) A cluster test utility for Hive

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2670:
---------------------------------

    Assignee: Zhenxiao Luo
    


[jira] [Commented] (HIVE-2670) A cluster test utility for Hive

Posted by "Johnny Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497551#comment-13497551 ] 

Johnny Zhang commented on HIVE-2670:
------------------------------------

{noformat}
Final results , PASSED: 101 FAILED: 7 SKIPPED: 0 ABORTED: 1 FAILED DEPENDENCY: 0
{noformat}
I looked at the 7 failures:

(1) 5 of the FAILED cases appear to be caused by Hive and MySQL producing slightly different floating-point results; for example, Hive returns -10058.09 while MySQL returns -10058.08.
(2) The other 2 FAILED cases are HCatalog-related test cases (it seems the hive-metastore process needs to be started for them to work).

I'm still looking into how to fix them.
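
One possible way to handle the floating-point mismatches, shown here only as a sketch and not as what the harness does, is to compare numeric fields with a tolerance instead of byte for byte.  The function name rows_match and the 0.02 tolerance are assumptions chosen for this example.

```python
import math


def rows_match(hive_row, mysql_row, abs_tol=0.02):
    """Compare two result rows field by field.

    Fields that both parse as numbers match when they differ by at most
    abs_tol; all other fields must be identical strings.
    """
    if len(hive_row) != len(mysql_row):
        return False
    for h, m in zip(hive_row, mysql_row):
        try:
            hf, mf = float(h), float(m)
        except ValueError:
            # At least one field is not numeric: fall back to exact match.
            if h != m:
                return False
            continue
        if not math.isclose(hf, mf, rel_tol=0.0, abs_tol=abs_tol):
            return False
    return True


# The values reported above differ by about 0.01, so they match here.
print(rows_match(["alice", "-10058.09"], ["alice", "-10058.08"]))
```

The tolerance is set slightly above 0.01 because the binary representations of the two values differ by marginally more than 0.01.  A checksum-based comparison cannot express this kind of tolerance, so a driver would have to compare parsed rows instead.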
                


[jira] [Assigned] (HIVE-2670) A cluster test utility for Hive

Posted by "Johnny Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johnny Zhang reassigned HIVE-2670:
----------------------------------

    Assignee: Johnny Zhang  (was: Zhenxiao Luo)
    


[jira] [Commented] (HIVE-2670) A cluster test utility for Hive

Posted by "Zhenxiao Luo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279190#comment-13279190 ] 

Zhenxiao Luo commented on HIVE-2670:
------------------------------------

In testdist/studenttab10k.mysql.sql:

load data infile '' into table studenttab10k ...

should be updated to:

load data local infile '' into table studenttab10k ...

MySQL errors out with the first syntax, and setting permissions did not fix it.
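
Until the generated scripts are fixed at the source, the change can be applied mechanically.  The sketch below is a hypothetical helper (use_local_infile is not part of the patch) that rewrites the statement in a script's text; with LOCAL, the MySQL client reads the file and sends it to the server, rather than the server reading it from its own filesystem.

```python
import re

# Matches "load data infile" but not "load data local infile".
_LOAD_INFILE = re.compile(r"\bload\s+data\s+infile\b", re.IGNORECASE)


def use_local_infile(sql_text):
    """Rewrite 'load data infile' to 'load data local infile'.

    Statements that already use LOCAL are left untouched, so applying
    the function twice gives the same result as applying it once.
    """
    return _LOAD_INFILE.sub("load data local infile", sql_text)


print(use_local_infile("load data infile '' into table studenttab10k"))
```

Note that the replacement is always written in lowercase, whatever the case of the original statement.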

                


[jira] [Commented] (HIVE-2670) A cluster test utility for Hive

Posted by "Alan Gates (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174292#comment-13174292 ] 

Alan Gates commented on HIVE-2670:
----------------------------------

Attached a first patch.  This is not ready for inclusion yet; I'm just putting it up here to start getting feedback.  The following will need to be resolved before it is checked in:
# Currently it just has the base harness code included as a tar file.  This really should be pulled in as an SVN external from the Pig code base, as HCatalog does.
# I don't know if this is the right place in SVN or not.  I put it all in a test-e2e directory right under trunk.  I need feedback on whether this is a good spot or somewhere else would be preferred.
# Connect the top level build.xml to this so it is possible to invoke the tests from the top level directory.  I was waiting to do this until I had feedback on the proper directory structure.

How to use it:

After applying the patch you will need to copy the harness.tar file (attached) to test-e2e, since that is not done for you by the patch tool.

First you need an existing Hadoop cluster (it can be very small, just a few nodes) and a MySQL database.  I ran my tests against Hadoop 0.20.205.0, but this should run against any 0.20.x version of Hadoop.  Then:
# Run the script test-e2e/scripts/create_test_db.sql against your MySQL database as a user that can create users and databases, and grant to users (root is a good choice)
# Run "ant package" in the top level Hive directory
# cd test-e2e
# ant -Dharness.hadoop.home=<path_to_hadoop_home> -Dharness.hive.home=<path_to_hive_you_want_to_test> deploy
# ant -Dharness.hadoop.home=<path_to_hadoop_home> -Dharness.hive.home=<path_to_hive_you_want_to_test> deploy

Usually <path_to_hive_you_want_to_test> will be $CWD/../build/dist

The basic design of this test harness is that each test consists of three phases:  run_test, generate_benchmark, and compare_results.  In run_test a particular test is run.  generate_benchmark runs the same or a similar test against a known source of truth.  compare_results then compares the results and declares the test to have succeeded, failed, or aborted.  The harness delegates each of these three functions to drivers that are specific to different types of tests.

This patch includes two drivers, a Hive driver and a Hive command line driver.  The Hive driver uses the MySQL database as a source of truth.  Each SQL script is run against Hive and against MySQL, and the results are compared using the Unix cksum tool.
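
The three-phase flow can be sketched in a few lines of Python.  This is an illustration of the design only, not the harness code: the function names run_one_test and checksum are made up for this example, the two phases are passed in as plain callables standing in for driver methods, and zlib.crc32 is a stand-in that does not produce the same values as the Unix cksum tool.

```python
import zlib


def checksum(text):
    # Stand-in for the Unix cksum tool; zlib.crc32 is a different CRC variant.
    return zlib.crc32(text.encode("utf-8"))


def compare_results(actual, expected):
    """Phase 3: declare the test passed or failed by comparing checksums."""
    return "PASSED" if checksum(actual) == checksum(expected) else "FAILED"


def run_one_test(run_test, generate_benchmark):
    """Run the three phases for one test and return its status.

    run_test: callable returning the output of the system under test.
    generate_benchmark: callable returning output from the source of truth.
    """
    try:
        actual = run_test()              # phase 1
        expected = generate_benchmark()  # phase 2
    except Exception:
        # Assumption: any error while producing output aborts the test.
        return "ABORTED"
    return compare_results(actual, expected)


# Hypothetical outputs standing in for Hive and MySQL query results.
print(run_one_test(lambda: "1\talice\n", lambda: "1\talice\n"))
```

In a real driver the two callables would run the SQL script against Hive and against MySQL, respectively.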

For more information on the test harness, including how to add tests to it, see https://cwiki.apache.org/confluence/display/PIG/HowToTest.  The Hive driver does not yet support running alternate SQL for benchmarking or using an old version of Hive for the benchmarks, though both should be added at some point.

                


[jira] [Commented] (HIVE-2670) A cluster test utility for Hive

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279200#comment-13279200 ] 

Carl Steinbach commented on HIVE-2670:
--------------------------------------

@Alan: Zhenxiao and I will work together to refresh this patch, taking into account the new e2e stuff that's gone into HCatalog (thanks for the pointers).
                
