Posted to dev@hive.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2010/07/14 02:25:52 UTC

[jira] Created: (HIVE-1464) improve test query performance

improve test query performance
------------------------------

                 Key: HIVE-1464
                 URL: https://issues.apache.org/jira/browse/HIVE-1464
             Project: Hadoop Hive
          Issue Type: Test
          Components: Testing Infrastructure
            Reporter: Joydeep Sen Sarma


The clientpositive/clientnegative tests are extremely slow.

One major problem seems to be that all the test warehouse tables are deleted, re-created, and re-populated for each test. Most of the time this is not required, and if we can fix this the tests will run much faster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1464) improve test query performance

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890474#action_12890474 ] 

Joydeep Sen Sarma commented on HIVE-1464:
-----------------------------------------

Very weird. I didn't see this in an earlier run of TestCliDriver; now it is showing up, and the new output is not correct. Looking into it. I'm a little foxed: the metadata output is not consistent with the state of the filesystem.


[jira] Commented: (HIVE-1464) improve test query performance

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890449#action_12890449 ] 

Ning Zhang commented on HIVE-1464:
----------------------------------

Looks good to me. 

John, any comments?


[jira] Updated: (HIVE-1464) improve test query performance

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1464:
------------------------------------

    Status: Patch Available  (was: Open)


[jira] Updated: (HIVE-1464) improve test query performance

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1464:
-----------------------------

           Status: Resolved  (was: Patch Available)
    Fix Version/s: 0.7.0
       Resolution: Fixed

Committed. Thanks Joydeep!


[jira] Commented: (HIVE-1464) improve test query performance

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890527#action_12890527 ] 

John Sichi commented on HIVE-1464:
----------------------------------

No additional comments from me except that I am looking forward to faster tests!  :)

For Eigenbase, what we do is have a set of fixture objects which tests are not supposed to mess with, and an automatic cleanup at the beginning of each test case which drops anything in the catalog other than the fixture objects. That allows us to avoid all those DROP TABLE statements at the beginning and end of each test (and allows DROP TABLE to fail, as it should, when the object doesn't exist).

Joy's approach takes us most of the way there.
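To make the fixture idea concrete, here is a minimal Java sketch that assumes an in-memory map stands in for the catalog. FixtureCatalog and its methods are hypothetical names, not Eigenbase or Hive APIs. Cleanup at the start of each test drops every object outside the fixture set, so test scripts can use a strict DROP TABLE that fails when the table is missing.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory catalog, used only to illustrate fixture-based cleanup.
class FixtureCatalog {
    private final Map<String, String> tables = new LinkedHashMap<>(); // name -> definition
    private final Set<String> fixtures;

    FixtureCatalog(Set<String> fixtures) {
        this.fixtures = Set.copyOf(fixtures);
    }

    void createTable(String name, String definition) {
        if (tables.putIfAbsent(name, definition) != null) {
            throw new IllegalStateException("table already exists: " + name);
        }
    }

    // Strict DROP: fails when the table does not exist, as a real DROP TABLE should.
    void dropTable(String name) {
        if (tables.remove(name) == null) {
            throw new IllegalStateException("no such table: " + name);
        }
    }

    // Run at the beginning of each test case: drop everything except fixture objects.
    void cleanupNonFixtures() {
        tables.keySet().removeIf(name -> !fixtures.contains(name));
    }

    Set<String> tableNames() {
        return Set.copyOf(tables.keySet());
    }
}
```

Because cleanup runs before every test, a test never inherits tables from an earlier test, which is exactly the hidden dependency that made some old expected outputs wrong.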


[jira] Commented: (HIVE-1464) improve test query performance

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890562#action_12890562 ] 

Joydeep Sen Sarma commented on HIVE-1464:
-----------------------------------------

Hey Ning, I wasn't able to resolve the failing clientpositive test. I see that you didn't check in the new output. Were you able to resolve it somehow?


[jira] Assigned: (HIVE-1464) improve test query performance

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-1464:
--------------------------------

    Assignee: Joydeep Sen Sarma


[jira] Commented: (HIVE-1464) improve test query performance

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890597#action_12890597 ] 

Joydeep Sen Sarma commented on HIVE-1464:
-----------------------------------------

OK, I haven't been able to figure out why that test output changed. The new output is not correct. Will investigate tomorrow.


[jira] Commented: (HIVE-1464) improve test query performance

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890753#action_12890753 ] 

Joydeep Sen Sarma commented on HIVE-1464:
-----------------------------------------

I think I found out why it's happening.

These lines in build-common.xml:

    <copy todir="${test.data.dir}">
      <fileset dir="${test.src.data.dir}">
        <exclude name="**/.svn"/>
      </fileset>
    </copy>

copy .gitignore into test/data/warehouse/src.

It remains there after this. The reason this is happening now is that cleanup() works through metadata commands, and there are no tables in the metadata at the beginning of the test, so nothing gets deleted. Then a whole bunch of LOAD DATA commands are executed, which also don't delete old files.

Previously, each test would then call cleanup() again, which deleted the entire directory. That no longer happens, so the .gitignore stays there.

There are lots of ways of fixing this, but I think this whole data/warehouse directory, and copying it recursively, is totally unnecessary. I will file a separate JIRA.
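The thread leaves the real fix to a separate JIRA. As one possible approach, sketched in Java rather than as an Ant change, the copy step could skip version-control metadata such as .svn directories and .gitignore files; in Ant terms this corresponds to adding another exclude pattern such as **/.gitignore to the fileset. TestDataCopier and copyTestData are hypothetical names, and the set of excluded names is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical sketch: recursively copy test data, skipping VCS metadata.
class TestDataCopier {
    // Names treated as VCS metadata (an assumption; extend as needed).
    private static final Set<String> EXCLUDED = Set.of(".svn", ".git", ".gitignore");

    // True if any path component is an excluded name.
    static boolean isExcluded(Path relative) {
        for (Path part : relative) {
            if (EXCLUDED.contains(part.toString())) {
                return true;
            }
        }
        return false;
    }

    static void copyTestData(Path srcDir, Path destDir) throws IOException {
        try (Stream<Path> paths = Files.walk(srcDir)) {
            for (Path source : (Iterable<Path>) paths::iterator) {
                Path relative = srcDir.relativize(source);
                if (isExcluded(relative)) {
                    continue; // skips .svn/.git trees and .gitignore files
                }
                Path target = destDir.resolve(relative.toString());
                if (Files.isDirectory(source)) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

This only keeps new stray files out; a .gitignore already sitting in the destination from an earlier run would still have to be removed once.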


[jira] Commented: (HIVE-1464) improve test query performance

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890608#action_12890608 ] 

Ning Zhang commented on HIVE-1464:
----------------------------------

I commented out the cleanup function and found that there are 3 files in src's warehouse directory: kv1.txt, .kv1.txt.crc, and .gitignore. The first 2 should be correct, but the .gitignore (56 bytes) is the cause. I'm not sure where it is created, though.
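The manual inspection above can be automated. This is a minimal Java sketch that assumes the table's files sit in a local directory and that checksum files follow the `.<name>.crc` naming used by Hadoop's checksummed local filesystem. WarehouseDirCheck and unexpectedFiles are hypothetical names. Given the src directory described above and kv1.txt as the only expected data file, it would report just .gitignore.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical diagnostic: list every regular file in a table directory (dot-files
// included) and report those that are neither expected data files nor their checksums.
class WarehouseDirCheck {
    // True if crcName looks like ".<data>.crc" for one of the expected data files.
    static boolean isChecksumFor(String crcName, List<String> dataFiles) {
        if (crcName.length() <= ".crc".length() + 1
                || !crcName.startsWith(".") || !crcName.endsWith(".crc")) {
            return false;
        }
        String dataName = crcName.substring(1, crcName.length() - ".crc".length());
        return dataFiles.contains(dataName);
    }

    static List<String> unexpectedFiles(Path tableDir, List<String> expectedDataFiles)
            throws IOException {
        List<String> names = new ArrayList<>();
        try (Stream<Path> entries = Files.list(tableDir)) { // Files.list includes dot-files
            entries.filter(Files::isRegularFile)
                   .forEach(p -> names.add(p.getFileName().toString()));
        }
        List<String> unexpected = new ArrayList<>();
        for (String name : names) {
            if (!expectedDataFiles.contains(name) && !isChecksumFor(name, expectedDataFiles)) {
                unexpected.add(name);
            }
        }
        return unexpected;
    }
}
```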


[jira] Commented: (HIVE-1464) improve test query performance

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890452#action_12890452 ] 

Ning Zhang commented on HIVE-1464:
----------------------------------

Joy, the test has a diff on clientpositive/show_tablestatus.q. Can you take a look to see if it is expected?


[jira] Updated: (HIVE-1464) improve test query performance

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-1464:
------------------------------------

    Attachment: 1464.1.patch

- Loads the source tables once instead of reloading them for every test.
- Clears state before each test by removing all tables other than the source tables and reinitializing HiveConf.
- Some test results have changed because the previous results were incorrect: they depended on the presence of tables from prior tests. Once those tables are cleared out, the results change.
- Adds a pre-execute hook for test runs to make sure that none of the source tables are cleaned out.
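The guard in the last bullet can be pictured as follows. This is a minimal Java sketch, not the patch's code: SourceTableGuard and checkOutputs are hypothetical names, it assumes the caller can supply the set of tables each query writes to, and the real Hive pre-execute hook interface is not reproduced here. The table names in the usage are examples only.

```java
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for a pre-execute check; not Hive's actual hook interface.
// It rejects any test query that would write to one of the shared source tables.
class SourceTableGuard {
    private final Set<String> sourceTables;

    SourceTableGuard(Set<String> sourceTables) {
        // Normalize to lower case because Hive table names are case-insensitive.
        this.sourceTables = sourceTables.stream()
                .map(name -> name.toLowerCase(Locale.ROOT))
                .collect(Collectors.toUnmodifiableSet());
    }

    // Called before a query executes, with the names of the tables it writes to.
    void checkOutputs(Set<String> outputTables) {
        for (String table : outputTables) {
            if (sourceTables.contains(table.toLowerCase(Locale.ROOT))) {
                throw new IllegalStateException(
                        "test query would modify source table: " + table);
            }
        }
    }
}
```

Failing loudly here matters because source tables are now loaded only once: a test that silently overwrote one would corrupt the input for every test that runs after it.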

Result: TestNegativeCliDriver takes about:
- 7 min in current trunk
- 1.5 min with this patch

This includes compilation etc. as well, I presume, so the actual speedup is higher.

TestCliDriver takes too long to do a comparative run.
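Why the actual speedup is higher follows from a fixed overhead being included in both timings. A small Java calculation makes this concrete; the 0.5-minute overhead in the usage is a made-up figure for illustration, since the thread does not report one, and SpeedupEstimate is a hypothetical name.

```java
// Sketch: how a fixed per-run overhead (compilation etc.) hides part of a speedup.
class SpeedupEstimate {
    // Ratio of total wall-clock times, overhead included.
    static double observedSpeedup(double beforeMin, double afterMin) {
        return beforeMin / afterMin;
    }

    // Ratio after subtracting an assumed fixed overhead from both runs.
    static double speedupExcludingOverhead(double beforeMin, double afterMin, double overheadMin) {
        if (overheadMin < 0 || overheadMin >= afterMin) {
            throw new IllegalArgumentException("overhead must be in [0, afterMin)");
        }
        return (beforeMin - overheadMin) / (afterMin - overheadMin);
    }

    public static void main(String[] args) {
        System.out.printf("observed: %.2fx%n", observedSpeedup(7.0, 1.5));
        // 0.5 min of fixed overhead is an illustrative assumption, not a measurement.
        System.out.printf("with 0.5 min overhead: %.2fx%n",
                speedupExcludingOverhead(7.0, 1.5, 0.5));
    }
}
```

With the reported numbers the observed ratio is about 4.7x; if 0.5 min of each run were fixed overhead, the speedup of the test work itself would be (7 - 0.5) / (1.5 - 0.5) = 6.5x.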


[jira] Commented: (HIVE-1464) improve test query performance

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890580#action_12890580 ] 

Ning Zhang commented on HIVE-1464:
----------------------------------

Ahh, sorry, I forgot there is a pending issue. Can you upload an additional patch based on the current trunk?


[jira] Commented: (HIVE-1464) improve test query performance

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890330#action_12890330 ] 

Ning Zhang commented on HIVE-1464:
----------------------------------

Great! I'll take a look. 