Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2009/05/26 22:47:45 UTC

[jira] Created: (HIVE-518) test mode in hive

test mode in hive
-----------------

                 Key: HIVE-518
                 URL: https://issues.apache.org/jira/browse/HIVE-518
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.3.1
            Reporter: Namit Jain
            Assignee: Namit Jain
             Fix For: 0.4.0
         Attachments: hive.518.1.patch

It would be good to have a test mode in Hive. This will help in checking the validity of a Hive drop on a production cluster.

The following would be good to have:

Test mode --> In test mode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user-supplied name.
This way, multiple Hive drops can be compared quickly for correctness.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-518) test mode in hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-518:
----------------------------

    Attachment: hive.518.3.patch

incorporated comments

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive.518.1.patch, hive.518.2.patch, hive.518.3.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Commented: (HIVE-518) test mode in hive

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713898#action_12713898 ] 

Raghotham Murthy commented on HIVE-518:
---------------------------------------

running tests now.

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive-518.5.patch, hive.518.1.patch, hive.518.2.patch, hive.518.3.patch, hive.518.4.patch, hive.518.6.patch, hive.518.7.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Commented: (HIVE-518) test mode in hive

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713830#action_12713830 ] 

Raghotham Murthy commented on HIVE-518:
---------------------------------------

There is now a test failure because of a change in HIVE-511, which was committed after you created the patch. Can you regenerate the patch against the latest code?

    [junit] 09/05/27 18:25:14 INFO exec.FileSinkOperator: Moving tmp dir: /mnt/vol/devrs005.snc1/rmurthy/hive-committer/build/ql/tmp/_tmp.816501927.10000.insclause-0 to: /mnt/vol/devrs005.snc1/rmurthy/hive-committer/build/ql/tmp/816501927.10000.insclause-0
    [junit] diff -a -I \(file:\)\|\(/tmp/.*\) /mnt/vol/devrs005.snc1/rmurthy/hive-committer/build/ql/test/logs/clientpositive/input30.q.out /mnt/vol/devrs005.snc1/rmurthy/hive-committer/ql/src/test/results/clientpositive/input30.q.out
    [junit] 23c23
    [junit] <                     expr: (((hash(rand(UDFToLong(460476415))) & 2147483647) % 32) = 0)
    [junit] ---
    [junit] >                     expr: (((default_sample_hashfn(rand(UDFToLong(460476415))) & 2147483647) % 32) = 0)
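
The predicate in the diff above keeps a row when a non-negative hash value is divisible by the sampling frequency (32 here), so roughly one row in 32 survives. A minimal Python sketch of that shape follows. It is an illustration only, not Hive's code: the names `keep_row` and `sample_rows` are hypothetical, and Python's `random.Random` is an assumed stand-in for Hive's `rand` and hash functions.

```python
import random

SIGN_MASK = 2147483647  # 0x7fffffff: keeps the low 31 bits, clearing the sign bit


def keep_row(hash_value, sample_freq=32):
    """Mirror the shape ((h & 2147483647) % sample_freq) = 0 from the diff."""
    return ((hash_value & SIGN_MASK) % sample_freq) == 0


def sample_rows(rows, sample_freq=32, seed=460476415):
    """Keep roughly 1/sample_freq of the rows, reproducibly for a given seed."""
    rng = random.Random(seed)  # assumption: stand-in for hash(rand(seed))
    return [row for row in rows if keep_row(rng.getrandbits(32), sample_freq)]


if __name__ == "__main__":
    rows = list(range(100000))
    kept = sample_rows(rows)
    print(len(kept), "of", len(rows), "rows kept")
```

Masking with 2147483647 before taking the modulus guarantees a non-negative operand, so a negative hash value cannot produce a negative remainder.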


> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive-518.5.patch, hive.518.1.patch, hive.518.2.patch, hive.518.3.patch, hive.518.4.patch, hive.518.6.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Updated: (HIVE-518) test mode in hive

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-518:
----------------------------------

    Issue Type: New Feature  (was: Bug)

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive-518.5.patch, hive.518.1.patch, hive.518.2.patch, hive.518.3.patch, hive.518.4.patch, hive.518.6.patch, hive.518.7.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Commented: (HIVE-518) test mode in hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713889#action_12713889 ] 

Namit Jain commented on HIVE-518:
---------------------------------

Uploaded the new patch. If you are up, can you merge it?

Anyway, I will only deploy tomorrow; I might sleep soon and don't want to deploy late at night.


On 5/27/09 4:13 PM, "Raghotham Murthy (JIRA)" <ji...@apache.org> wrote:



     [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-518:
----------------------------------

    Attachment: hive-518.5.patch

Seeing some weird errors. input30.q, input31.q and input32.q individually succeed. But, when run along with other queries, it seems like the specific queries are not being run in test mode.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.




> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive-518.5.patch, hive.518.1.patch, hive.518.2.patch, hive.518.3.patch, hive.518.4.patch, hive.518.6.patch, hive.518.7.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Updated: (HIVE-518) test mode in hive

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-518:
----------------------------------

    Comment: was deleted

(was: Uploaded the new patch. If you are up, can you merge it?

Anyway, I will only deploy tomorrow; I might sleep soon and don't want to deploy late at night.


On 5/27/09 4:13 PM, "Raghotham Murthy (JIRA)" <ji...@apache.org> wrote:



     [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-518:
----------------------------------

    Attachment: hive-518.5.patch

Seeing some weird errors. input30.q, input31.q and input32.q individually succeed. But, when run along with other queries, it seems like the specific queries are not being run in test mode.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


)

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive-518.5.patch, hive.518.1.patch, hive.518.2.patch, hive.518.3.patch, hive.518.4.patch, hive.518.6.patch, hive.518.7.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Updated: (HIVE-518) test mode in hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-518:
----------------------------

    Attachment: hive.518.1.patch

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive.518.1.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Updated: (HIVE-518) test mode in hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-518:
----------------------------

    Attachment: hive.518.7.patch

resolved conflicts

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive-518.5.patch, hive.518.1.patch, hive.518.2.patch, hive.518.3.patch, hive.518.4.patch, hive.518.6.patch, hive.518.7.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Updated: (HIVE-518) test mode in hive

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-518:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed. Thanks Namit!

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive-518.5.patch, hive.518.1.patch, hive.518.2.patch, hive.518.3.patch, hive.518.4.patch, hive.518.6.patch, hive.518.7.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Commented: (HIVE-518) test mode in hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713419#action_12713419 ] 

Namit Jain commented on HIVE-518:
---------------------------------

I agree with it - it will not lead to any problem since the join results will be empty in both the new and
the old drop, but the whole purpose of testing may be lost.

Hinting seems useless, because if the pipelines can be modified to add query-level hints, the queries themselves
can be modified.

Via a configuration parameter, the list of tables can be specified and sampling may only be applicable to
those tables. It will need the pipelines to be modified, or we can take a more aggressive approach and add
sampling to all tables unless the user asks us not to do so. This way, only the offending pipelines (e.g.
the one pointed out by Raghu) need to be modified.


> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive.518.1.patch, hive.518.2.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Updated: (HIVE-518) test mode in hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-518:
----------------------------

    Attachment: hive.518.6.patch

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive-518.5.patch, hive.518.1.patch, hive.518.2.patch, hive.518.3.patch, hive.518.4.patch, hive.518.6.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Commented: (HIVE-518) test mode in hive

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713694#action_12713694 ] 

Raghotham Murthy commented on HIVE-518:
---------------------------------------

Can you add a test for the unsampled tables feature?

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive.518.1.patch, hive.518.2.patch, hive.518.3.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Commented: (HIVE-518) test mode in hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713393#action_12713393 ] 

Zheng Shao commented on HIVE-518:
---------------------------------

One additional comment: Can you use rand(460476415) instead of rand(1)? rand(1) is likely to appear in a user's query as well, which may make the query sampling non-uniform.

This is a really simple change now, but might save a lot of time debugging in the future.
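
The concern can be illustrated with a small Python sketch. It assumes that two generators seeded identically produce the same sequence, as Python's `random.Random` does; whether Hive's `rand` behaves the same way within one query is an assumption here. The function names and the user predicate `rand(seed) < 0.5` are hypothetical.

```python
import random


def sampled_indices(n_rows, seed, sample_freq=32):
    """Indices kept by a sampler that draws one random number per row."""
    rng = random.Random(seed)
    return [i for i in range(n_rows) if rng.random() < 1.0 / sample_freq]


def user_filter_indices(n_rows, seed, threshold=0.5):
    """Indices kept by a hypothetical user predicate rand(seed) < threshold."""
    rng = random.Random(seed)
    return {i for i in range(n_rows) if rng.random() < threshold}


def pass_rate(n_rows, sampler_seed, user_seed):
    """Fraction of sampled rows that also pass the user's filter."""
    sampled = sampled_indices(n_rows, sampler_seed)
    passing = user_filter_indices(n_rows, user_seed)
    return sum(1 for i in sampled if i in passing) / len(sampled)


if __name__ == "__main__":
    print("same seed:     ", pass_rate(10000, 1, 1))
    print("different seed:", pass_rate(10000, 460476415, 1))
```

With a shared seed, every sampled row has a draw below 1/32 and therefore also passes the user's `< 0.5` filter, so the sample is no longer representative of the user's predicate. An unusual seed makes such a collision unlikely.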


> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive.518.1.patch, hive.518.2.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Commented: (HIVE-518) test mode in hive

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713666#action_12713666 ] 

Raghotham Murthy commented on HIVE-518:
---------------------------------------

One solution would be to provide another set option with the list of tables which should not be sampled in test mode.

set hive.test.mode.unsampled.tables=table1,table2

Another option might be to actually allow users to specify the entire tablesample clause for every table that needs to be sampled in test mode. But that seems like a lot more work for not a lot of additional benefit.
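
A rough Python sketch of how such a comma-separated option could be consumed is shown below. It is not taken from any patch: the helper names are hypothetical, and lower-casing the names is an assumption based on Hive table names being case-insensitive.

```python
def parse_unsampled_tables(value):
    """Split a comma-separated option value into a set of table names."""
    return {name.strip().lower() for name in value.split(",") if name.strip()}


def should_sample(table, unsampled, already_sampled=False):
    """In test mode, sample a table unless it is excluded or already sampled."""
    if already_sampled:
        return False
    return table.lower() not in unsampled


if __name__ == "__main__":
    unsampled = parse_unsampled_tables("table1,table2")
    for name in ("table1", "table3"):
        print(name, "sampled:", should_sample(name, unsampled))
```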

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive.518.1.patch, hive.518.2.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Commented: (HIVE-518) test mode in hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713342#action_12713342 ] 

Zheng Shao commented on HIVE-518:
---------------------------------

1. Can you add a comment for SemanticAnalyzer.genSamplePredicate, especially for the new planExpr parameter? It's not clear what this parameter means.

2. Can you add comments on what we will do when Hive is running in test mode and the table is bucketed?

+<property>
+  <name>hive.test.mode.samplefreq</name>
+  <value>32</value>
+  <description>if hive is running in test mode and table is not bucketed, sampling frequency</description>
+</property>


> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive.518.1.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Updated: (HIVE-518) test mode in hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-518:
----------------------------

    Attachment: hive.518.2.patch

incorporated comments

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive.518.1.patch, hive.518.2.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Commented: (HIVE-518) test mode in hive

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713825#action_12713825 ] 

Raghotham Murthy commented on HIVE-518:
---------------------------------------

+1

will commit once tests pass.

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive-518.5.patch, hive.518.1.patch, hive.518.2.patch, hive.518.3.patch, hive.518.4.patch, hive.518.6.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Commented: (HIVE-518) test mode in hive

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713396#action_12713396 ] 

Raghotham Murthy commented on HIVE-518:
---------------------------------------

What happens if the production query has one sampled table joined against an unsampled table? A common example is a fact table sampled by user, joined with a dimension table on a dimension attribute such as gender or country. If an arbitrary sample clause is added on the dimension table, the join result may be empty.
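
A toy Python example of the problem, under assumed data: the tables, column names, and the modulo-based `sample` helper are all hypothetical and only meant to show how sampling a small dimension table can empty a join.

```python
def sample(rows, key, freq):
    """Keep rows whose integer key is divisible by freq (a toy sample clause)."""
    return [row for row in rows if row[key] % freq == 0]


def join(facts, dims, on):
    """Inner join two lists of dicts on a shared column."""
    index = {}
    for d in dims:
        index.setdefault(d[on], []).append(d)
    return [{**f, **d} for f in facts for d in index.get(f[on], [])]


COUNTRIES = ["US", "IN", "BR"]
facts = [{"user_id": u, "country": COUNTRIES[u % 3]} for u in range(16)]
dims = [{"dim_id": i + 1, "country": c} for i, c in enumerate(COUNTRIES)]

sampled_facts = sample(facts, "user_id", 4)  # keeps users 0, 4, 8, 12
with_full_dims = join(sampled_facts, dims, "country")
with_sampled_dims = join(sampled_facts, sample(dims, "dim_id", 4), "country")

if __name__ == "__main__":
    print("join with full dimension table:   ", len(with_full_dims), "rows")
    print("join with sampled dimension table:", len(with_sampled_dims), "rows")
```

The three-row dimension table has no `dim_id` divisible by 4, so sampling it removes every row and the join returns nothing, even though the sampled fact table still has matching countries.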

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive.518.1.patch, hive.518.2.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Updated: (HIVE-518) test mode in hive

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-518:
----------------------------

    Description: 
It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.

The following would be good to have:

Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
This way, multiple hive drops can be compared quickly for correctness


New Options:

{code}

// whether hive is running in test mode. If yes, it turns on sampling and prefixes the output tablename
set hive.test.mode=true;
// if hive is running in test mode, prefixes the output table by this string
set hive.test.mode.prefix=;
// if hive is running in test mode and table is not bucketed, sampling frequency
set hive.test.mode.samplefreq=256;
// if hive is running in test mode, don't sample the following comma-separated list of tables
set hive.test.mode.nosamplelist=;

{code}


  was:
It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.

The following would be good to have:

Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
This way, multiple hive drops can be compared quickly for correctness


> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive-518.5.patch, hive.518.1.patch, hive.518.2.patch, hive.518.3.patch, hive.518.4.patch, hive.518.6.patch, hive.518.7.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness
> New Options:
> {code}
> // whether hive is running in test mode. If yes, it turns on sampling and prefixes the output table name
> set hive.test.mode=true;
> // if hive is running in test mode, prefixes the output table name with this string
> set hive.test.mode.prefix=;
> // if hive is running in test mode and the table is not bucketed, the sampling frequency
> set hive.test.mode.samplefreq=256;
> // if hive is running in test mode, do not sample the tables in this comma-separated list
> set hive.test.mode.nosamplelist=;
> {code}



[jira] Updated: (HIVE-518) test mode in hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-518:
----------------------------

    Status: Patch Available  (was: Open)

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive.518.1.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Updated: (HIVE-518) test mode in hive

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-518:
----------------------------

    Attachment: hive.518.4.patch

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive.518.1.patch, hive.518.2.patch, hive.518.3.patch, hive.518.4.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Updated: (HIVE-518) test mode in hive

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghotham Murthy updated HIVE-518:
----------------------------------

    Attachment: hive-518.5.patch

Seeing some weird errors. input30.q, input31.q and input32.q each succeed when run individually. But when they are run along with other queries, it seems that these specific queries are not being run in test mode.

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive-518.5.patch, hive.518.1.patch, hive.518.2.patch, hive.518.3.patch, hive.518.4.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness



[jira] Commented: (HIVE-518) test mode in hive

Posted by "Raghotham Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12713714#action_12713714 ] 

Raghotham Murthy commented on HIVE-518:
---------------------------------------

+1

Looks good. Will commit once tests pass.

> test mode in hive
> -----------------
>
>                 Key: HIVE-518
>                 URL: https://issues.apache.org/jira/browse/HIVE-518
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.1
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive.518.1.patch, hive.518.2.patch, hive.518.3.patch, hive.518.4.patch
>
>
> It would be good to have a test mode in hive - this will help in checking the validity of a hive drop on a production cluster.
> The following would be good to have:
> Testmode --> In testmode, all input tables are sampled (if not already sampled) and all output tables are prefixed by a user supplied name.
> This way, multiple hive drops can be compared quickly for correctness
