Posted to dev@hive.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2008/11/21 22:53:45 UTC

[jira] Created: (HIVE-77) thread safe query execution

thread safe query execution
---------------------------

                 Key: HIVE-77
                 URL: https://issues.apache.org/jira/browse/HIVE-77
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Joydeep Sen Sarma


this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment
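The "thread local singleton" idea in the description can be sketched as follows. This is only an illustration under assumptions: the class name SessionStateSketch and its methods (start, get, end, setVar, getVar) are hypothetical and are not Hive's actual SessionState API. The point is that a static ThreadLocal gives each server thread its own session object instead of one shared static instance.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a per-session state object. The class and
 * method names are illustrative and are not Hive's actual SessionState API.
 */
final class SessionStateSketch {
    // Each thread sees its own slot, so sessions never share mutable state.
    private static final ThreadLocal<SessionStateSketch> CURRENT = new ThreadLocal<>();

    private final Map<String, String> vars = new HashMap<>();

    private SessionStateSketch() {}

    /** Creates a new session and binds it to the calling thread. */
    static SessionStateSketch start() {
        SessionStateSketch ss = new SessionStateSketch();
        CURRENT.set(ss);
        return ss;
    }

    /** Returns the calling thread's session, or null if none has been started. */
    static SessionStateSketch get() {
        return CURRENT.get();
    }

    /** Unbinds the session; matters when server threads are pooled and reused. */
    static void end() {
        CURRENT.remove();
    }

    void setVar(String key, String value) {
        vars.put(key, value);
    }

    String getVar(String key) {
        return vars.get(key);
    }

    public static void main(String[] args) throws InterruptedException {
        SessionStateSketch.start().setVar("user", "alice");

        Thread other = new Thread(() -> {
            // This thread has no session until it starts its own.
            System.out.println("other before start: " + SessionStateSketch.get());
            SessionStateSketch.start().setVar("user", "bob");
            System.out.println("other user: " + SessionStateSketch.get().getVar("user"));
            SessionStateSketch.end();
        });
        other.start();
        other.join();

        // The main thread's session is untouched by the other thread.
        System.out.println("main user: " + SessionStateSketch.get().getVar("user"));
        SessionStateSketch.end();
    }
}
```

Calling end() when a client disconnects is a design choice worth keeping in any real version: with pooled threads, a stale thread-local would otherwise leak one client's state into the next.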

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-77) thread safe query execution

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-77:
----------------------------------

    Attachment: hive-77.patch.4

another one after resolving Namit's changes.

> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Server Infrastructure
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1, hive-77.patch.2, hive-77.patch.3, hive-77.patch.4
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Commented: (HIVE-77) thread safe query execution

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652195#action_12652195 ] 

Joydeep Sen Sarma commented on HIVE-77:
---------------------------------------

thx - will fix the inline comments.

static LOG: this is not a blocking issue. I am assuming that log4j itself is thread safe. The main issue is that all log events will end up in the same log file, and log entries from different sessions will be interleaved without any clear headers. log4j has a facility called NDC (Nested Diagnostic Context) that is apparently widely used in multithreaded environments to put a per-thread (in our case, per-session) prefix on log entries. We can use the NDC interfaces from log4j, but that is more work and potentially a more disruptive change.

QTestUtil: I don't understand this. Today every query file is run independently in its own JUnit test. If QTestUtil ran all of them (either in sequence or in parallel), this would not be possible. Can you explain a bit more?
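To make the per-thread prefix idea behind NDC concrete, here is a stdlib-only sketch. It is not log4j's API: the class DiagnosticContextSketch and its push, pop, clear, and format methods are hypothetical names chosen for illustration. It keeps a per-thread stack of labels and prepends them to each message, which is roughly the effect a diagnostic context has on interleaved log lines.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Stdlib-only illustration of a nested diagnostic context: a per-thread
 * stack of labels that is prepended to every log line. It mimics the idea
 * behind log4j's NDC but is not log4j's API.
 */
final class DiagnosticContextSketch {
    // One label stack per thread, created lazily on first use.
    private static final ThreadLocal<Deque<String>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    private DiagnosticContextSketch() {}

    /** Adds a label (for example a session id) to the calling thread's context. */
    static void push(String label) {
        STACK.get().addLast(label);
    }

    /** Removes the most recently pushed label, if any. */
    static void pop() {
        STACK.get().pollLast();
    }

    /** Drops the calling thread's whole context. */
    static void clear() {
        STACK.remove();
    }

    /** Prefixes the message with the calling thread's labels, oldest first. */
    static String format(String message) {
        return "[" + String.join(" ", STACK.get()) + "] " + message;
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable session = () -> {
            // Use the thread name as a stand-in for a session id.
            push(Thread.currentThread().getName());
            System.out.println(format("compiling query"));
            push("stage-1");
            System.out.println(format("running task"));
            clear();
        };
        Thread a = new Thread(session, "session-a");
        Thread b = new Thread(session, "session-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Lines from the two threads may still interleave in the output, but each one carries its own session label, which is the property the comment above is after.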

> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Server Infrastructure
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Updated: (HIVE-77) thread safe query execution

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-77:
----------------------------------

    Attachment: hive-77.patch.3

> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Server Infrastructure
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1, hive-77.patch.2, hive-77.patch.3
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Updated: (HIVE-77) thread safe query execution

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-77:
---------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

HIVE-77. Thread safe query execution. (Joydeep through zshao)
Committed revision 723699.


> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Server Infrastructure
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1, hive-77.patch.2, hive-77.patch.3, hive-77.patch.4
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Updated: (HIVE-77) thread safe query execution

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-77:
----------------------------------

    Attachment: hive-77.patch.1

> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Commented: (HIVE-77) thread safe query execution

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652188#action_12652188 ] 

Ashish Thusoo commented on HIVE-77:
-----------------------------------

Comments are inlined.

There are two major ones:
1. static LOG - we do that all over our code. Will that cause a problem, or do we have to clean it up in the same way that you have done for CliDriver and SessionState?
2. This is the bigger one - I think we should just extend QTestUtil so that it can run in a multithreaded mode instead of creating another class to run those tests. We can then call QTestUtil in two modes and pass it the list of tests that we want executed in the multithreaded mode. That would be more maintainable in the long run.

The rest are a few minor things like missing javadocs.
Inline Comments
ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:687	Please add javadocs for these.
ql/src/test/org/apache/hadoop/hive/ql/TestMTQueries.java:0	This file needs javadocs.
ql/src/test/org/apache/hadoop/hive/ql/TestMTQueries.java:32	Can we pass the names into this instead of hardcoding them in the code?
ql/src/test/org/apache/hadoop/hive/ql/TestMTQueries.java:0	Also, instead of a brand new test class, it is perhaps better to extend QTestUtil to run using the Runner and then run it in two modes, concurrent and serial. We would be able to avoid duplicate code and have a single utility for both the concurrent and serial tests.
ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:202	Can we create arrays for the destination tables and destination files, similar to the source array, and then loop over those arrays?
ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:391	javadocs.
ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:391	Where is this function called from?
build-common.xml:226	Why are we excluding TestMTQueries here?
cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java:62	Would a static log not work? If it would not, we have that all over the code, and fixing it only here will not be enough.
ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java:265	javadocs for these.
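The "one utility, two modes" suggestion above could look roughly like the sketch below. Everything in it is an assumption made for illustration: QueryTestRunnerSketch, its run method, and the Predicate used to stand in for "run one query file" are hypothetical and do not reflect QTestUtil's real interface. The caller passes the list of test names and a flag, and the same code path runs them serially or on a thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical harness: one entry point, serial or concurrent execution. */
final class QueryTestRunnerSketch {
    private static final int MAX_THREADS = 4; // arbitrary cap for the sketch

    private QueryTestRunnerSketch() {}

    /**
     * Runs every named test through runOne and returns the names that failed,
     * in the order they were given. A test that throws counts as failed.
     */
    static List<String> run(List<String> names, Predicate<String> runOne, boolean concurrent) {
        List<String> failed = new ArrayList<>();
        if (!concurrent) {
            for (String name : names) {
                if (!runOne.test(name)) {
                    failed.add(name);
                }
            }
            return failed;
        }

        int threads = Math.max(1, Math.min(MAX_THREADS, names.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit everything first so the tests actually overlap.
            List<Future<Boolean>> results = new ArrayList<>();
            for (String name : names) {
                Callable<Boolean> task = () -> runOne.test(name);
                results.add(pool.submit(task));
            }
            for (int i = 0; i < names.size(); i++) {
                try {
                    if (!results.get(i).get()) {
                        failed.add(names.get(i));
                    }
                } catch (ExecutionException e) {
                    failed.add(names.get(i));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    failed.add(names.get(i));
                }
            }
        } finally {
            pool.shutdown();
        }
        return failed;
    }
}
```

A caller would pass the same list and test body in both modes, for example run(names, body, false) for the serial suite and run(names, body, true) for the multithreaded one, so any difference in results points at a thread-safety problem rather than at a difference between two harnesses.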

> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Server Infrastructure
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Updated: (HIVE-77) thread safe query execution

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-77:
----------------------------------

    Status: Patch Available  (was: Open)

> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Updated: (HIVE-77) thread safe query execution

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-77:
-------------------------------

    Fix Version/s: 0.3.0

> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Server Infrastructure
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>             Fix For: 0.3.0
>
>         Attachments: hive-77.patch.1, hive-77.patch.2, hive-77.patch.3, hive-77.patch.4
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Commented: (HIVE-77) thread safe query execution

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651321#action_12651321 ] 

Joydeep Sen Sarma commented on HIVE-77:
---------------------------------------

this patch includes:
- test for multithreaded execution
- fixes to SessionState.java and CliDriver.java for MT safe (and to run MT tests)

however - the test is disabled. while this seems to fix issues with session management etc. - there are bugs in metastore client code that are not resolved. i would like to file a separate jira for those.

unfortunately metastore client issues are not limited to DDLTasks only - regular table metadata fetches are also affected. (the patch allows testing of clientpositive queries without running DDL commands in MT mode - but in an extremely hacky way)

> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Commented: (HIVE-77) thread safe query execution

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652247#action_12652247 ] 

Ashish Thusoo commented on HIVE-77:
-----------------------------------

@Joy

I meant rolling up the TestMTQueries into QTestUtil so that we just have a single tool to deal with when it comes to query testing.

%t as suggested by Edward should work for now. Later when we expand this so that a thread is able to handle multiple sessions (a requirement on JDBC) we can start exploring NDC and MDCs to log thread specific information.

> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Server Infrastructure
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Assigned: (HIVE-77) thread safe query execution

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma reassigned HIVE-77:
-------------------------------------

    Assignee: Joydeep Sen Sarma

> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Updated: (HIVE-77) thread safe query execution

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-77:
----------------------------------

    Attachment: hive-77.patch.2

> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Server Infrastructure
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1, hive-77.patch.2
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Commented: (HIVE-77) thread safe query execution

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653510#action_12653510 ] 

Ashish Thusoo commented on HIVE-77:
-----------------------------------

+1

looks good to me.

Zheng, can you check it in?

> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Server Infrastructure
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1, hive-77.patch.2, hive-77.patch.3, hive-77.patch.4
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Updated: (HIVE-77) thread safe query execution

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-77:
------------------------------

    Component/s: Server Infrastructure

> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Server Infrastructure
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Commented: (HIVE-77) thread safe query execution

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652211#action_12652211 ] 

Edward Capriolo commented on HIVE-77:
-------------------------------------

Log4j has a variable that will print the thread name. 

http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/PatternLayout.html

%t should do it.
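As a rough illustration of the %t suggestion, a log4j 1.2 properties fragment might look like the one below. The appender name FILE, the log file name, and the rest of the pattern are assumptions for the example and are not taken from Hive's actual configuration. The only part that matters here is the [%t] conversion, which PatternLayout replaces with the name of the thread that generated the logging event.

```properties
# Hypothetical log4j 1.2 configuration; appender name and file path are placeholders.
log4j.rootLogger=INFO, FILE
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=hive.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
# %t prints the name of the thread that logged the event.
log4j.appender.FILE.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c - %m%n
```

With one thread per session, the thread name is enough to tell interleaved sessions apart in a shared log file.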

> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Server Infrastructure
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment



[jira] Commented: (HIVE-77) thread safe query execution

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652448#action_12652448 ] 

Joydeep Sen Sarma commented on HIVE-77:
---------------------------------------

fixed all the inline comments and moved multithreaded query harness to QTestUtil. 

> ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java:391 Where is this function called from?
This is a temporary hack for non-DDL related metastore issues.

also - i am actually able to run TestMTQueries successfully now. funnily, none of the metastore issues are showing up. There were some more changes required:
- randomize hadoop execution directories for local mode execution (otherwise there are collisions when submitting jobs concurrently)
- make the TaskFactory id a thread-local static
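The two changes listed above can be sketched in a few lines. This is a minimal illustration and not the code from the patch: PerThreadExecutionSketch, nextTaskId, resetTaskIds, and newExecutionDir are hypothetical names, and the directory is created under the JVM's default temp location purely for the example. The first part keeps a task id counter per thread; the second gives every job its own randomly named directory so that concurrent submissions do not collide.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helpers for per-thread task ids and unique execution directories. */
final class PerThreadExecutionSketch {
    // Each thread numbers its own tasks starting from 0, so ids never race.
    private static final ThreadLocal<Integer> NEXT_ID = ThreadLocal.withInitial(() -> 0);

    private PerThreadExecutionSketch() {}

    /** Returns the calling thread's next task id and advances its counter. */
    static int nextTaskId() {
        int id = NEXT_ID.get();
        NEXT_ID.set(id + 1);
        return id;
    }

    /** Restarts the calling thread's numbering, e.g. at the start of a new query. */
    static void resetTaskIds() {
        NEXT_ID.remove();
    }

    /**
     * Creates a fresh directory with a random suffix under the default temp
     * location, so two concurrently submitted jobs never share a path.
     */
    static Path newExecutionDir(String prefix) {
        try {
            return Files.createTempDirectory(prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable job = () -> {
            String who = Thread.currentThread().getName();
            Path dir = newExecutionDir("hive-local-");
            System.out.println(who + " task " + nextTaskId() + " in " + dir);
            System.out.println(who + " task " + nextTaskId() + " in " + dir);
            dir.toFile().delete(); // clean up the empty directory
        };
        Thread a = new Thread(job, "query-a");
        Thread b = new Thread(job, "query-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

A per-thread counter is sufficient only while one thread serves one query at a time; if a thread later handles several sessions, the counter would need to live in the session object instead.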


> thread safe query execution
> ---------------------------
>
>                 Key: HIVE-77
>                 URL: https://issues.apache.org/jira/browse/HIVE-77
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Server Infrastructure
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: hive-77.patch.1, hive-77.patch.2, hive-77.patch.3
>
>
> this came up in hive-30 where there's a multithreaded hive server in the works. at the minimum, the sessionstate objects should be thread local singletons. but filing a more general bug that can cover this issue + code audit for any other static variables + test suite for running queries from a multi-threaded environment
