Posted to dev@hive.apache.org by "Bennie Schut (JIRA)" <ji...@apache.org> on 2010/07/23 10:40:51 UTC

[jira] Created: (HIVE-1482) Not all jdbc calls are threadsafe.

Not all jdbc calls are threadsafe.
----------------------------------

                 Key: HIVE-1482
                 URL: https://issues.apache.org/jira/browse/HIVE-1482
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Drivers
    Affects Versions: 0.7.0
            Reporter: Bennie Schut
             Fix For: 0.7.0


As per jdbc spec they should be threadsafe:
http://download.oracle.com/docs/cd/E17476_01/javase/1.3/docs/guide/jdbc/spec/jdbc-spec.frame9.html


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1482) Not all jdbc calls are threadsafe.

Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895997#action_12895997 ] 

Bennie Schut commented on HIVE-1482:
------------------------------------

OK, I think I covered the sync stuff as you suggested.
For the state part I was looking at how SessionState is used within the HiveConnection.
There are several spots where we pass a SessionState object (HivePreparedStatement, HiveStatement), but neither place ends up actually using the object. The simplest thing would be not to pass it. But even if we do pass it, I think it only contains session data and not query-specific data.
Query-specific things like column names/types etc. can be handled nicely within a sync block together with the client call for the query itself, so that shouldn't be a problem.

I'll upload a patch with what I have soon.



[jira] Commented: (HIVE-1482) Not all jdbc calls are threadsafe.

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897892#action_12897892 ] 

John Sichi commented on HIVE-1482:
----------------------------------

Yeah, the problems may only show up with remoting (which is why I mentioned having the test run against a thrift server instead of embedded).




[jira] Commented: (HIVE-1482) Not all jdbc calls are threadsafe.

Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896897#action_12896897 ] 

Bennie Schut commented on HIVE-1482:
------------------------------------

The proxy sounds like an interesting approach. It will probably synchronize a little more than we need, but it's a lot safer, and the normal use case is calling the methods sequentially anyway, so not much is lost there.

Doing a little multithreading test is a good idea. I've had some HIVE_PLAN exceptions in the past (different jira) which would probably also become visible then. I'll have a look.

Thanks.
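
A minimal sketch of what such a multithreading test could look like. This is not the test from the patch: FakeRemoteClient is a hypothetical in-process stand-in for a shared remote client, and countMismatches is an illustrative helper. Several threads issue calls on one client, and each reply is checked against the request that produced it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MultithreadedClientTestSketch {
    // Hypothetical stand-in for a remote client: one shared "wire" per connection.
    static class FakeRemoteClient {
        private String wire; // deliberately unprotected shared state

        String execute(String query) {
            wire = query;          // "send" the request
            Thread.yield();        // give other threads a chance to interleave
            return "echo:" + wire; // "receive" whatever is on the wire now
        }
    }

    // Runs `threads` workers, each issuing `calls` queries on the same client,
    // and returns how many replies did not match their request.
    // A null mutex means the calls are made without any synchronization.
    static int countMismatches(FakeRemoteClient client, Object mutex, int threads, int calls)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                Callable<Integer> worker = () -> {
                    int bad = 0;
                    for (int i = 0; i < calls; i++) {
                        String query = "q" + id + "-" + i;
                        String reply;
                        if (mutex != null) {
                            synchronized (mutex) {
                                reply = client.execute(query);
                            }
                        } else {
                            reply = client.execute(query);
                        }
                        if (!reply.equals("echo:" + query)) {
                            bad++;
                        }
                    }
                    return bad;
                };
                results.add(pool.submit(worker));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        FakeRemoteClient client = new FakeRemoteClient();
        System.out.println("unsynchronized mismatches: "
                + countMismatches(client, null, 8, 10000));
        System.out.println("synchronized mismatches:   "
                + countMismatches(client, new Object(), 8, 10000));
    }
}
```

With the mutex in place the mismatch count should always be zero. Without it the count depends on thread scheduling and may well be zero on a given run, which is one reason this kind of failure is hard to pin down in a test.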



[jira] Commented: (HIVE-1482) Not all jdbc calls are threadsafe.

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896710#action_12896710 ] 

John Sichi commented on HIVE-1482:
----------------------------------

After looking at the code, I'm thinking we could use Java proxies to make the synchronization automatic and bulletproof.

http://stackoverflow.com/questions/743288/java-synchronization-utility

Proxy the HiveInterface, and pass the proxy instance down.  That way we don't even need to pass around the connectionMutex (it will be hidden inside the proxy).

What do you think?
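
A rough sketch of the proxy idea, using java.lang.reflect.Proxy. SketchHiveInterface is a hypothetical one-method stand-in for HiveInterface, and synchronizedProxy is an illustrative helper name, not something taken from the Hive code base or from a patch.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

class SynchronizedProxySketch {
    // Wraps target in a dynamic proxy that runs every interface call
    // under one private lock, so callers never see the mutex.
    static <T> T synchronizedProxy(Class<T> iface, T target) {
        final Object mutex = new Object();
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            synchronized (mutex) {
                try {
                    return method.invoke(target, methodArgs);
                } catch (InvocationTargetException e) {
                    // Rethrow the exception thrown by the real method.
                    throw e.getCause();
                }
            }
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    public static void main(String[] args) {
        SketchHiveInterface raw = query -> "rows for: " + query;
        SketchHiveInterface safe = synchronizedProxy(SketchHiveInterface.class, raw);
        System.out.println(safe.execute("select 1"));
    }
}

// Hypothetical stand-in for HiveInterface (assumption for this sketch).
interface SketchHiveInterface {
    String execute(String query);
}
```

Every call through the proxy holds the same lock, including calls that would not strictly need it, so this trades some concurrency for simplicity.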




[jira] Commented: (HIVE-1482) Not all jdbc calls are threadsafe.

Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891522#action_12891522 ] 

Bennie Schut commented on HIVE-1482:
------------------------------------

On HiveConnection we reuse the "client" on many calls but we've already seen the "DatabaseMetaData getMetaData()" call being used in a multithreaded way.



[jira] Commented: (HIVE-1482) Not all jdbc calls are threadsafe.

Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897742#action_12897742 ] 

Bennie Schut commented on HIVE-1482:
------------------------------------

I've tried, but it is surprisingly difficult to reproduce the failure in a test on TestJdbcDriver. Perhaps there is something synchronized about the embedded mode the test is running in?



[jira] Commented: (HIVE-1482) Not all jdbc calls are threadsafe.

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896718#action_12896718 ] 

John Sichi commented on HIVE-1482:
----------------------------------

Also, can you create a followup for adding a multithreaded test running JDBC against a thrift server?




[jira] Updated: (HIVE-1482) Not all jdbc calls are threadsafe.

Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bennie Schut updated HIVE-1482:
-------------------------------

    Attachment: HIVE-1482-1.patch

The full test run succeeded on my PC, and no additional checkstyle errors were found.
Anything else we can think of for this?



[jira] Updated: (HIVE-1482) Not all jdbc calls are threadsafe.

Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bennie Schut updated HIVE-1482:
-------------------------------

    Attachment: HIVE-1482-2.patch

I still need to reproduce the problem in a test, so this is just to show some progress. It's now using generics with a SynchronizedFactory.



[jira] Commented: (HIVE-1482) Not all jdbc calls are threadsafe.

Posted by "Bennie Schut (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895161#action_12895161 ] 

Bennie Schut commented on HIVE-1482:
------------------------------------

OK, I guess the first questions would be:

Do we want to go for as much concurrent work as possible, or are we OK with using some synchronization on the client calls? I would say sync is preferred over creating new connections in this case, right?

Do we perhaps want to solve this on the "ThriftHive" level, since that's the part which doesn't allow multithreaded calls, or perhaps more on the "HiveDatabaseMetaData" level, since that's the part which has the requirement of thread safety?

Any ideas are welcome on this.



[jira] Commented: (HIVE-1482) Not all jdbc calls are threadsafe.

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895344#action_12895344 ] 

John Sichi commented on HIVE-1482:
----------------------------------

Yes, synchronized is the way to go.

I think the synchronization has to be at the connection level.  For example, HiveStatement also needs to make calls on the thrift interface.  It's not just DatabaseMetaData.

So we should add a new data member to HiveConnection:

Object connectionMutex = new Object();

Then pass connectionMutex to constructors of sub-objects which need to participate in synchronization.  They can then do

synchronized(connectionMutex) {
...
}

around their critical sections.

Creating a separate object for this purpose allows us to keep control over synchronization (e.g. so it doesn't get mixed up with user-level or thrift-level synchronization code later).  We'll also need to be able to skip synchronization in the case of asynchronous cancel, but that's a separate task.

We should also review to see if there is any client-side state which needs protection.
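
To make the connectionMutex proposal above concrete, here is a minimal sketch. All class names (SketchConnection, SketchStatement, SketchMetaData, SketchClient) are hypothetical stand-ins for HiveConnection, HiveStatement, HiveDatabaseMetaData and the thrift client; the real classes have different constructors and many more methods.

```java
class ConnectionMutexSketch {
    public static void main(String[] args) {
        SketchConnection conn = new SketchConnection(cmd -> "result of: " + cmd);
        System.out.println(conn.createStatement().executeQuery("select 1"));
        System.out.println(conn.getMetaData().getTables());
    }
}

// Hypothetical stand-in for the shared thrift client; not thread-safe by itself.
interface SketchClient {
    String execute(String command);
}

// Hypothetical stand-in for HiveConnection: owns the client and the mutex.
class SketchConnection {
    private final SketchClient client;
    // Dedicated lock object, kept separate from user-level or thrift-level locks.
    private final Object connectionMutex = new Object();

    SketchConnection(SketchClient client) {
        this.client = client;
    }

    // Sub-objects receive the same mutex so they all serialize on one lock.
    SketchStatement createStatement() {
        return new SketchStatement(client, connectionMutex);
    }

    SketchMetaData getMetaData() {
        return new SketchMetaData(client, connectionMutex);
    }
}

class SketchStatement {
    private final SketchClient client;
    private final Object connectionMutex;

    SketchStatement(SketchClient client, Object connectionMutex) {
        this.client = client;
        this.connectionMutex = connectionMutex;
    }

    String executeQuery(String sql) {
        synchronized (connectionMutex) { // critical section around the client call
            return client.execute(sql);
        }
    }
}

class SketchMetaData {
    private final SketchClient client;
    private final Object connectionMutex;

    SketchMetaData(SketchClient client, Object connectionMutex) {
        this.client = client;
        this.connectionMutex = connectionMutex;
    }

    String getTables() {
        synchronized (connectionMutex) { // same lock as the statement uses
            return client.execute("show tables");
        }
    }
}
```

Because the statement and the metadata object share one lock, a metadata call made from another thread waits until a running query call on the same connection has returned.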




[jira] Assigned: (HIVE-1482) Not all jdbc calls are threadsafe.

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-1482:
--------------------------------

    Assignee: Bennie Schut
