Posted to dev@hive.apache.org by Cliff Resnick <cr...@proclivitysystems.com> on 2009/08/28 17:52:43 UTC

quick fix for thread-safe connection pool

We're stepping up our Hive integration, and after confronting the 
HiveServer thread-safety issue, I implemented a simple co-located Hive 
connection pool. Since we use our own Java-only network communication 
code, the service does not use Thrift; instead it's just a lightweight 
network service that manages a pool of HiveConnections.

Of course, after implementing this I found that it was still not 
thread-safe. I'm not very familiar with the Hive code, but I did find a 
quick fix to be surprisingly easy. In 
org.apache.hadoop.hive.ql.exec.Utilities there is a static field 
holding a mapredWork instance. I changed it to a ThreadLocal and 
suddenly I have a thread-safe connection pool.
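
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of change described above. It is not the attached patch and does not use Hive's classes: PlanHolder, its methods, and the String "plan" are hypothetical stand-ins for a utility class that keeps the current plan in a static field.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a utility class that caches the current plan.
// None of these names come from the Hive source.
final class PlanHolder {
    // Before: one slot shared by every thread.
    // private static String currentPlan;

    // After: each thread reads and writes its own slot.
    private static final ThreadLocal<String> CURRENT_PLAN = new ThreadLocal<>();

    static void setPlan(String plan) { CURRENT_PLAN.set(plan); }
    static String getPlan() { return CURRENT_PLAN.get(); }
    static void clear() { CURRENT_PLAN.remove(); }
}

class ThreadLocalPlanSketch {
    // Two threads each store a plan, wait until both have stored,
    // then read back what they see.
    static String[] runTwoQueries() {
        String[] seen = new String[2];
        CountDownLatch bothSet = new CountDownLatch(2);
        Thread[] workers = new Thread[2];
        for (int i = 0; i < 2; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                PlanHolder.setPlan("plan-" + id);
                bothSet.countDown();
                try {
                    bothSet.await();
                    seen[id] = PlanHolder.getPlan();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    // Pooled threads are reused, so drop the value when done.
                    PlanHolder.clear();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        String[] seen = runTwoQueries();
        System.out.println(seen[0] + " " + seen[1]); // prints "plan-0 plan-1"
    }
}
```

With the commented-out plain static field, both threads would read whichever plan was stored last; with the ThreadLocal, each thread reads back its own.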

Now, I do understand that this is just a quick fix. The call stack from 
HiveStatement is synchronous, but the ThreadLocal solution would need to 
be revisited if any async processing is introduced. Also, I can't help 
wondering why a Utilities class is holding state, let alone the state 
of the entire execution plan. Finally, I doubt such a simple solution 
solves HIVE-80, so I imagine other threading issues are in play when 
Thrift is involved. In the meantime, however, I'm hoping the fix can be 
made; patch attached.
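
To illustrate the async caveat above, here is a small sketch under the same assumptions as before: CURRENT_PLAN and both methods are hypothetical and unrelated to Hive's code. A value stored in a ThreadLocal on the calling thread is not visible to a task that runs on an executor's worker thread, so the state would have to be passed explicitly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncCaveatSketch {
    // Hypothetical per-thread slot, standing in for the plan state.
    private static final ThreadLocal<String> CURRENT_PLAN = new ThreadLocal<>();

    // Runs a task on a fresh single-thread executor and returns its result.
    private static String runOnWorker(Callable<String> task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    // The worker thread has its own (empty) slot, so this returns null.
    static String readPlanOnWorker() {
        CURRENT_PLAN.set("plan-from-caller");
        try {
            return runOnWorker(CURRENT_PLAN::get);
        } finally {
            CURRENT_PLAN.remove();
        }
    }

    // Capturing the value on the calling thread and handing it to the task works.
    static String readPlanPassedExplicitly() {
        CURRENT_PLAN.set("plan-from-caller");
        try {
            final String plan = CURRENT_PLAN.get();
            return runOnWorker(() -> plan);
        } finally {
            CURRENT_PLAN.remove();
        }
    }

    public static void main(String[] args) {
        System.out.println(readPlanOnWorker());         // prints "null"
        System.out.println(readPlanPassedExplicitly()); // prints "plan-from-caller"
    }
}
```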

Cliff

RE: quick fix for thread-safe connection pool

Posted by Ashish Thusoo <at...@facebook.com>.
Yes, just post it to that issue for now. We may move it to a new JIRA if needed.

Ashish 

-----Original Message-----
From: Cliff Resnick [mailto:cresnick@proclivitysystems.com] 
Sent: Friday, August 28, 2009 3:15 PM
To: hive-dev@hadoop.apache.org
Subject: Re: quick fix for thread-safe connection pool

Ashish,

Which issue should I post it to? HIVE-80?

-Cliff

On 08/28/2009 06:07 PM, Ashish Thusoo wrote:
> Hi Cliff,
>
> Can you post the patch to the JIRA?
>
> Ashish

Re: quick fix for thread-safe connection pool

Posted by Cliff Resnick <cr...@proclivitysystems.com>.
Ashish,

Which issue should I post it to? HIVE-80?

-Cliff

On 08/28/2009 06:07 PM, Ashish Thusoo wrote:
> Hi Cliff,
>
> Can you post the patch to the JIRA?
>
> Ashish

RE: quick fix for thread-safe connection pool

Posted by Ashish Thusoo <at...@facebook.com>.
Hi Cliff,

Can you post the patch to the JIRA?

Ashish 
