Posted to dev@hive.apache.org by Cliff Resnick <cr...@proclivitysystems.com> on 2009/08/28 17:52:43 UTC
quick fix for thread-safe connection pool
We're stepping up our Hive integration, and after confronting the
HiveServer thread safety issue, I implemented a simple co-located Hive
connection pool. Since we use our own Java-only network communication
code, the service does not use Thrift; instead it's just a lightweight
network service that manages a pool of HiveConnections.
Of course, after implementing this I found that it was still not
thread-safe. I'm not very familiar with the Hive code, but I did find a
quick fix to be surprisingly easy. In
org.apache.hadoop.hive.ql.exec.Utilities there is a static field
holding a mapredWork instance. I changed it to a ThreadLocal and
suddenly I have a thread-safe connection pool.
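For readers without the patch at hand, the idea can be sketched roughly as follows. This is only an illustration, not the attached patch: Plan, PlanHolder and PlanHolderDemo are hypothetical stand-ins, and the real Utilities class and mapredWork type are not reproduced here. The sketch assumes each query runs start to finish on a single pool thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the plan object; not Hive's mapredWork class.
final class Plan {
    final String query;
    Plan(String query) { this.query = query; }
}

// Hypothetical holder: one slot per thread instead of one shared static.
final class PlanHolder {
    // Before (shared by all threads): private static Plan current;
    private static final ThreadLocal<Plan> CURRENT = new ThreadLocal<>();

    static void set(Plan p) { CURRENT.set(p); }
    static Plan get() { return CURRENT.get(); }
    // Pooled threads are reused, so clear the slot when a query is done.
    static void clear() { CURRENT.remove(); }
}

final class PlanHolderDemo {
    // Each task stores its own plan, pauses, then reads the plan back.
    static List<String> runAll(List<String> queries) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String q : queries) {
                futures.add(pool.submit(() -> {
                    PlanHolder.set(new Plan(q));
                    try {
                        Thread.sleep(10); // give other tasks a chance to interleave
                        return PlanHolder.get().query;
                    } finally {
                        PlanHolder.clear();
                    }
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList("q1", "q2", "q3", "q4");
        // Every task should see the plan it stored, not a neighbour's.
        System.out.println(runAll(queries).equals(queries));
    }
}
```

With a plain static field, concurrent tasks could overwrite each other's plan between set and get; the per-thread slot removes that particular race, though nothing here speaks to other shared state.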
Now, I do understand that this is just a quick fix. The call stack from
HiveStatement is synchronous, but the ThreadLocal solution would need to
be revisited if any async processing is introduced. Also, I can't help
wondering: why is a Utilities class holding state, let alone the state
of the entire execution plan? Finally, I doubt such a simple solution
solves HIVE-80, so I imagine other threading issues are in play when
Thrift is involved. In the meantime, however, I'm hoping the fix can be
made; patch attached.
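To make the async caveat concrete, here is a small sketch of why a ThreadLocal stops helping once work hops to another thread. AsyncCaveatDemo and its methods are hypothetical names for illustration only and have nothing to do with Hive's actual classes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AsyncCaveatDemo {
    private static final ThreadLocal<String> CURRENT_PLAN = new ThreadLocal<>();

    // Caller sets the ThreadLocal, then a different thread tries to read it.
    static String seenByWorker(String plan) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CURRENT_PLAN.set(plan);
        try {
            Callable<String> task = CURRENT_PLAN::get; // runs on the worker thread
            return worker.submit(task).get();          // null: the worker has its own empty slot
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            CURRENT_PLAN.remove();
            worker.shutdown();
        }
    }

    // One possible alternative: hand the plan to the task explicitly.
    static String passedExplicitly(String plan) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = () -> plan; // the value travels with the task
            return worker.submit(task).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker sees: " + seenByWorker("plan-1"));
        System.out.println("explicit hand-off: " + passedExplicitly("plan-1"));
    }
}
```

Passing the plan as an argument (or as a field on the task) is one way to sidestep the problem, and it would also take the state out of the utility class.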
Cliff
RE: quick fix for thread-safe connection pool
Posted by Ashish Thusoo <at...@facebook.com>.
Yes, just post it to that issue for now. We may move it to a new JIRA if needed.
Ashish
Re: quick fix for thread-safe connection pool
Posted by Cliff Resnick <cr...@proclivitysystems.com>.
Ashish,
Which issue should I post it to? HIVE-80?
-Cliff
RE: quick fix for thread-safe connection pool
Posted by Ashish Thusoo <at...@facebook.com>.
Hi Cliff,
Can you post the patch to the JIRA?
Ashish