Posted to user@hive.apache.org by Darren Yin <da...@gmail.com> on 2014/07/23 22:27:13 UTC

MoveTasks releasing locks that don't belong to it?

In releaseLocks in MoveTask
<https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java>,
it looks like the lockMgr.getLocks line actually grabs all locks associated
with whatever lock objects (e.g., a partition) the MoveTask is concerned
with.

My theory for what can happen: a MoveTask starts writing to a partition,
taking some locks of its own. While it is running, other Hive jobs put locks
on the same partition or table. Then, when the MoveTask ends, it finds all
locks associated with that partition and releases them all, and the other
Hive jobs that originally put those locks there run into NoNode
KeeperExceptions (using ZooKeeper). I'm actually on a modified build of
Hive 0.11, but from reading the code, it appears the issue still exists in
trunk as well.
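To make the suspected race concrete, here is a toy, in-memory model of the pattern described above. This is not Hive's actual HiveLockManager or ZooKeeperHiveLockManager API; the class and method names are made up for illustration. It only models the key point: release iterates over every lock on the object (as the getLocks call appears to), not just the locks this task acquired.

```java
import java.util.*;

// Toy stand-in for a ZooKeeper-backed lock manager (hypothetical names).
class ToyLockManager {
    private final Map<String, Set<String>> locksByObject = new HashMap<>();

    // Acquire a lock on `object` for `owner`; returns a lock id.
    String lock(String object, String owner) {
        String id = owner + "-" + UUID.randomUUID();
        locksByObject.computeIfAbsent(object, k -> new HashSet<>()).add(id);
        return id;
    }

    // Analogous to lockMgr.getLocks(...): every lock on the object,
    // regardless of which task created it.
    List<String> getLocks(String object) {
        return new ArrayList<>(locksByObject.getOrDefault(object, Set.of()));
    }

    // Analogous to deleting the lock znode; throws if it is already gone,
    // like ZooKeeper's NoNodeException.
    void unlock(String object, String lockId) {
        Set<String> locks = locksByObject.get(object);
        if (locks == null || !locks.remove(lockId)) {
            throw new IllegalStateException("NoNode: " + lockId);
        }
    }
}

public class OverReleaseDemo {
    public static void main(String[] args) {
        ToyLockManager mgr = new ToyLockManager();
        String part = "db.tbl/ds=2014-07-23";

        String moveTaskLock = mgr.lock(part, "moveTask"); // MoveTask's own lock
        String otherJobLock = mgr.lock(part, "otherJob"); // concurrent job's lock

        // The suspected pattern: the MoveTask releases *every* lock on the
        // partition, including the one owned by the other job.
        for (String id : mgr.getLocks(part)) {
            mgr.unlock(part, id);
        }

        // The other job later releases its own lock and finds it gone.
        try {
            mgr.unlock(part, otherJobLock);
        } catch (IllegalStateException e) {
            System.out.println("other job hit: " + e.getMessage());
        }
    }
}
```

In the real system the analogous failure would be the second job's ZooKeeper delete hitting a node the MoveTask already removed.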

Has anyone else run into this sort of issue?

Thanks!
--Darren

Re: MoveTasks releasing locks that don't belong to it?

Posted by Edward Capriolo <ed...@gmail.com>.
For what it is worth, I run with locks off. I played with them in versions
0.8.x through 0.10 and found them problematic, particularly from HiveServer.
We ended up doing our own locking on the application side. I am very
surprised that some vendors' "distributions" suggest that this is
better/safer "on" when I have found the opposite to be true.

Don't get me wrong: the feature works, and we used it for a while, but
periodically things would pop up around it that would log-jam or break some
ETL process, and the reward was not worth the drawbacks.
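Edward doesn't describe how their application-side locking works, but the ownership discipline that avoids the over-release problem can be sketched in a few lines. This is a purely illustrative, single-JVM example (hypothetical class names, not a distributed replacement for Hive's lock manager): each caller gets back a handle for the lock it acquired and can only release through that handle, so one task can never drop another task's lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of per-object advisory locking done application-side.
// Contrast with the MoveTask pattern: release takes the caller's own handle,
// so there is no way to enumerate and drop locks belonging to other callers.
class AppSideLocks {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Block until the named object is free, then return the held handle.
    ReentrantLock acquire(String object) {
        ReentrantLock l = locks.computeIfAbsent(object, k -> new ReentrantLock());
        l.lock();
        return l; // caller keeps the handle and releases only this one
    }

    // Release only what the caller itself holds; a non-owner gets
    // IllegalMonitorStateException instead of silently stealing the lock.
    void release(ReentrantLock held) {
        held.unlock();
    }
}
```

A distributed version would need the same discipline on top of something like ZooKeeper ephemeral nodes, where each session deletes only the znode it created.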

