Posted to user@hive.apache.org by David Morin <mo...@gmail.com> on 2019/09/09 15:29:33 UTC

Locks with ACID: need some clarifications

Hello,

We run HDP 2.6.5 with Hive 2.1.0 in production.
We use transactional tables and we are trying to ingest data in a streaming
way (even though we are still on Hive 2).
I've read some of the docs, but I would like some clarification on how locks
are used with transactional tables.
Do we have to take locks during an insert or a delete?
Let's consider this pipeline (a rough sketch in code follows below):
1. open a transaction for the delete and take the related lock (shared)
2. create the delta directory and file for this new transaction (here the
original transaction != the current one; the original transaction is the one
used for the last insert)
3. same as steps 1 and 2 for the insert (except that the original transaction
= the current one)
4. commit the transactions

Can we use a shared lock here, so that select queries can still run?
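
For reference, here is a minimal sketch of what I have in mind for the delete
part of that pipeline, written against the Hive 2.x metastore client. The
class and method names (HiveMetaStoreClient, LockRequestBuilder,
LockComponentBuilder) are my reading of the 2.x API, so please double-check
them against your version; the database, table, partition and user names are
placeholders, and the actual writing of the delete delta files is left out.

    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
    import org.apache.hadoop.hive.metastore.LockComponentBuilder;
    import org.apache.hadoop.hive.metastore.LockRequestBuilder;
    import org.apache.hadoop.hive.metastore.api.LockRequest;
    import org.apache.hadoop.hive.metastore.api.LockResponse;

    public class DeleteTxnSketch {
      public static void main(String[] args) throws Exception {
        HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());

        // Step 1: open a transaction for the delete.
        long txnId = client.openTxn("ingest-user"); // placeholder user

        // Request a lock on the target partition in the context of that
        // transaction. For a delete, Hive itself would use a semi-shared
        // lock, which still lets shared (read) locks through.
        LockRequest req = new LockRequestBuilder()
            .setTransactionId(txnId)
            .setUser("ingest-user")
            .addLockComponent(new LockComponentBuilder()
                .setDbName("mydb")                  // placeholder database
                .setTableName("events")             // placeholder table
                .setPartitionName("ds=2019-09-09")  // placeholder partition
                .setSemiShared()                    // delete/update lock type
                .build())
            .build();
        LockResponse resp = client.lock(req);
        System.out.println("lock state: " + resp.getState());

        // Step 2: write the delete delta directory/file for txnId here
        // (not shown). Long-running work needs client.heartbeat(txnId, lockId)
        // to keep the transaction and lock alive.

        // Step 4: commit.
        client.commitTxn(txnId);
        client.close();
      }
    }

(When the statements go through HiveServer2 as plain DELETE/INSERT, I assume
Hive acquires these locks by itself.)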

Thanks
David







Re: Locks with ACID: need some clarifications

Posted by David Morin <mo...@gmail.com>.
Ok, I got it
Thanks Alan


Re: Locks with ACID: need some clarifications

Posted by Alan Gates <al...@gmail.com>.
Not simultaneously.  In Hive 2 the first delete to start will obtain the
lock, and the second will have to wait.  In Hive 3, the first one to commit
will win and the second will fail (at commit time).
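
If you want to see the difference, a minimal way to reproduce it is to fire
the same delete from two JDBC sessions in parallel; the HiveServer2 URL and
the events table below are just placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class TwoDeletes {
      // Placeholder HiveServer2 URL and table; adjust to your cluster.
      static final String URL = "jdbc:hive2://localhost:10000/default";

      static void runDelete(String tag) {
        try (Connection c = DriverManager.getConnection(URL);
             Statement st = c.createStatement()) {
          long start = System.currentTimeMillis();
          st.execute("DELETE FROM events WHERE ds = '2019-09-09'");
          System.out.println(tag + " committed after "
              + (System.currentTimeMillis() - start) + " ms");
        } catch (Exception e) {
          // In Hive 3 the losing delete surfaces its failure here, at commit time.
          System.out.println(tag + " failed: " + e.getMessage());
        }
      }

      public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Thread d1 = new Thread(() -> runDelete("delete-1"));
        Thread d2 = new Thread(() -> runDelete("delete-2"));
        d1.start();
        d2.start();
        d1.join();
        d2.join();
        // Hive 2: the second delete blocks on the first one's lock, so the
        // timings show them running one after the other.
        // Hive 3: both proceed, but the second committer fails and must retry.
      }
    }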

Alan.


Re: Locks with ACID: need some clarifications

Posted by David Morin <mo...@gmail.com>.
Thanks Alan,

When you say "you just can't have two simultaneous deletes in the same
partition", simultaneous means for the same transaction ?
If a create 2 "transactions" for 2 deletes on the same table/partition it
works. Am I right ?



Re: Locks with ACID: need some clarifications

Posted by Alan Gates <al...@gmail.com>.
In Hive 2 update and delete take what are called semi-shared locks (meaning
they allow shared locks through, while not allowing other semi-shared
locks), and insert and select take shared locks.  So you can insert or
select while deleting; you just can't have two simultaneous deletes in the
same partition.

The reason insert can take a shared lock is that Hive does not enforce
uniqueness constraints, so there's no concept of overwriting an existing
row.  Multiple inserts can also proceed simultaneously.
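
To make that concrete, here is a minimal JDBC sketch. The HiveServer2 URL
and the events table are placeholders, and each statement below runs to
completion before the next starts, so to actually overlap the delete with
the insert and select you would drive the two sessions from separate
threads. With the DbTxnManager, SHOW LOCKS reports the semi-shared lock as
SHARED_WRITE and the shared locks as SHARED_READ.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class LockTypesDemo {
      public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Two independent HiveServer2 sessions (URL is a placeholder).
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection session1 = DriverManager.getConnection(url);
             Connection session2 = DriverManager.getConnection(url);
             Statement deleter = session1.createStatement();
             Statement other = session2.createStatement()) {

          // Delete: Hive takes a semi-shared (SHARED_WRITE) lock on the partition.
          deleter.execute("DELETE FROM events WHERE id = 42");

          // Insert and select: only shared (SHARED_READ) locks, so neither
          // would be blocked by the delete above.
          other.execute("INSERT INTO events VALUES (43, 'ok')");
          try (ResultSet rs = other.executeQuery("SELECT COUNT(*) FROM events")) {
            rs.next();
            System.out.println("rows: " + rs.getLong(1));
          }

          // SHOW LOCKS lists whatever the lock manager currently holds for
          // the table; run it from a second session while a statement is
          // still in flight to see the SHARED_READ / SHARED_WRITE entries.
          try (ResultSet locks = other.executeQuery("SHOW LOCKS events")) {
            while (locks.next()) {
              System.out.println(locks.getString(1) + "\t" + locks.getString(2));
            }
          }
        }
      }
    }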

This changes in Hive 3, where update and delete also take shared locks and
a first-committer-wins strategy is employed instead.

Alan.
