Posted to user@phoenix.apache.org by "Heather, James (ELS)" <ja...@elsevier.com> on 2016/07/28 15:18:57 UTC

Emulating a true INSERT or UPDATE

What would I need to do in order to emulate an INSERT or UPDATE in Phoenix, as opposed to an UPSERT?


Suppose I had a TRANSACTIONAL table. To do an INSERT, I then:

  1.  Start a transaction
  2.  SELECT the relevant rows, and throw an error if the SELECT is non-empty
  3.  UPSERT
  4.  Commit the transaction

To do an UPDATE, I do the same, except that in step 2 I throw an error if the SELECT is empty.
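
For concreteness, a rough JDBC sketch of the INSERT flow (MY_TABLE, its ID/VAL columns and the connection URL are placeholders, and the table is assumed to have been created with TRANSACTIONAL=true):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class InsertEmulationSketch {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
                conn.setAutoCommit(false); // keep the transaction open across statements

                // Steps 1-2: the first statement starts the transaction;
                // throw if the row already exists.
                try (PreparedStatement check = conn.prepareStatement(
                        "SELECT COUNT(*) FROM MY_TABLE WHERE ID = ?")) {
                    check.setLong(1, 42L);
                    ResultSet rs = check.executeQuery();
                    rs.next();
                    if (rs.getLong(1) > 0) {
                        throw new IllegalStateException("Row exists; refusing to INSERT");
                    }
                }

                // Step 3: UPSERT the new row.
                try (PreparedStatement upsert = conn.prepareStatement(
                        "UPSERT INTO MY_TABLE (ID, VAL) VALUES (?, ?)")) {
                    upsert.setLong(1, 42L);
                    upsert.setString(2, "some value");
                    upsert.executeUpdate();
                }

                // Step 4: commit; concurrent writes to the same row are detected here.
                conn.commit();
            }
        }
    }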


If all of the possible writes to those rows are enclosed in transactions, will this avoid the race conditions and give me a true INSERT and UPDATE (at a cost of having to make multiple queries, of course)?


The case I have in mind is where we might have DELETE and PATCH queries coming into our API. With a back end that supports UPDATE, it's not a problem if a DELETE and a PATCH come in at the same time: either the DELETE succeeds and then the PATCH fails, or the PATCH succeeds and then the DELETE succeeds. Either way, you end up with the row's being deleted. But if we use an UPSERT statement for the PATCH, we have a problem: the DELETE can succeed, and then the PATCH will still succeed, but it'll insert the row back in again.


I'm unclear as to how to use transactions to guarantee the right behaviour here.


James

________________________________

Elsevier Limited. Registered Office: The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, United Kingdom, Registration No. 1982084, Registered in England and Wales.

Re: Emulating a true INSERT or UPDATE

Posted by James Taylor <ja...@apache.org>.
James - you do have to be a bit careful with selects over transactional
tables. If you have a connection that's only doing selects over
transactional tables and you want each of them to run in a new
transaction, you can call conn.setAutoCommit(true), which will cause each
select to start a new transaction before running. I'd also recommend
setting conn.setReadOnly(true) on the connection in that case. This
will prevent any DML from running, and it will eventually be optimized to
reduce RPC traffic for transactional tables (see TEPHRA-159).
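
In code that's just the following (a sketch, assuming the usual java.sql
imports and a placeholder table name):

    // A connection used only for reads over transactional tables.
    Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
    conn.setAutoCommit(true); // each select runs in its own new transaction
    conn.setReadOnly(true);   // reject any DML issued on this connection

    try (Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM MY_TABLE")) {
        rs.next();
        System.out.println("rows: " + rs.getLong(1));
    }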

If you've got a mix of selects and updates, then you have to be aware that
the first statement against a transactional table will implicitly start a
transaction. The transaction will be ended when a commit or rollback is
done.
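
So a mixed flow with autoCommit off looks like this (sketch; MY_TABLE is a
placeholder):

    conn.setAutoCommit(false);
    try (Statement stmt = conn.createStatement()) {
        // The first statement against the transactional table implicitly
        // begins the transaction...
        ResultSet rs = stmt.executeQuery("SELECT VAL FROM MY_TABLE WHERE ID = 42");
        while (rs.next()) { /* inspect current state */ }
        stmt.executeUpdate("UPSERT INTO MY_TABLE (ID, VAL) VALUES (42, 'new value')");
    }
    conn.commit(); // ...and commit (or conn.rollback()) ends it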

Thanks,
James

Re: Emulating a true INSERT or UPDATE

Posted by Thomas D'Silva <td...@salesforce.com>.
Yes, you don't have to start a transaction explicitly. You also don't need
to commit if you have only done reads. Transactions time out based on the
data.tx.timeout setting (see
http://phoenix.apache.org/transactions.html).
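
For example, something along these lines in the hbase-site.xml read by the
transaction manager (the value is illustrative; see the page above for the
default and exact placement):

    <property>
      <name>data.tx.timeout</name>
      <!-- transaction timeout, in seconds (illustrative value) -->
      <value>60</value>
    </property>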

Re: Emulating a true INSERT or UPDATE

Posted by "Heather, James (ELS)" <ja...@elsevier.com>.
Thanks.

What do you mean when you say the transaction starts automatically at (2)? Do you mean that you don't need to start a transaction explicitly?

If that's right, does that mean that you need to commit even after you've only done read operations, so that Phoenix knows to close the transaction it's created for you?

James

Re: Emulating a true INSERT or UPDATE

Posted by James Taylor <ja...@apache.org>.
James,
Your logic looks correct, assuming that you have the complete row keys in
your SELECT statement. FYI, the transaction will start automatically at
(2). You can optimize this slightly by doing a COUNT(*) instead of
returning the rows back to the client. For the UPDATE case, you'd throw if
the count doesn't match the number of rows you expect. You also get the
added benefit that another client attempting to INSERT or UPDATE the same
rows at the same time would fail (that's the conflict detection piece that
Thomas mentioned).
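
For the UPDATE case, that might look roughly like this (sketch;
MY_TABLE/ID/VAL are placeholders and expectedRows is however many rows your
WHERE clause should match):

    int expectedRows = 1;
    conn.setAutoCommit(false);

    // Fail if the rows to update aren't all there.
    try (PreparedStatement check = conn.prepareStatement(
            "SELECT COUNT(*) FROM MY_TABLE WHERE ID = ?")) {
        check.setLong(1, 42L);
        ResultSet rs = check.executeQuery();
        rs.next();
        if (rs.getLong(1) != expectedRows) {
            conn.rollback();
            throw new IllegalStateException("Missing row(s); refusing to UPDATE");
        }
    }

    // Write the new values and commit; a concurrent writer to the same
    // rows will cause one of the transactions to fail at commit.
    try (PreparedStatement upsert = conn.prepareStatement(
            "UPSERT INTO MY_TABLE (ID, VAL) VALUES (?, ?)")) {
        upsert.setLong(1, 42L);
        upsert.setString(2, "new value");
        upsert.executeUpdate();
    }
    conn.commit();
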
Thanks,
James

Re: Emulating a true INSERT or UPDATE

Posted by Thomas D'Silva <td...@salesforce.com>.
If the table is transactional, you are guaranteed that when overlapping
transactions try to commit the same row, one will succeed and the others
will fail with an exception. There is, however, an additional cost to doing
this conflict detection at commit time.
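
The losing transaction sees that failure when it commits, so the commit is
the thing to wrap (sketch; the retry/abort policy is up to you):

    try {
        conn.commit();   // conflict detection happens here
    } catch (SQLException e) {
        conn.rollback(); // defensive cleanup; this transaction lost the conflict
        // Either retry the whole select-check-upsert sequence or surface the error.
        throw e;
    }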

