Posted to user@hbase.apache.org by Mridul Muralidharan <mr...@yahoo-inc.com> on 2010/01/03 19:46:25 UTC

Secondary indexes and transactions

Hi,


   I was wondering about the atomicity guarantees when using secondary 
indexes from within a transaction.

From what I could gather, updates to the index table go through their 
own (set of) RPCs before the underlying transactional table is updated - 
and these updates happen outside of the locks for the transaction table.
Also, the index regions need not be colocated with the table region.

So essentially wondering
a) if the index can go out of sync with the transactional table ?
b) if there are errors with update to table, are the indexes rolled back ?
c) Whether there can be issues if there are parallel updates invoked for 
the same row - whether index changes end up being inconsistent with 
table data (due to lock not being held while updating index).


I guess they are all kind of related queries.


I was not able to get a clear picture from the archives, so 
RTFM/pointers would be helpful if this is already answered.

Thanks,
Mridul

Re: Secondary indexes and transactions

Posted by "Murali Krishna. P" <mu...@yahoo.com>.
As far as I understand, the transaction guarantee holds only for operations done via the API using a TransactionState (http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/client/transactional/TransactionalTable.html). So if IndexRegion also used a TransactionalTable for the secondary index table instead of a plain HTable, and used the same TransactionState that the client provides for the updateIndex methods, could we guarantee index consistency?

Could this be done by overriding the transactional APIs in IndexTable and modifying IndexRegion?
Is there any reason for not supporting the transaction guarantee with secondary indexes? Isn't it dangerous without this? I am asking so as to decide whether we should modify this to support it, or implement a separate TransactionalIndexTable.

 Thanks,
Murali Krishna




________________________________
From: Andrew Purtell <ap...@apache.org>
To: hbase-user@hadoop.apache.org
Sent: Mon, 4 January, 2010 6:43:15 AM
Subject: Re: Secondary indexes and transactions

> > Yes.  But IIUC, the client is running a transaction that spans the update to
> > the two tables.  It'll take care of the undo should say the update to the
> > transacation table fails.
> > 
> Isn't the update to the secondary index implicitly done ? As in, does 
> the client 'see' this update ?
> My impression was that the secondary index update was done by the 
> indexedregion - and was not visible to the client : which manages occ 
> transaction ...


Yes, you are correct. I think Stack just was not precise enough in his language.

  - Andy

Re: Secondary indexes and transactions

Posted by Andrew Purtell <ap...@apache.org>.
> > Yes.  But IIUC, the client is running a transaction that spans the update to
> > the two tables.  It'll take care of the undo should say the update to the
> > transacation table fails.
> > 
> Isn't the update to the secondary index implicitly done ? As in, does 
> the client 'see' this update ?
> My impression was that the secondary index update was done by the 
> indexedregion - and was not visible to the client : which manages occ 
> transaction ...


Yes, you are correct. I think Stack just was not precise enough in his language.

  - Andy


      


Re: Secondary indexes and transactions

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Thanks a lot for clarifying !
That was very helpful.


Regards,
Mridul

Clint Morgan wrote:
> Sorry I have been so slow in understanding. I now see what you mean. I was
> trying to explain how I thought it *should* work, rather than what it
> actually does now.
> 
> That method of aborting on an exception in the 2nd phase is incorrect for
> the reason you mention: For a 3 region transaction, we could have committed
> the first region, error-ed on the 2nd region, and then aborted the 3rd
> region. So even dis-regarding indexes, we would lose our atomic property in
> the base table.
> 
> Rather we can let just the 2nd region fail with the assumption that it has
> all the information that it needs to get that transaction committed when it
> recovers from the WAL. So when the 2nd region is finally recovered and ready
> to serve again it will have the transaction committed.
> 
> For your second point about not aborting in the case of failure in the
> regionserver, you also raise a valid point. A failure of the filesystem will
> cause an abort, and then initiate the WAL recovery properly. However other
> exceptions could sneak through (maybe an OOME failure on the Indexed put
> rpc), and cause an inconsistent index and/or some of the trx puts not being
> applied.
> 
> Rather we should probably be more explicit about handling IOE's in the
> transactional layer. The trx region server needs to guarantee that when it
> is told to commit a transaction, the writes will eventually occur. It may be
> as simple as handling an exception in the commit methods by aborting the
> region server, but this seems to fragile.
> 
> I've been delaying worrying to much in the details of transactional failure
> recovery until we have append and a working write-ahead-log in core hbase.
> But its probably about time to revisit...
> 
> Thank you very much for digging in here, a second set of eyes is handy.
> -clint
> 
> On Tue, Jan 19, 2010 at 1:37 AM, Mridul Muralidharan
> <mr...@yahoo-inc.com>wrote:
> 
>> Clint Morgan wrote:
>>
>>> After the 2PC process has determined that a commit should happen there is
>>> no
>>> roll-back. The commit must be processed.
>>>
>>
>> From org.apache.hadoop.hbase.client.transactional.TransactionManager
>>
>> doCommit() which is the 2nd phase of 2-phase commit, on throwing Exception
>> results in abort() which does the rollback.
>> And this abort specifically ignores the region which hit the error -
>> thereby making the index go out of sync.
>>
>>
>> I hope I am not missing something with this assertion, since I had
>> mentioned this earlier too (possibly got buried in my details ?).
>>
>>
>> Since abort is resulting in an rpc call, which results in some log
>> manipulation, I left it at that and did not dig deeper - do you mean it
>> actually does nothing ?
>>
>>
>>
>>
>>> So in your example, a commit has been approved, and one the of the regions
>>> is told to go ahead and commit. The region triggers the index Put, but
>>> then
>>> fails on his Puts (like out of disk space, out of memory, etc). This
>>> should
>>> shutdown the RegionServer. Then when the region's WAL is recovered from,
>>> the
>>> trx puts from the partially-committed transaction will be there. We will
>>> look in the global transaction log to see that the trx is to be committed,
>>> and then apply the puts to the base table.
>>>
>>
>> I relooked at the implementation just to make sure I got the basic issue
>> right.
>> I did not see this behavior you mention above - of IOException resulting in
>> shutting down of a region server - and quite a lot of methods actually could
>> result in IOException's getting thrown when traversing the call-graph from
>> indexedregion.Put's invocation (filesystem going missing is just one case
>> where this happens I think - but I did not see this as being the only case :
>> atleast impl/doc wise).
>>
>>
>>
>>
>> Anyway, to make progress, if commit failure in a indexed regionserver does
>> a rollback of the txn, then the issue I mentioned can occur ?
>>
>>
>> Thanks for your patience and time !
>>
>> Regards,
>> Mridul
>>
>>
>>
>>> -clint
>>>
>>> On Fri, Jan 15, 2010 at 2:43 AM, Mridul Muralidharan
>>> <mr...@yahoo-inc.com>wrote:
>>>
>>>  I think I might not have explained it well enough.
>>>> As part of executing a Put, the index update happens prior to updating
>>>> the
>>>> underlying transactional table currently - and is done outside of the
>>>> lock's.
>>>> If the underlying transactional table update results in an exception -
>>>> what
>>>> is the state of the index ? From what I understand, a rollback is
>>>> initiated
>>>> - and this results in rolling back all regions - except for the one which
>>>> threw the exception : and so the secondary index update which happened
>>>> implicitly is never reverted.
>>>> Or am I missing something here ?
>>>>
>>>> To be clear, I am talking about the actual commit as part of the two
>>>> phase
>>>> commit throwing an exception : not a conflict exception, but an
>>>> IOException
>>>> or variant - which can result in the secondary index going out of sync.
>>>> I am contrasting it with the case of explicit indexes maintained by
>>>> client
>>>> - where the rollback by client (when the commit fails for a region)
>>>> results
>>>> in rollback on all the regions in the transaction - which includes the
>>>> seconday indexes 'visible' to the client.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Mridul
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>  If the regionserver crashes during this commit process, then I *think*
>>>>> it
>>>>> should still recover correctly. It will see the transactional operations
>>>>> in
>>>>> the WAL, and the propagate the puts into the index. However this WAL
>>>>> recovery stuff has been changing, and I'm not confident that it
>>>>> currently
>>>>> works in all failure cases.
>>>>>
>>>>> Does this normal case address your concerns?
>>>>>
>>>>> -clint
>>>>>
>>>>> On Sun, Jan 3, 2010 at 4:46 PM, Mridul Muralidharan
>>>>> <mr...@yahoo-inc.com>wrote:
>>>>>
>>>>>  stack wrote:
>>>>>
>>>>>>  On Sun, Jan 3, 2010 at 10:46 AM, Mridul Muralidharan
>>>>>>
>>>>>>> <mr...@yahoo-inc.com>wrote:
>>>>>>>
>>>>>>>  I was wondering about the atomicity guarantees when using secondary
>>>>>>>
>>>>>>>  indexes from within a transaction.
>>>>>>>>
>>>>>>>>  You are talking about indexed hbase from transactional hbase
>>>>>>>> contrib?
>>>>>>>>
>>>>>>>>  Yes, exactly.
>>>>>>
>>>>>>
>>>>>>  From what I could gather, updates to the index table goes through its
>>>>>>
>>>>>>> own
>>>>>>>
>>>>>>>  (set of) rpc before the underlying transactional table is updated -
>>>>>>>> and
>>>>>>>> these update happens outside of the locks for the transaction table.
>>>>>>>>
>>>>>>>>
>>>>>>>>  Yes.  But IIUC, the client is running a transaction that spans the
>>>>>>>>
>>>>>>> update
>>>>>>> to
>>>>>>> the two tables.  It'll take care of the undo should say the update to
>>>>>>> the
>>>>>>> transacation table fails.
>>>>>>>
>>>>>>>
>>>>>>>  Isn't the update to the secondary index implicitly done ? As in, does
>>>>>>>
>>>>>> the
>>>>>> client 'see' this update ?
>>>>>> My impression was that the secondary index update was done by the
>>>>>> indexedregion - and was not visible to the client : which manages occ
>>>>>> transaction ...
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>  Also, the index regions need not colocate with the table region.
>>>>>>
>>>>>>> So essentially wondering
>>>>>>>> a) if the index can go out of sync with the transactional table ?
>>>>>>>>
>>>>>>>>
>>>>>>>>  It should not.  The client should run the undos if the insert does
>>>>>>>> not
>>>>>>>>
>>>>>>> go
>>>>>>> into both tables successfully.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  b) if there are errors with update to table, are the indexes rolled
>>>>>>> back
>>>>>>>
>>>>>>>  ?
>>>>>>>>
>>>>>>>>  Yes.
>>>>>>>>
>>>>>>>
>>>>>>>  c) Whether there can be issues if there are parallel updates invoked
>>>>>>> for
>>>>>>>
>>>>>>>  the same row - whether index changes end up being inconsistent with
>>>>>>>> table
>>>>>>>> data (due to lock not being held while updating index).
>>>>>>>>
>>>>>>>>
>>>>>>>>  This might be possible.  There is a lock held on a row.  I'm not
>>>>>>>> sure
>>>>>>>>
>>>>>>> if
>>>>>>> the
>>>>>>> lock is held on transaction table row while the update is being done
>>>>>>> to
>>>>>>> the
>>>>>>> index table.
>>>>>>>
>>>>>>> This is the doc. as it stands on transactional hbase:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/transactional/package-summary.html#package_description
>>>>>>>
>>>>>>> Here is the doc. on indexed-transactional hbase:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/tableindexed/package-summary.html#package_description
>>>>>>>
>>>>>>> You've probably tripped over it already but just in case, it might
>>>>>>> help.
>>>>>>>
>>>>>>>
>>>>>>>  I did go through the package sumamries, thanks : which is what
>>>>>> increased
>>>>>> my
>>>>>> confusion.
>>>>>>
>>>>>> My current understanding is :
>>>>>>
>>>>>> a) Client 'simulates' the transaction - by inspecting the state of the
>>>>>> rows
>>>>>> on commit and rolls back in case of conflicting updates.
>>>>>>
>>>>>> b) secondary index updates are transparent to client api and are
>>>>>> directly
>>>>>> done by the indexedregion as part of its implementation.
>>>>>>
>>>>>>
>>>>>> If this is correct, I am wondering if overlapping rollbacks can result
>>>>>> in
>>>>>> secondary index going out of sync with the table since (a) does not see
>>>>>> those (one update gets rolled back while another goes through - or
>>>>>> variations of it).
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Mridul
>>>>>>
>>>>>>
>>>>>>
>>>>>>  St.Ack
>>>>>>
>>>>>>
>>>>>>>  I guess they are all kind of related queries.
>>>>>>>
>>>>>>>> I was not able to get a clear picture from the archives, so
>>>>>>>> RTFM/pointers
>>>>>>>> would be helpful if this is already answered.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Mridul
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>


Re: Secondary indexes and transactions

Posted by Clint Morgan <cl...@troove.net>.
Sorry I have been so slow in understanding. I now see what you mean. I was
trying to explain how I thought it *should* work, rather than what it
actually does now.

That method of aborting on an exception in the 2nd phase is incorrect for
the reason you mention: for a 3-region transaction, we could have committed
the first region, errored on the 2nd region, and then aborted the 3rd
region. So even disregarding indexes, we would lose our atomic property in
the base table.
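
The 3-region case can be sketched as a small simulation. This is an illustration only, not the TransactionManager code: the class, enum, and method names are hypothetical, and the loop simply encodes the sequence described above (commit in order, and on the first error abort the regions that have not committed yet).

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of the second phase of 2PC. All names are hypothetical.
final class SecondPhaseAbortSketch {
    enum State { PREPARED, COMMITTED, ABORTED, FAILED }

    // All regions have already voted to commit. failingRegion is the index
    // of the region whose commit throws; the regions after it are aborted.
    static List<State> commitWithAbortOnError(int regionCount, int failingRegion) {
        List<State> states = new ArrayList<>();
        for (int i = 0; i < regionCount; i++) {
            states.add(State.PREPARED);
        }
        for (int i = 0; i < regionCount; i++) {
            if (i == failingRegion) {
                states.set(i, State.FAILED);
                for (int j = i + 1; j < regionCount; j++) {
                    states.set(j, State.ABORTED);
                }
                break;
            }
            states.set(i, State.COMMITTED);
        }
        return states;
    }
}
```

Calling commitWithAbortOnError(3, 1) leaves one region committed, one failed, and one aborted, which is the loss of atomicity in question.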

Rather we can let just the 2nd region fail with the assumption that it has
all the information that it needs to get that transaction committed when it
recovers from the WAL. So when the 2nd region is finally recovered and ready
to serve again it will have the transaction committed.

For your second point about not aborting in the case of failure in the
regionserver, you also raise a valid point. A failure of the filesystem will
cause an abort, and then initiate the WAL recovery properly. However other
exceptions could sneak through (maybe an OOME failure on the Indexed put
rpc), and cause an inconsistent index and/or some of the trx puts not being
applied.

Rather we should probably be more explicit about handling IOEs in the
transactional layer. The trx region server needs to guarantee that when it
is told to commit a transaction, the writes will eventually occur. It may be
as simple as handling an exception in the commit methods by aborting the
region server, but this seems too fragile.

I've been delaying worrying too much about the details of transactional failure
recovery until we have append and a working write-ahead log in core hbase.
But it's probably about time to revisit...

Thank you very much for digging in here, a second set of eyes is handy.
-clint


Re: Secondary indexes and transactions

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Clint Morgan wrote:
> After the 2PC process has determined that a commit should happen there is no
> roll-back. The commit must be processed.


From org.apache.hadoop.hbase.client.transactional.TransactionManager:

doCommit(), which is the 2nd phase of the 2-phase commit, calls abort() 
when an Exception is thrown, and abort() does the rollback.
This abort specifically ignores the region which hit the error - 
thereby making the index go out of sync.


I hope I am not missing something with this assertion, since I had 
mentioned this earlier too (possibly got buried in my details ?).


Since abort results in an RPC call, which in turn results in some log 
manipulation, I left it at that and did not dig deeper - do you mean it 
actually does nothing ?


> 
> So in your example, a commit has been approved, and one the of the regions
> is told to go ahead and commit. The region triggers the index Put, but then
> fails on his Puts (like out of disk space, out of memory, etc). This should
> shutdown the RegionServer. Then when the region's WAL is recovered from, the
> trx puts from the partially-committed transaction will be there. We will
> look in the global transaction log to see that the trx is to be committed,
> and then apply the puts to the base table.


I looked at the implementation again just to make sure I got the basic issue 
right.
I did not see the behavior you mention above, of an IOException resulting 
in the region server shutting down. Quite a lot of methods could 
actually throw IOExceptions when traversing 
the call graph from indexedregion.Put's invocation (the filesystem going 
missing is just one case where this happens, I think, but I did not see 
it as being the only case, at least impl/doc wise).




Anyway, to make progress: if a commit failure in an indexed regionserver 
does a rollback of the txn, then the issue I mentioned can occur ?


Thanks for your patience and time !

Regards,
Mridul



Re: Secondary indexes and transactions

Posted by Clint Morgan <cl...@troove.net>.
After the 2PC process has determined that a commit should happen there is no
roll-back. The commit must be processed.

So in your example, a commit has been approved, and one of the regions
is told to go ahead and commit. The region triggers the index Put, but then
fails on its own Puts (out of disk space, out of memory, etc.). This should
shut down the RegionServer. Then when the region's WAL is recovered, the
trx puts from the partially committed transaction will be there. We will
look in the global transaction log to see that the trx is to be committed,
and then apply the puts to the base table.
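
Roughly, the intended recovery path looks like the sketch below. It is a simplified model under stated assumptions, not the actual recovery code: LoggedPut, recover, and the set of committed transaction ids (standing in for the global transaction log) are hypothetical names, and the WAL is modeled as a plain list.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of WAL replay for transactional puts. All names are hypothetical.
final class WalRecoverySketch {
    // One logged put belonging to a transaction (assumed record shape).
    static final class LoggedPut {
        final long txId;
        final String row;
        final String value;

        LoggedPut(long txId, String row, String value) {
            this.txId = txId;
            this.row = row;
            this.value = value;
        }
    }

    // Replays the WAL in order. A put is applied only if the global
    // transaction log (modeled as committedTxIds) says its transaction
    // was approved for commit; all other puts are skipped.
    static Map<String, String> recover(List<LoggedPut> wal, Set<Long> committedTxIds) {
        Map<String, String> table = new HashMap<>();
        for (LoggedPut p : wal) {
            if (committedTxIds.contains(p.txId)) {
                table.put(p.row, p.value);
            }
        }
        return table;
    }
}
```

The point of the design is that the commit decision lives outside the failed region, so a region that died mid-commit can still finish the transaction when it comes back.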

-clint

On Fri, Jan 15, 2010 at 2:43 AM, Mridul Muralidharan
<mr...@yahoo-inc.com>wrote:

> I think I might not have explained it well enough.
> As part of executing a Put, the index update happens prior to updating the
> underlying transactional table currently - and is done outside of the
> lock's.
> If the underlying transactional table update results in an exception - what
> is the state of the index ? From what I understand, a rollback is initiated
> - and this results in rolling back all regions - except for the one which
> threw the exception : and so the secondary index update which happened
> implicitly is never reverted.
> Or am I missing something here ?
>
> To be clear, I am talking about the actual commit as part of the two phase
> commit throwing an exception : not a conflict exception, but an IOException
> or variant - which can result in the secondary index going out of sync.
> I am contrasting it with the case of explicit indexes maintained by client
> - where the rollback by client (when the commit fails for a region) results
> in rollback on all the regions in the transaction - which includes the
> secondary indexes 'visible' to the client.
>
>
>
>
>
> Thanks,
> Mridul
>
>
>
>
>

Re: Secondary indexes and transactions

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
Clint Morgan wrote:
> The client drives the 2PC process, so after it has established that a trx
> may be committed (by asking each region), it tells each region to commit.
> Only then does it actually start to write to the base/indexed tables. So we
> don't really have a problem with "overlapping rollbacks", because a rollback
> is simply not processing the Puts.
> 
> When the client tells each region to commit, the region will process the
> Puts which will then trigger the RPCs to update the index. Transactional
> conflicts should not cause an index to get out of sync because the writes
> never happen.


I think I might not have explained it well enough.
Currently, as part of executing a Put, the index update happens before the
underlying transactional table is updated - and is done outside of
the locks.
If the underlying transactional table update results in an exception - 
what is the state of the index ? From what I understand, a rollback is 
initiated - and this results in rolling back all regions - except for 
the one which threw the exception : and so the secondary index update 
which happened implicitly is never reverted.
Or am I missing something here ?

To be clear, I am talking about the actual commit as part of the two 
phase commit throwing an exception : not a conflict exception, but an 
IOException or variant - which can result in the secondary index going 
out of sync.
I am contrasting it with the case of explicit indexes maintained by the
client - where the rollback by the client (when the commit fails for a
region) results in rollback on all the regions in the transaction -
which includes the secondary indexes 'visible' to the client.
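
To make the ordering concern concrete, here is a toy sketch. It is not the
contrib code: IndexedPutSketch, failBaseWrite and the two maps are
hypothetical stand-ins, and it only models the order of operations
described above (index write first, base table write second). Whether the
real implementation leaves the index entry behind is exactly the open
question.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a region that updates the index before the base table.
class IndexedPutSketch {
    final Map<String, String> base = new HashMap<>();   // row -> value
    final Map<String, String> index = new HashMap<>();  // value -> row
    boolean failBaseWrite = false;                       // simulates an IOException

    void put(String row, String value) throws IOException {
        index.put(value, row);                           // implicit index update goes first
        if (failBaseWrite) {
            throw new IOException("simulated base table failure");
        }
        base.put(row, value);                            // base table update goes second
    }

    public static void main(String[] args) {
        IndexedPutSketch region = new IndexedPutSketch();
        region.failBaseWrite = true;
        try {
            region.put("row1", "blue");
        } catch (IOException e) {
            // Nothing here undoes the index write: the caller never saw it.
        }
        System.out.println("index=" + region.index + " base=" + region.base);
    }
}
```

In this toy model the index keeps an entry for a row that never reached the
base table, which is the out-of-sync state being asked about.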





Thanks,
Mridul





Re: Secondary indexes and transactions

Posted by Clint Morgan <cl...@troove.net>.
The client drives the 2PC process, so after it has established that a trx
may be committed (by asking each region), it tells each region to commit.
Only then does it actually start to write to the base/indexed tables. So we
don't really have a problem with "overlapping rollbacks", because a rollback
is simply not processing the Puts.

When the client tells each region to commit, the region will process the
Puts which will then trigger the RPCs to update the index. Transactional
conflicts should not cause an index to get out of sync because the writes
never happen.
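
A minimal sketch of the flow described above, under stated assumptions.
SketchRegion, TwoPhaseCommitSketch and tryCommit are hypothetical names,
not the contrib API; the index is modelled as a map inside the region
rather than a separate table reached over RPC; and a region's vote is a
fixed flag instead of a real conflict check.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a region taking part in a transaction.
class SketchRegion {
    final Map<String, String> base = new HashMap<>();   // row -> value
    final Map<String, String> index = new HashMap<>();  // value -> row
    private final Map<Long, Map<String, String>> pending = new HashMap<>();
    private final boolean voteYes;

    SketchRegion(boolean voteYes) { this.voteYes = voteYes; }

    // During the transaction, Puts are only buffered, never applied.
    void put(long trxId, String row, String value) {
        pending.computeIfAbsent(trxId, id -> new HashMap<String, String>()).put(row, value);
    }

    // Phase 1: the region says whether the transaction may commit.
    boolean canCommit(long trxId) { return voteYes; }

    // Phase 2: process the buffered Puts, which also updates the index.
    void commit(long trxId) {
        Map<String, String> puts = pending.remove(trxId);
        if (puts == null) return;
        for (Map.Entry<String, String> e : puts.entrySet()) {
            index.put(e.getValue(), e.getKey());
            base.put(e.getKey(), e.getValue());
        }
    }

    // A rollback is simply not processing the Puts: drop the buffer.
    void abort(long trxId) { pending.remove(trxId); }
}

class TwoPhaseCommitSketch {
    // The client drives both phases.
    static boolean tryCommit(long trxId, List<SketchRegion> regions) {
        for (SketchRegion r : regions) {
            if (!r.canCommit(trxId)) {
                for (SketchRegion x : regions) x.abort(trxId);
                return false;
            }
        }
        for (SketchRegion r : regions) r.commit(trxId);
        return true;
    }

    public static void main(String[] args) {
        SketchRegion a = new SketchRegion(true);
        SketchRegion b = new SketchRegion(false);
        a.put(1L, "row1", "v1");
        b.put(1L, "row2", "v2");
        boolean committed = tryCommit(1L, Arrays.asList(a, b));
        System.out.println("committed=" + committed + " base=" + a.base + " index=" + a.index);
    }
}
```

Because nothing touches the base or index maps before phase 2, a conflict
detected in phase 1 leaves both untouched in this model.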

If the regionserver crashes during this commit process, then I *think* it
should still recover correctly. It will see the transactional operations in
the WAL, and then propagate the puts into the index. However, this WAL
recovery stuff has been changing, and I'm not confident that it currently
works in all failure cases.

Does this normal case address your concerns?

-clint


Re: Secondary indexes and transactions

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
stack wrote:
> On Sun, Jan 3, 2010 at 10:46 AM, Mridul Muralidharan
> <mr...@yahoo-inc.com> wrote:
> 
>>  I was wondering about the atomicity guarantees when using secondary
>> indexes from within a transaction.
>>
>>
> You are talking about indexed hbase from transactional hbase contrib?


Yes, exactly.

> 
> 
>> From what I could gather, updates to the index table goes through its own
>> (set of) rpc before the underlying transactional table is updated - and
>> these update happens outside of the locks for the transaction table.
>>
> 
> Yes.  But IIUC, the client is running a transaction that spans the update to
> the two tables.  It'll take care of the undo should, say, the update to the
> transaction table fail.
> 


Isn't the update to the secondary index done implicitly ? As in, does
the client 'see' this update ?
My impression was that the secondary index update was done by the
indexedregion - and was not visible to the client, which manages the OCC
transaction ...


> 
> 
>> Also, the index regions need not colocate with the table region.
>>
>> So essentially wondering
>> a) if the index can go out of sync with the transactional table ?
>>
> 
> It should not.  The client should run the undos if the insert does not go
> into both tables successfully.
> 
> 
> 
>> b) if there are errors with update to table, are the indexes rolled back ?
>>
> 
> Yes.
> 
> 
> 
>> c) Whether there can be issues if there are parallel updates invoked for
>> the same row - whether index changes end up being inconsistent with table
>> data (due to lock not being held while updating index).
>>
> 
> 
> This might be possible.  There is a lock held on a row.  I'm not sure if the
> lock is held on transaction table row while the update is being done to the
> index table.
> 
> This is the doc. as it stands on transactional hbase:
> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/transactional/package-summary.html#package_description
> 
> Here is the doc. on indexed-transactional hbase:
> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/tableindexed/package-summary.html#package_description
> 
> You've probably tripped over it already but just in case, it might help.


I did go through the package summaries, thanks : which is what increased
my confusion.

My current understanding is :

a) Client 'simulates' the transaction - by inspecting the state of the 
rows on commit and rolls back in case of conflicting updates.

b) secondary index updates are transparent to client api and are 
directly done by the indexedregion as part of its implementation.


If this is correct, I am wondering if overlapping rollbacks can result 
in secondary index going out of sync with the table since (a) does not 
see those (one update gets rolled back while another goes through - or 
variations of it).



Thanks,
Mridul


> St.Ack
> 
> 
>>
>> I guess they are all kind of related queries.
>>
>>
>> I was not able to get a clear picture from the archives, so RTFM/pointers
>> would be helpful if this is already answered.
>>
>> Thanks,
>> Mridul
>>


Re: Secondary indexes and transactions

Posted by stack <st...@duboce.net>.
On Sun, Jan 3, 2010 at 10:46 AM, Mridul Muralidharan
<mr...@yahoo-inc.com> wrote:

>
>  I was wondering about the atomicity guarantees when using secondary
> indexes from within a transaction.
>
>
You are talking about indexed hbase from transactional hbase contrib?


> From what I could gather, updates to the index table goes through its own
> (set of) rpc before the underlying transactional table is updated - and
> these update happens outside of the locks for the transaction table.
>

Yes.  But IIUC, the client is running a transaction that spans the update to
the two tables.  It'll take care of the undo should, say, the update to the
transaction table fail.



> Also, the index regions need not colocate with the table region.
>
> So essentially wondering
> a) if the index can go out of sync with the transactional table ?
>

It should not.  The client should run the undos if the insert does not go
into both tables successfully.



> b) if there are errors with update to table, are the indexes rolled back ?
>

Yes.



> c) Whether there can be issues if there are parallel updates invoked for
> the same row - whether index changes end up being inconsistent with table
> data (due to lock not being held while updating index).
>


This might be possible.  There is a lock held on a row.  I'm not sure if the
lock is held on the transaction table row while the update is being done to the
index table.

This is the doc. as it stands on transactional hbase:
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/transactional/package-summary.html#package_description

Here is the doc. on indexed-transactional hbase:
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/tableindexed/package-summary.html#package_description

You've probably tripped over it already but just in case, it might help.
St.Ack


>
>
> I guess they are all kind of related queries.
>
>
> I was not able to get a clear picture from the archives, so RTFM/pointers
> would be helpful if this is already answered.
>
> Thanks,
> Mridul
>